A step-by-step example using the xmltr suite to translate a simple document from xml markup into HTML for publication on a web site.
For this example, we'll choose a very simple xml document which defines a single page containing a title and a paragraph with some emphasized text:

    <page name="tutorialwebpage">
        <title>This is my home page</title>
        <section>
            <title>Introduction</title>
            <par>Page with <emph>little</emph> content.</par>
        </section>
    </page>

(For clarity, the <?xml version...> declaration at the start has been omitted.) This particular example is included in the demos table in the xmltr suite. We want to translate this xml source into an HTML page to be rendered via the Frontier Web Site Framework (WSF).
For this source file, we'll need to define translation rules for all the patterns of xml tags which occur. These patterns are:

    <page>
    <page><title>
    <page><section>
    <page><section><title>
    <page><section><par>
    <page><section><par><emph>

We define these rules by creating a translation rule table within the Frontier Object Database (ODB). This table structure is a hierarchy: the top-level table matches each first-level tag in the patterns, the second-level table matches each second-level tag, and so on. Here is an outline of the table structure we need to define translations for the above tag patterns:

    page                page content
        title           the title for a page
        section         section content
            title       the title of a section
            par         a para within a section
                emph    emphasized text in a para in a section

The value of each of these ODB table entries defines the translation rule which should be applied to the content appearing between the corresponding tags in the xml source file. When an entry in this table refers to a subtable, its value cannot also hold the translation rule (since the value is in use for the subtable). In this case a special entry, beginning with an underscore, is used in the subtable to denote the translation rule for the parent entry. Hence, an actual Frontier table structure for the above translation rules might be:

    page                subtable
        _page           script     translation rule (for <page> tag)
        title           script     translation rule
        section         subtable
            title       wpText     translation rule
            par         subtable
                _par    string     translation rule
                emph    string     translation rule

The entries beginning with an underscore are the translation rules for the parent node, i.e. the _page entry is the translation rule for the <page></page> tag pair. It is often the case that translation rules for such parent nodes don't need to perform any translation (usually when all the actual content appears within child nodes).
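To make the lookup concrete, here is a minimal Python sketch (not part of xmltr, which is written in UserTalk) that models the rule table as nested dicts. The names RULES, IDENTITY and find_rule are hypothetical, and the string values merely stand in for the script and wpText rules of the real table.

```python
# Hypothetical model of the translation rule table as nested dicts.
# A dict value is a subtable; its "_<tag>" entry (if any) is the rule
# for the parent tag itself. Any other value is the rule itself.
IDENTITY = "<children/>"  # one-to-one rule: output the translated children

RULES = {
    "page": {
        "_page": "PAGE-SCRIPT",             # placeholder for a script rule
        "title": "PAGE-TITLE-SCRIPT",       # placeholder for a script rule
        "section": {                        # no "_section": identity rule
            "title": "<b><children/></b>",  # placeholder for a wpText rule
            "par": {
                "_par": "<p><children/></p>",
                "emph": "<b><children/></b>",
            },
        },
    },
}


def find_rule(table, pattern):
    """Return the rule for a tag pattern such as ["page", "section", "par"]."""
    node = table
    for tag in pattern:
        if not isinstance(node, dict) or tag not in node:
            raise KeyError("no rule for <" + "><".join(pattern) + ">")
        node = node[tag]
    if isinstance(node, dict):
        # The entry is a subtable, so its own rule lives under "_<tag>";
        # when that entry is omitted, fall back to the identity rule.
        return node.get("_" + pattern[-1], IDENTITY)
    return node
```

For instance, find_rule(RULES, ["page", "section"]) yields the identity rule because no _section entry exists.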
When a parent node needs no translation of its own, you can omit the entry beginning with the underscore, and an identity (or one-to-one) translation rule will be used. This is why there is no specific rule named _section for the <section> tag above. Notice that some of the translation rules are scripts while others are strings or wpTexts. What might these translation rules look like?
String and wpText translation rules work exactly the same way; a wpText simply makes long translation rules easier to edit. In each case, the result of the translation is the text of the string or wpText, in which the special token <children/> is replaced by the result of translating all the child nodes of the node being translated (in the xml parse tree). In the above example, the translation rule for the tag pattern <page><section><par> would be evaluated by first translating all the content of the <par> node in the xml parse tree and then applying the translation rule for <par> to that already translated content. Translating the content of the <par> node would involve translating any tag patterns for <page><section><par><emph>. So in practice these two translation rules might look like this:

    _par    <p><children/></p>
    emph    <b><children/></b>

We're translating the xml par tag (but only when it occurs in the context <page><section>) into an HTML paragraph tag pair <p></p>. The xml emph tag (again in this context only) is translated into the HTML bold tag pair <b></b>. (See the later discussion of wildcards in translation rules for performing translations when context does not matter.)

For the purposes of demonstration these translation examples are trivial, but you can of course perform fancier translations. For example, you could translate a <title> tag into a whole series of HTML instructions which not only format the title with the desired font, size and color, but also insert special spacing or a decorative graphic. Doing that kind of markup by hand in HTML would be tedious and would invite errors and inconsistency.
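The bottom-up order described above (translate the children first, then substitute them for <children/> in the parent's rule) can be sketched in Python. This is a hypothetical model, not xmltr itself: it uses the standard xml.etree.ElementTree parser in place of blox, and a flat dict keyed by tag-pattern tuples in place of the nested ODB table.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rule map keyed by tag pattern, standing in for the
# nested translation rule table.
RULES = {
    ("page",): "<children/>",
    ("page", "section"): "<children/>",
    ("page", "section", "par"): "<p><children/></p>",
    ("page", "section", "par", "emph"): "<b><children/></b>",
}


def translate(elem, path=()):
    """Translate elem bottom-up: children first, then elem's own rule."""
    path = path + (elem.tag,)
    # Leading text, then each child's translation plus the text after it.
    children = elem.text or ""
    for child in elem:
        children += translate(child, path) + (child.tail or "")
    # Apply this node's string rule to the already translated content.
    return RULES[path].replace("<children/>", children)


doc = ET.fromstring(
    "<page><section><par>Page with <emph>little</emph> content.</par></section></page>"
)
print(translate(doc))  # <p>Page with <b>little</b> content.</p>
```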
Script translation rules can be used to perform much more complex translations. When a script translation rule is invoked, the script is called and the return value (usually a string) is used as the result of the translation. Script translation rules typically do the following:
Script translation rules can perform arbitrary processing with the full power of Frontier and the ODB at their disposal. Each script is passed the address of the xml node being translated (i.e. its address in the xml parse tree) so that attribute values can be queried using blox.getAttribute().

Creating a new WSF page via the page translation rule script

Let's look at a script for the <page> translation rule in our above example. In this case we want to make a new web page entry in a web site table in the Frontier ODB each time we see a <page></page> tag pair. This will allow us to author multiple HTML pages in a single xml source file. Here is a basic script to do this:

    on _page (adrXmlNode, nodeData)
        local (adrWebpage, childString, pagename)
        # translate the content of the xml page node
        childString = xmltr.TranslateChildren(adrXmlNode, nodeData)
        # get the "name" attribute to use as the page name in the ODB
        pagename = blox.getAttribute(adrXmlNode, "name")
        adrWebpage = @xmltr.demos.tutorial.[pagename]
        wp.newTextObject (childString, adrWebpage)
        return ""

When this script is called, adrXmlNode is the address of a node in the xml parse tree corresponding to a <page></page> tag pair in the xml source document. We use an attribute of the <page> tag (name) which tells us how to name the new web page object. The call to xmltr.TranslateChildren() translates any child nodes, i.e. the full content of the page between the <page> and </page> xml tags. This is a recursive process, so translation rules will be applied to all the tag patterns in the page content, and the combined result is passed back by TranslateChildren(). The result is copied into a wpText object in the web site table (consequently this script returns an empty string, since its result has already been dealt with). The web page can then be rendered in the normal Frontier manner. In practice, such a script may do other things as well, by examining and acting on other attributes of the <page> tag.
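Here is a hypothetical Python model of the same idea: a script rule is a callable that translates its own children, files the result in a dict standing in for the web site table, and returns an empty string. SITE, page_rule and translate_children are illustrative names, and the enclosing <site> element is an assumption of the sketch, needed only because a well-formed xml document has a single root element when it holds several pages.

```python
import xml.etree.ElementTree as ET

SITE = {}  # stands in for the web site table in the ODB


def translate_children(elem, path):
    """Rough analogue of xmltr.TranslateChildren() for this sketch."""
    out = elem.text or ""
    for child in elem:
        out += translate(child, path) + (child.tail or "")
    return out


def page_rule(elem, path):
    # Translate everything between <page> and </page> ...
    content = translate_children(elem, path)
    # ... and store it under the page's "name" attribute.
    SITE[elem.get("name")] = content
    return ""  # the result has already been dealt with


RULES = {
    ("site",): "<children/>",
    ("site", "page"): page_rule,
    ("site", "page", "par"): "<p><children/></p>",
}


def translate(elem, path=()):
    path = path + (elem.tag,)
    rule = RULES[path]
    if callable(rule):  # script rule: it handles its own children
        return rule(elem, path)
    return rule.replace("<children/>", translate_children(elem, path))


source = ('<site><page name="home"><par>Hi</par></page>'
          '<page name="about"><par>Us</par></page></site>')
print(repr(translate(ET.fromstring(source))))  # ''
print(SITE)  # {'home': '<p>Hi</p>', 'about': '<p>Us</p>'}
```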
The page title needs a directive

Let's take the above example a little further by looking at the translation rule for the <page><title> tag pattern. Recall our xml sample file, which began thus:

    <page name="tutorialwebpage">
        <title>This is my home page</title>

The text "This is my home page" is what we want to appear at the top of the page and also in the window title bar (i.e. within the <title> tag in the HTML header). A script translation rule for this <title> tag might look like this:

    on title(adrXmlNode, nodeData)
        local (content)
        content = xmltr.TranslateChildren(adrXmlNode, nodeData)
        return "#title \"" + content + "\"\r" + \
            "<h2>" + content + "</h2>"

Here we're doing two things: emitting a #title directive, which the Web Site Framework uses as the page title in the HTML header (and hence the window title bar), and emitting the same text as an <h2> heading so that it appears at the top of the page.
Remember that this translation rule will be triggered only for a <title> tag which is contained immediately within a <page> tag. The second <title> tag in our sample xml document is within a <section> tag and so will be handled by a different translation rule.

We reuse the title tag but the result is different!

Finally, we will look at a translation rule for the second <title> tag, the one which occurs in the context of a <section> tag. Referring to the table of translation rules shown earlier, we can see this was stored in a wpText. Its contents might look like this:

    <p><font face="arial,helvetica" color="#0000FF" size="+2">
    <br><b><children/></b></font></p>

In this case we choose to translate the title of a section to a particular font size and color, with a <br> for some extra space before the title. Recall that the special token <children/> is replaced with the (translated) content of the xml node, in this case the text which appears between the <title> and </title> tags. As you can see, the <title> tag is translated differently depending on its context in the xml document. This can make markup easier: we need only remember that every component in the xml document (page, section, subsection) can have a title, and that title will be translated correctly (and differently) for each context.
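The context dependence can be sketched in Python as a hypothetical model, not xmltr code: the same title tag maps to a script-like callable under <page> and to a string rule under <page><section>. The function apply_rule and the tuple keys are assumptions made for the illustration, and content is assumed to be already translated.

```python
# The two rules for <title>, keyed by the full tag pattern.
SECTION_TITLE = ('<p><font face="arial,helvetica" color="#0000FF" size="+2">\n'
                 "<br><b><children/></b></font></p>")


def page_title(content):
    # Script rule for <page><title>: a WSF #title directive plus a heading.
    return '#title "' + content + '"\r' + "<h2>" + content + "</h2>"


RULES = {
    ("page", "title"): page_title,
    ("page", "section", "title"): SECTION_TITLE,
}


def apply_rule(pattern, content):
    """Apply the rule for pattern to already translated content."""
    rule = RULES[tuple(pattern)]
    if callable(rule):
        return rule(content)
    return rule.replace("<children/>", content)
```

Calling apply_rule(["page", "section", "title"], "Introduction") returns the font-styled paragraph, while the same text under ["page", "title"] becomes a #title directive and an <h2> heading.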
All that remains is to invoke our translation rules and view the result.

Parsing the xml

We use the blox parser to parse our xml source and generate a parse tree:

    blox.textToXml(xmlText, @adrXmlTbl)

The first parameter is the xml source text (a string) and the second is the address in the ODB where we want blox to put the resulting parse tree. If no errors result from parsing the xml, we can then translate the parse tree using xmltr.

Translating the parse tree

The script which does the translation is named TranslateTree. You call it like this:

    result = xmltr.TranslateTree(adrXmlTbl, adrTransTbl)

The first parameter is the address of an xml parse table produced by blox.textToXml() and the second is the address of a translation table containing translation rules (as described earlier). You can also provide optional parameters to TranslateTree() for more precise control over the translation process. These are detailed in the script summary.

Looking at the result

By default, the translated text is returned as the value of the TranslateTree() script. However, this depends on the translation rules. In this tutorial example, the translation rule for the <page> tag intercepts the translated result for each page and puts it into a wpText object in the ODB, so in this case TranslateTree() returns no result. For this example, the xml translated into HTML might look like this:

    #title "This is my home page"
    <h2>This is my home page</h2>
    <p><font face="arial,helvetica" color="#0000FF" size="+2">
    <br><b>Introduction</b></font></p>
    <p>Page with <b>little</b> content.</p>

Rendering the Web Pages

If you're using xmltr to translate xml to HTML, you can place the result of the translation into one or more wpText objects in a Frontier website table. From there you can render the pages using the standard Frontier Web Site Framework machinery.
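The whole parse-then-translate pipeline can be approximated in Python for experimentation. This sketch is a stand-in built on assumptions, not xmltr: xml.etree.ElementTree replaces blox.textToXml(), translate_tree is a hypothetical analogue of xmltr.TranslateTree(), and, unlike an xmltr script rule (which calls xmltr.TranslateChildren() itself), a callable rule here receives its children already translated. The <page> rule is left as the identity so that the result is returned rather than filed in a website table.

```python
import xml.etree.ElementTree as ET


def page_title(content):
    return '#title "' + content + '"\r<h2>' + content + "</h2>\n"


RULES = {
    ("page",): "<children/>",  # identity, so the result is returned
    ("page", "title"): page_title,
    ("page", "section"): "<children/>",
    ("page", "section", "title"): (
        '<p><font face="arial,helvetica" color="#0000FF" size="+2">\n'
        "<br><b><children/></b></font></p>\n"
    ),
    ("page", "section", "par"): "<p><children/></p>\n",
    ("page", "section", "par", "emph"): "<b><children/></b>",
}


def translate_tree(root, rules):
    """Translate a parsed tree bottom-up using rules keyed by tag pattern."""
    def translate(elem, path):
        path = path + (elem.tag,)
        content = elem.text or ""
        for child in elem:
            content += translate(child, path) + (child.tail or "")
        rule = rules[path]
        return rule(content) if callable(rule) else rule.replace("<children/>", content)
    return translate(root, ())


source = ('<page name="tutorialwebpage"><title>This is my home page</title>'
          "<section><title>Introduction</title>"
          "<par>Page with <emph>little</emph> content.</par></section></page>")
try:
    tree = ET.fromstring(source)  # stands in for blox.textToXml()
except ET.ParseError as err:
    raise SystemExit("xml error: %s" % err)
print(translate_tree(tree, RULES))
```

As with blox, translation only proceeds once the source has parsed without error; ElementTree signals a malformed document by raising ParseError.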
Wildcards in translation rules

Sometimes a tag pattern occurs in many different contexts within a document, yet you want the same translation rule to apply in each context. An example might be an <emph> tag used for tagging emphasized text. We might want this always to translate to bold text wherever it occurs in the document. In this case we don't want to create repeated entries for <emph> in the translation rule table for every possible context in which <emph> might occur. This is where wildcard rules come in handy. You can place one or more wildcard rules at any level of the translation rule table, and an attempt will be made to match a tag pattern against these whenever an exact match does not occur. The most deeply nested wildcard rule which partially matches the given tag pattern will be used. For example, if we place a wildcard rule for <emph> at the top level of our translation rule table, then it can match any tag pattern which ends with <emph>, provided there is no exact match for that pattern.

You place wildcard rules (using the same format as for normal translation rules) in a subtable named _any. This subtable effectively matches zero or more intermediate levels of tags. If this subtable is not at the top level of the translation rule table, then the wildcard part of the match will occur at that point in the tag pattern.

Default translation rules

You can specify a default translation rule which will be used for any tag pattern which matches neither a specific translation rule nor a wildcard rule. This default translation rule should be named _default. You can place default rules at any level of the translation rule table; the most deeply nested default rule which partially matches the given tag pattern will be used. To specify a default translation rule which will match any otherwise unmatched tag pattern, place it at the top level of the translation rule table. Where might this be useful?
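The lookup order described in this section (exact match, then the most deeply nested _any wildcard, then the most deeply nested _default) can be modelled in Python. This is a simplified, hypothetical sketch rather than xmltr's algorithm: it assumes an _any subtable holds only rules keyed by the final tag of a pattern, which covers the "any tag pattern ending with <emph>" case above.

```python
IDENTITY = "<children/>"

RULES = {
    "page": {
        "section": {"par": {"_par": "<p><children/></p>"}},
        "_any": {"emph": "<i><children/></i>"},  # any <page>...<emph> pattern
        "_default": "<!-- no rule inside page -->",
    },
    "_any": {"emph": "<b><children/></b>"},      # any pattern ending in <emph>
    "_default": "",                              # drop everything else
}


def find_rule(table, pattern):
    """Exact match first, then the deepest _any rule, then the deepest _default."""
    visited = [table]  # tables along the exactly matched prefix, outermost first
    node = table
    for depth, tag in enumerate(pattern):
        entry = node.get(tag)
        if depth == len(pattern) - 1 and entry is not None:
            # Exact match for the whole pattern.
            if isinstance(entry, dict):
                return entry.get("_" + tag, IDENTITY)
            return entry
        if not isinstance(entry, dict):
            break  # the exact path ends here
        node = entry
        visited.append(node)
    last = pattern[-1]
    for level in reversed(visited):  # most deeply nested first
        if last in level.get("_any", {}):
            return level["_any"][last]
    for level in reversed(visited):
        if "_default" in level:
            return level["_default"]
    raise KeyError("no rule for <" + "><".join(pattern) + ">")
```

With these rules, <page><section><par><emph> picks up the page-level wildcard (italic), while an <emph> under some other top-level tag falls back to the top-level wildcard (bold).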
Let's say you wanted to use xmltr as a filter to extract certain components of the xml parse tree and discard others (maybe to build a table of contents). In this case you could create translation rules to match those tag patterns you wish to extract, and use a default translation rule which translates to nothing to ignore all other tag patterns. Another use for a default rule is during development of translation rules, to catch tag patterns which don't yet have translation rules and translate them into a diagnostic message in your web page (rather than stopping the translation with a ScriptError).

Re-using a translation rule via sameas

Sometimes you may want the same translation rule to be used for a number of quite different tag patterns. For example, in a page containing name and address information, you might have tags for <name>, <position>, <company>, <street> and <city>, yet want an identical translation (e.g. the same HTML formatting) used for each. In this case you can provide the required translation rule for just one of the tags, say <name>, and specify that the other tags use the same translation rule as <name>. In this example, the <position> tag would have a string or wpText translation rule which looks like this:

    position    sameas:<page><name>

This means that when a <position> tag is encountered (in the appropriate context), the translation rule for the tag pattern <page><name> will be used. Such a translation rule should begin with the characters sameas: followed by the tag pattern (complete with xml-tag-style angle brackets) whose translation rule should be used. When locating the translation rule for the specified tag pattern, the same pattern matching machinery is used as for normal translation rules. Hence the tag pattern appearing after sameas: can match a specific translation rule, a wildcard rule or a default rule.
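A sameas: rule is essentially a redirect to another tag pattern's rule. The hypothetical Python sketch below resolves such redirects with a plain exact lookup; xmltr itself, as described above, would also let the target match a wildcard or default rule. Following chained redirects and the hop limit that guards against a circular sameas chain are assumptions of this sketch, not documented xmltr behaviour.

```python
import re

CARD = '<span class="card"><children/></span>'  # hypothetical shared formatting

RULES = {
    "page": {
        "name": CARD,
        "position": "sameas:<page><name>",
        "company": "sameas:<page><name>",
        "street": "sameas:<page><position>",  # in this sketch, chains are followed
    }
}


def lookup(table, pattern):
    """Exact lookup of a tag pattern such as ["page", "name"]."""
    node = table
    for tag in pattern:
        node = node[tag]
    return node


def resolve(table, pattern, max_hops=10):
    """Follow sameas: redirects until a real rule is reached."""
    rule = lookup(table, pattern)
    hops = 0
    while isinstance(rule, str) and rule.startswith("sameas:"):
        hops += 1
        if hops > max_hops:
            raise ValueError("sameas chain too long (circular?)")
        # "sameas:<page><name>" -> ["page", "name"]
        target = re.findall(r"<([^<>]+)>", rule[len("sameas:"):])
        rule = lookup(table, target)
    return rule
```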
Script rules can return data structures for special processing

The translation machinery of xmltr assumes that translation rules for each xml tag pair (or node in the xml parse tree) return strings, which are then concatenated to form the complete translated result. However, within your own script translation rules, you can return something other than a string, such as a list or a table, provided you intercept this result and process it in the enclosing translation rule script. Two utility scripts in the xmltr suite operate this way: CollectChildren(), which translates all the child nodes separately and puts the results in a Frontier table, and CollectSimilarChildren(), which translates all child nodes of a particular type separately and puts the results in a Frontier list. Their operation is detailed in the script summary.

Translation of entities

Xml entities are like constants in other languages: symbolic names for something. They are written with the notation &entityname;. The blox parser makes special parse table entries for entities, and xmltr provides a special mechanism for handling the translation (or expansion) of entities. By default, xmltr just passes entities straight through, so that they appear with the same entity notation in the translated result. If you're translating xml to HTML, this will probably be sufficient (since HTML shares the same entity notation with xml). You can provide your own translation mechanisms for entities:
The calling sequences for these scripts are provided in the script summary.

Overriding the rule finding and processing behaviours

You can optionally supply scripts which override the default behaviour of xmltr in two areas:
You need to specify these override scripts as optional parameters when you call TranslateTree(). Details are provided in the script summary.

Preserving the rule cache for faster processing

Xmltr builds a cache table which maps tag patterns to rules. This greatly speeds up the translation process for tag patterns which occur more than once in a document (and most do). By default this cache is recreated at the start of each translation (each call to TranslateTree()) and deleted afterwards. You can preserve the rule cache between calls to TranslateTree() by providing your own rule cache table as an optional address parameter to TranslateTree(). Preserving the rule cache speeds up the translation of many short documents. Details are provided in the script summary.
Website built using Frontier and xmltr. Documentation also available in pdf format for offline reading. Copyright The Design Group Qld 2000. This page last updated Thu, 23 Aug 2001