Xmltr Tutorial

A step-by-step example using the xmltr suite to translate a simple document from xml markup into HTML for publication on a web site.

A Very Simple xml Document

For this example, we'll choose a very simple xml document which defines a single page containing a title and a paragraph with some emphasized text:

<page name="tutorialwebpage">
   <title>This is my home page</title>
   <section>
      <title>Introduction</title>
      <par>Page with <emph>little</emph> content.</par>
   </section>
</page>

(For clarity the <?xml version...> declaration at the start has been omitted.)

This particular example is included in the “demos” table in the xmltr suite.

We want to translate this xml source into an HTML page to be rendered via the Frontier Web Site Framework (WSF).

Defining Translation Rules

For this source file, we’ll need to define translation rules for all the patterns of xml tags which occur. These patterns are:

<page>
<page><title>
<page><section>
<page><section><title>
<page><section><par>
<page><section><par><emph>

We define these rules by creating a translation rule table within the Frontier Object Database (ODB). This table structure is a hierarchy. The top level table matches each first level tag in the patterns. The second level table matches each second level tag in the patterns, and so on. Here is an outline of the table structure we need to define translations for the above tag patterns:

page              page content
   title          the title for a page
   section        section content
      title       the title of a section
      par         a para within a section
         emph     emphasized text in a para in a section

The value of each of these ODB table entries defines the translation rule which should be applied to translate the content which appears between corresponding tags in the xml source file. When an entry in this table refers to a subtable, the value of the entry cannot be used for the translation rule definition (since the “value” is in use for a subtable). In this case a special entry (beginning with an underscore) is used in the subtable to denote the translation rule for the parent entry. Hence, an actual Frontier table structure for the above translation rules might be:

page              subtable
   _page          script translation rule (for <page> tag)
   title          script translation rule
   section        subtable
      title       wpText translation rule
      par         subtable
         _par     string translation rule
         emph     string translation rule

The entries beginning with underscore “_” are the translation rules for the parent node — i.e. the _page entry is the translation rule for the <page></page> tag pair.

It is often the case that translation rules for such parent nodes don't need to perform any translation (usually when all the actual content appears within child nodes). In this case, you can omit the entry beginning with the underscore, and an “identity” (or one-to-one) translation rule will be used. This can be seen in the absence of a specific rule named “_section” for the <section> tag above.
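
The table layout and the underscore convention can be modelled outside Frontier. The following is a minimal Python sketch, not xmltr code: the nested dictionary stands in for the ODB table, `find_rule` is a hypothetical helper, and treating the identity rule as a bare `<children/>` string is an assumption.

```python
# Hypothetical Python model of the rule table; the real table lives in the Frontier ODB.
IDENTITY = "<children/>"  # assumed stand-in for the "identity" rule

RULES = {
    "page": {
        "_page": "script rule for <page>",
        "title": "script rule for <page><title>",
        "section": {                      # no "_section" entry: identity applies
            "title": "wpText rule for <page><section><title>",
            "par": {
                "_par": "<p><children/></p>",
                "emph": "<b><children/></b>",
            },
        },
    },
}

def find_rule(table, path):
    """Return the rule for an exact tag path such as ["page", "section", "par"]."""
    if not path:
        return None
    entry = table
    for tag in path:
        if not isinstance(entry, dict) or tag not in entry:
            return None                   # no exact match for this tag pattern
        entry = entry[tag]
    if isinstance(entry, dict):
        # A subtable: the parent's own rule is stored under "_" + tag name.
        return entry.get("_" + path[-1], IDENTITY)
    return entry

print(find_rule(RULES, ["page", "section", "par"]))
print(find_rule(RULES, ["page", "section"]))
```

The lookup walks one table level per tag, which is why a tag such as `title` can carry a different rule in each context.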

Notice that some of the translation rules are scripts while others are strings or wpTexts.

What might these translation rules look like?

String and wpText Translation Rules

String and wpText translation rules work exactly the same way, but a wpText can make long translation rules easier to edit.

In each case, the result of the translation is the text of the string or wpText in which the special token “<children/>” is replaced by the result of translating all the child nodes of the node being translated (in the xml parse table).

In the above example, the translation rules for the tag pattern:

<page><section><par>

would be evaluated by first translating all the content of the <par> node in the xml parse tree and then applying the translation rule for <par> to that already translated content. Translating the content of the <par> node would involve translating any tag patterns for:

<page><section><par><emph>

So in practice these two translation rules might look like this:

_par        <p><children/></p>
emph        <b><children/></b>

We’re translating the xml “par” tag (but only when it occurs in the context <page><section>) into an HTML paragraph tag pair <p></p>.

The xml “emph” tag (again in this context only) is translated into the HTML bold tag pair <b></b>. (See the following discussion on wildcards in translation rules for performing translations when context does not matter.)
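
The children-first substitution can be sketched in a few lines of Python. This is an illustration only: it uses Python's standard ElementTree parser rather than blox, and it looks rules up by tag name alone, ignoring the context matching that xmltr performs.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Simplification: rules keyed by tag name only, with no context.
RULES = {"par": "<p><children/></p>", "emph": "<b><children/></b>"}

def translate(node):
    """Translate the children first, then substitute them into the node's rule."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(translate(child))          # recursive: innermost tags first
        parts.append(escape(child.tail or ""))  # text following the child tag
    children = "".join(parts)
    return RULES[node.tag].replace("<children/>", children)

par = ET.fromstring("<par>Page with <emph>little</emph> content.</par>")
print(translate(par))  # <p>Page with <b>little</b> content.</p>
```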

For the purposes of demonstration these translation examples are trivial, but more elaborate translations are possible. For example, you could translate a <title> tag into a whole series of HTML instructions which not only format the title with the desired font, size and color, but also insert special spacing or a decorative graphic. Doing that kind of markup by hand in HTML would be tedious and would invite errors and inconsistency.

Script Translation Rules

Script translation rules can be used to perform much more complex translations.

When a script translation rule is invoked, the script is called and the return value (usually a string) is used as the result of the translation.

Script translation rules typically do the following:

	Translate the content of any child nodes.

	Get the values of attributes of the xml node being translated.

	Construct a result based on these.

Script translation rules can perform arbitrary processing with the full power of Frontier and the ODB at their disposal.

Each script is passed the address of the xml node being translated (i.e. the address in the xml parse tree) so that attribute values can be queried using blox.getAttribute().

Creating a new WSF page via the page translation rule script

Let’s look at a script for the <page> translation rule in our above example. In this case we want to make a new web page entry in a web site table in the Frontier ODB each time we see a <page></page> tag pair. This will allow us to author multiple HTML pages in a single xml source file.

Here is a basic script to do this:

on _page (adrXmlNode, nodeData)
   local (adrWebpage, childString, pagename)
   # translate the content of the xml page node
   childString = xmltr.TranslateChildren(adrXmlNode, nodeData)
   # Get the "name" attribute to use as the page name in the ODB
   pagename = blox.getAttribute(adrXmlNode, "name")
   adrWebpage = @xmltr.demos.tutorial.[pagename]
   wp.newTextObject (childString, adrWebpage)
   return ""

When this script is called, adrXmlNode is the address of a node in the xml parse tree corresponding to a <page></page> tag pair in the xml source document. We have used an attribute of the <page> tag (“name”) which tells us how to name the new web page object.

The call to xmltr.TranslateChildren() translates any child nodes — i.e. the full content of the page between the <page> and </page> xml tags. This is a recursive process, so translation rules will be applied to all the tag patterns in the page content and the result of all this is passed back by TranslateChildren(). The result is copied into a wpText object in the web site framework (consequently this script returns an empty string since its “result” has already been dealt with).

The web page can then be rendered in the normal Frontier manner.

In practice, such a script may do other things as well by examining and acting on other attributes of the <page> tag.
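
The pattern used by this script (translate the children, store the result as a side effect, return an empty string) can be modelled in Python. In this sketch the `site` dictionary is a hypothetical stand-in for the web site table in the ODB, and `translate_children` is a placeholder that only joins the node's text; it does not reproduce xmltr.TranslateChildren().

```python
import xml.etree.ElementTree as ET

site = {}  # hypothetical stand-in for the web site table in the ODB

def translate_children(node):
    """Placeholder for xmltr.TranslateChildren(): just joins the node's text."""
    return "".join(node.itertext()).strip()

def page_rule(node):
    """Store the translated page under its "name" attribute and return ""."""
    content = translate_children(node)
    site[node.get("name")] = content   # side effect: one entry per <page>
    return ""                          # nothing is passed up to the caller

doc = ET.fromstring(
    '<page name="tutorialwebpage"><title>This is my home page</title></page>'
)
result = page_rule(doc)
print(repr(result), site)
```

Because the rule returns an empty string, several <page> elements in one source file would each produce their own entry without their content running together.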

The page title needs a directive

Let’s take the above example a little further by looking at the translation rule for the <page><title> tag pattern.

Recall our xml sample file which began thus:

<page name="tutorialwebpage">
   <title>This is my home page</title>

The text “This is my home page” is what we want to appear at the top of the page and also in the window title bar (i.e. within the <title> tag in the HTML header). A script translation rule for this <title> tag might look like this:

on title(adrXmlNode, nodeData)
   local (content)
   content = xmltr.TranslateChildren(adrXmlNode, nodeData)
   return "#title \"" + content + "\"\r" + \
      "<h2>" + content + "</h2>"

Here we’re doing two things:

1. Making a directive which holds the text of the page title. This can be picked up in the site template to set the text used for the HTML <title> tag.

2. Generating HTML to display the title at the top of the page as an HTML “h2” heading.

Remember that this translation rule will be triggered only for a <title> tag which is contained immediately within a <page> tag. The second <title> tag which occurs in our sample xml document is within a <section> tag and so will be handled by a different translation rule.

We reuse the title tag but the result is different!

Finally we will look at a translation rule for the second <title> tag — the one which occurs in the context of a <section> tag. Referring to the table of translation rules shown earlier, we can see this was stored in a wpText. Its contents might look like this:

<p><font face="arial,helvetica" color="#0000FF" size="+2">
<br><b><children/></b></font></p>

In this case we choose to translate the title of a section to a particular font size and color with a <br> for some extra space before the title. Recall that the special token <children/> is replaced with the (translated) content of the xml node — in this case the text which appears between the <title> and </title> tags.

As you can see, the <title> tag is translated differently depending on its context in the xml document. This can make markup easier. In this case we need remember only that every “component” in the xml document — page, section, subsection — can have a title and that title will be translated correctly (and differently) for each context.
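
A small Python model makes the context dependence concrete. The two rules below mirror the <page><title> script and the <page><section><title> wpText from this tutorial; keying a dictionary by the full tag path is an assumption of the sketch, not the way xmltr stores its rules.

```python
def page_title(content):
    # Mirrors the script rule: a #title directive line plus an <h2> heading.
    return '#title "' + content + '"\r<h2>' + content + "</h2>"

SECTION_TITLE = (
    '<p><font face="arial,helvetica" color="#0000FF" size="+2">\n'
    "<br><b><children/></b></font></p>"
)

# Same tag name, two contexts, two different rules.
RULES = {
    ("page", "title"): page_title,
    ("page", "section", "title"): SECTION_TITLE,
}

def apply_rule(path, content):
    rule = RULES[tuple(path)]
    if callable(rule):                 # script-style rule
        return rule(content)
    return rule.replace("<children/>", content)  # string-style rule

print(apply_rule(["page", "title"], "This is my home page"))
print(apply_rule(["page", "section", "title"], "Introduction"))
```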

Translating the xml Document

All that remains is to invoke our translation rules and view the result.

Parsing the xml

We use the blox parser to parse our xml source and generate a parse tree:

blox.textToXml(xmlText, @adrXmlTbl)

The first parameter is the xml source text (a string) and the second is the address in the ODB where we want blox to put the resulting parse tree. If no errors result from parsing the xml, we can then translate the parse tree using xmltr.

Translating the parse tree

The script which does the translation is named “TranslateTree”. You call it like this:

result = xmltr.TranslateTree(adrXmlTbl, adrTransTbl)

The first parameter is the address of an xml parse table produced from blox.textToXml() and the second parameter is the address of a translation table containing translation rules (as described earlier).

You can also provide optional parameters to TranslateTree() for more precise control over the translation process. These are detailed in the script summary.

Looking at the result

By default, the translated text is returned as the value of the TranslateTree() script.

However, this depends on the translation rules. In this tutorial example, the translation rule for the <page> tag intercepts the translated result for each “page” and puts it into a wpText object in the ODB, so TranslateTree() returns an empty string. The HTML stored in the wpText for this example might look like this:

#title "This is my home page"
<h2>This is my home page</h2>
<p><font face="arial,helvetica" color="#0000FF" size="+2">
<br><b>Introduction</b></font></p>
<p>Page with <b>little</b> content.</p>
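
The whole walk-through can be reproduced as a Python model to check that the rules really produce this output. Several things here are assumptions of the sketch rather than facts about xmltr: rules are keyed by full tag path, whitespace-only text between tags is dropped, the line breaks are written into the rules themselves (with "\n" instead of "\r"), and the `pages` dictionary stands in for the web site table.

```python
import xml.etree.ElementTree as ET

SOURCE = """<page name="tutorialwebpage">
   <title>This is my home page</title>
   <section>
      <title>Introduction</title>
      <par>Page with <emph>little</emph> content.</par>
   </section>
</page>"""

pages = {}  # hypothetical stand-in for the web site table

def page_rule(node, children):
    pages[node.get("name")] = children   # intercept the result for this page
    return ""

def title_rule(node, children):
    return '#title "' + children + '"\n<h2>' + children + "</h2>\n"

RULES = {
    ("page",): page_rule,
    ("page", "title"): title_rule,
    ("page", "section", "title"):
        '<p><font face="arial,helvetica" color="#0000FF" size="+2">\n'
        "<br><b><children/></b></font></p>\n",
    ("page", "section", "par"): "<p><children/></p>",
    ("page", "section", "par", "emph"): "<b><children/></b>",
}

def keep(text):
    """Drop whitespace-only text between tags (an assumption of this sketch)."""
    return text if text and text.strip() else ""

def translate(node, path=()):
    path = path + (node.tag,)
    parts = [keep(node.text)]
    for child in node:
        parts.append(translate(child, path))
        parts.append(keep(child.tail))
    children = "".join(parts)
    rule = RULES.get(path, "<children/>")   # missing rule: identity
    if callable(rule):
        return rule(node, children)
    return rule.replace("<children/>", children)

result = translate(ET.fromstring(SOURCE))
print(repr(result))               # the page rule returned an empty string
print(pages["tutorialwebpage"])   # the intercepted HTML
```

Note that <section> has no rule of its own, so the identity fallback simply passes its translated children up to the page rule.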

Rendering the Web Pages

If you’re using xmltr to translate xml to HTML, then you can place the result of the translation into one or more wpText objects in a Frontier website table. From that point you can render the pages using the standard Frontier Web Site Framework machinery.

Other Things You Can Do

Wildcards in translation rules

Sometimes a tag pattern occurs in many different contexts within a document, yet you want the same translation rule to apply for each context. An example might be an <emph> tag used for tagging “emphasized” text. We might want this always to translate to “bold” text wherever it occurs in the document. In this case we don’t want to create repeated entries for <emph> in the translation rules table for every possible context in which <emph> might occur.

This is where wildcard rules come in handy. You can place one or more wildcard rules at any level of the translation rule table; xmltr tries to match a tag pattern against them whenever there is no exact match. The most deeply nested wildcard rule which partially matches the given tag pattern is used.

For example, if we place a wildcard rule for <emph> at the top level of our translation rule table, then this can match any tag pattern which ends with <emph>, provided there is not an exact match for that tag.

You place wildcard rules (using the same format as normal translation rules) in a subtable named “_any”. This subtable effectively matches zero or more intermediate levels of tags. If this subtable is not at the top level of the translation rule table, then the “wildcard” part of the rule match will occur at that point in the tag pattern.
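
One way to read these matching rules is sketched below in Python. It is an interpretation of the description above, not xmltr's actual rule finder: the nested dictionaries model the rule table, `find_exact` and `find_rule` are hypothetical helpers, and the <note> and <list> tags are invented for the example.

```python
def find_exact(table, path):
    """Exact lookup; a subtable yields its "_tag" parent rule if present."""
    if not path:
        return None
    entry = table
    for tag in path:
        if not isinstance(entry, dict) or tag not in entry:
            return None
        entry = entry[tag]
    if isinstance(entry, dict):
        return entry.get("_" + path[-1])
    return entry

def find_rule(table, path):
    rule = find_exact(table, path)
    if rule is not None:
        return rule                      # an exact match always wins
    # Walk down as far as the path matches, noting each "_any" subtable.
    candidates = []
    level = table
    for depth, tag in enumerate(path):
        if not isinstance(level, dict):
            break
        if "_any" in level:
            candidates.append((depth, level["_any"]))
        level = level.get(tag)
    # Try the most deeply nested wildcard table first.
    for depth, any_table in reversed(candidates):
        rest = path[depth:]
        for skip in range(len(rest)):    # "_any" absorbs zero or more tags
            rule = find_exact(any_table, rest[skip:])
            if rule is not None:
                return rule
    return None

RULES = {
    "_any": {"emph": "<b><children/></b>"},
    "page": {
        "section": {
            "par": {"_par": "<p><children/></p>"},
            "note": {"_any": {"emph": "<i><children/></i>"}},
        },
    },
}

print(find_rule(RULES, ["page", "section", "par", "emph"]))           # top-level wildcard
print(find_rule(RULES, ["page", "section", "note", "list", "emph"]))  # nested wildcard
```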

Default translation rules

You can specify a default translation rule which will be used for any tag pattern which matches neither a specific translation rule nor a wildcard rule.

This default translation rule should be named “_default”. You can place default rules at any level of the translation rule table. The most deeply nested default rule which partially matches the given tag pattern will be used. To specify a default translation rule which will match any otherwise unmatched tag pattern, place it at the top level of the translation rule table.

Where might this be useful? Let’s say you wanted to use xmltr as a “filter” to extract certain components of the xml parse tree and discard others (maybe to build a table of contents). In this case you could create translation rules to match those tag patterns you wish to extract and use a default translation rule which translates to nothing to ignore all other tag patterns.

Another use for a default rule is during development of translation rules to catch tag patterns which don’t have translation rules and to translate these to a diagnostic message in your web page (rather than stopping the translation with a ScriptError).
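
Selecting the “most deeply nested” default can be sketched as a walk down the rule table that remembers the last “_default” entry seen. This Python model is an assumption about the behaviour described above; it covers only the choice of default rule, not the specific and wildcard matching that comes first, and `find_default` is a hypothetical helper.

```python
def find_default(table, path):
    """Return the deepest "_default" rule along the matching part of the path."""
    best = table.get("_default")
    level = table
    for tag in path:
        level = level.get(tag)
        if not isinstance(level, dict):
            break                        # the path no longer matches the table
        best = level.get("_default", best)
    return best

RULES = {
    "_default": "",                      # filter: discard anything unmatched
    "page": {
        "section": {
            # diagnostic rule, useful while developing the rule table
            "_default": "[no rule]<children/>",
        },
    },
}

print(repr(find_default(RULES, ["page", "title"])))
print(repr(find_default(RULES, ["page", "section", "par"])))
```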

Re-using a translation rule via “sameas”

Sometimes you may want the same translation rule to be used for a number of quite different tag patterns. For example, in a page containing name and address information, you might have tags for <name>, <position>, <company>, <street> and <city>, yet you might want an identical translation (e.g. the same HTML formatting) used for each.

In this case you can provide the required translation rule for just one of the tags, say <name>, and specify that the other tags just use the same translation rule as <name>. In this example, the <position> tag would have a string or wpText translation rule which looks like this:

position    sameas:<page><name>

This means that when a <position> tag is encountered (in the appropriate context), the translation rule for the tag pattern <page><name> will be used.

Such a translation rule should begin with the characters “sameas:” and be followed by the tag pattern (complete with xml-tag-style angle brackets) whose translation rule should be used.

When locating the translation rule for the specified tag pattern, the same pattern matching machinery is used as for normal translation rules. Hence the tag pattern appearing after “sameas:” can match a specific translation rule, a wildcard rule or a default rule.
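
Resolving a “sameas:” rule amounts to parsing the tag pattern after the prefix and looking that pattern up instead. The Python sketch below assumes rules keyed by full tag path; following chains of sameas rules and guarding against circular references are additions of this sketch, not behaviour documented for xmltr.

```python
import re

def resolve(rules, path, seen=()):
    """Follow "sameas:" rules until a real translation rule is reached."""
    path = tuple(path)
    if path in seen:
        raise ValueError("circular sameas chain: " + repr(path))
    rule = rules.get(path)
    if isinstance(rule, str) and rule.startswith("sameas:"):
        # "<page><name>" becomes ("page", "name")
        target = tuple(re.findall(r"<([^<>]+)>", rule[len("sameas:"):]))
        return resolve(rules, target, seen + (path,))
    return rule

RULES = {
    ("page", "name"): "<b><children/></b><br>",
    ("page", "position"): "sameas:<page><name>",
    ("page", "company"): "sameas:<page><position>",
}

print(resolve(RULES, ["page", "position"]))
print(resolve(RULES, ["page", "company"]))
```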

Script rules can return data structures for special processing

The translation machinery of xmltr assumes that translation rules for each xml tag pair (or node in the xml parse tree) return strings which are then concatenated to form the complete translated result. However, within your own script translation rules, you can return something other than a string — such as a list or a table structure — provided you intercept this result and process it in the enclosing translation rule script.

Two utility scripts in the xmltr suite operate this way: CollectChildren() (which translates all the child nodes separately and puts the results in a Frontier table) and CollectSimilarChildren() (which translates all child nodes of a particular type separately and puts the results in a Frontier list). Their operation is detailed in the script summary.
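
The idea of collecting child translations separately and combining them in the enclosing rule can be illustrated in Python. This sketch is only loosely modelled on the description of CollectSimilarChildren(); the function signature, the <authors>, <author> and <editor> tags, and the joining logic are all hypothetical.

```python
import xml.etree.ElementTree as ET

def collect_similar_children(node, tag, translate):
    """Translate each child with the given tag separately; return a list."""
    return [translate(child) for child in node if child.tag == tag]

def authors_rule(node):
    """Enclosing rule: intercepts the list and builds the final string."""
    names = collect_similar_children(node, "author", lambda n: n.text or "")
    if len(names) < 2:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]

doc = ET.fromstring(
    "<authors><author>Ann</author><author>Bob</author>"
    "<author>Cy</author><editor>Ed</editor></authors>"
)
print(authors_rule(doc))  # Ann, Bob and Cy
```

Simple concatenation could not place the commas and the final “and” correctly, which is the kind of case where returning a list to the enclosing rule pays off.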

Translation of entities

Xml entities are like constants in other languages — symbolic names for something. They are written with the notation: “&entityname;”. The blox parser makes special parse table entries for entities and xmltr provides a special mechanism for handling the translation (or expansion) of entities.

By default, xmltr just passes entities straight through, so that they appear with the same entity notation in the translated result. If you’re translating xml to HTML, this will probably be sufficient (since HTML shares the same entity notation with xml).

You can provide your own translation mechanisms for entities:

	You can provide a table named “_entity” at the top level of any translation rule table. This will be searched for an entry which matches the entity name. If a match is found then the value of that table entry is used as the entity translation.

	Alternatively you can provide a script named “_entity” at the top level of a translation rule table. Xmltr will call this script to translate each entity. Typically such a script would contain a case statement to deal with each expected entity one by one — any cases you don’t want to handle you can pass to the default entity handling script provided by xmltr, TranslateEntityDefault().

The calling sequences for these scripts are provided in the script summary.
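
The three entity-handling options (pass-through, table lookup, script) can be modelled in Python as follows. The function names are hypothetical, the “product” entity is invented for the example, and falling back to pass-through when the table has no matching entry is an assumption of this sketch.

```python
def translate_entity_default(name):
    """Pass the entity through unchanged, in entity notation."""
    return "&" + name + ";"

def translate_entity(name, handler=None):
    if isinstance(handler, dict):        # an "_entity" table
        return handler.get(name, translate_entity_default(name))
    if callable(handler):                # an "_entity" script
        return handler(name)
    return translate_entity_default(name)

def entity_script(name):
    """Handle selected entities; hand the rest to the default handler."""
    if name == "nbsp":
        return "&#160;"
    return translate_entity_default(name)

ENTITY_TABLE = {"product": "xmltr"}      # hypothetical entity definition

print(translate_entity("amp"))
print(translate_entity("product", ENTITY_TABLE))
print(translate_entity("nbsp", entity_script))
```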

Overriding the rule finding and processing behaviours

You can optionally supply scripts which override the default behaviour of xmltr in two areas:

	You can provide a script which overrides the built-in rule finding script.

	You can provide a script which overrides the built-in rule processing script. Use this if you want to add your own special syntax for translation rules.

You need to specify these override scripts as optional parameters when you call TranslateTree().

Details are provided in the script summary.

Preserving the rule cache for faster processing

Xmltr builds a cache table which maps tag patterns to rules. This greatly speeds up the translation process for tag patterns which occur more than once in a document — and most do. By default this cache is recreated at the start of each translation (each call to TranslateTree()) and deleted afterwards.

You can preserve the rule cache between calls to TranslateTree() by providing your own rule cache table as an optional address parameter to TranslateTree(). Preserving the rule cache will speed up translations of lots of short documents.

Details are provided in the script summary.
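
The effect of preserving the cache can be shown with a small Python model. Here `make_finder` is a hypothetical wrapper that memoizes a rule finder by tag pattern, and `slow_find` is a stand-in for the real rule search; passing the same dictionary to two finders models two calls to TranslateTree() sharing one cache table.

```python
def make_finder(find_rule, cache=None):
    """Wrap a rule finder so each tag pattern is looked up only once."""
    cache = {} if cache is None else cache   # a private cache unless one is supplied
    def cached(path):
        key = tuple(path)
        if key not in cache:
            cache[key] = find_rule(key)
        return cache[key]
    return cached

calls = []

def slow_find(path):
    """Stand-in for the real rule search; records each lookup it performs."""
    calls.append(path)
    return "<children/>"

shared = {}                                  # caller-supplied cache table
first = make_finder(slow_find, shared)       # first document
first(["page", "section", "par"])
first(["page", "section", "par"])            # served from the cache

second = make_finder(slow_find, shared)      # second document, same cache
second(["page", "section", "par"])

print(len(calls))  # 1
```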

Copyright The Design Group Qld 2000. This page last updated Thu, 23 Aug 2001.