Using xmltr as a data filter

Applications of data filtering

Translation rules that process only certain parts of an xml parse table, while discarding the rest, let you filter the data in the parse table.

When xmltr is used to generate pages for a web site, part or all of the site's content may exist as xml text, which is parsed into xml parse tables and then translated into HTML. Data filtering can be applied to these xml parse tables to produce summaries, such as index pages that list and link to other pages on the site.

As another example, you might wish to produce a list of all the links within a site, or of all the section headings on all the pages. In each case the items are identified by specific xml markup, and a data filter can select only that markup tag for processing and compile the results into a list.

Setting up custom translation tables

This kind of data filtering can be performed with a simple translation table which has entries only for the tags or tag patterns you wish to process. If you are not concerned about the context of desired tags, you can use wildcard rules to select the desired tags irrespective of how they occur within the scope of other tags.

A default rule at the topmost level of the translation table can be used to catch all tag patterns which do not match the desired tags and translate them to nothing, i.e. simply ignore them.
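The idea of a single filtering table with a default rule can be sketched in Python. This is only an analogy and not xmltr's own API: the `translate` function, the `"*"` key for the default rule, and the `heading` and `para` tags are all hypothetical. Matching on the tag name alone stands in for a wildcard rule that ignores context. The sketch also assumes that the default rule drops an element's own text but still descends into its children, so that desired tags nested inside unwanted ones are still found.

```python
import xml.etree.ElementTree as ET

DEFAULT = "*"  # hypothetical key standing in for the default rule


def translate(elem, table):
    """Apply the rule for this tag, falling back to the default rule."""
    rule = table.get(elem.tag, table[DEFAULT])
    return rule(elem, table)


def keep_heading(elem, table):
    # Desired tag: emit its text as one line of the compiled list.
    return (elem.text or "").strip() + "\n"


def discard(elem, table):
    # Default rule: this element's own text becomes nothing, but children
    # are still translated so nested headings can be selected (assumption).
    return "".join(translate(child, table) for child in elem)


filter_table = {"heading": keep_heading, DEFAULT: discard}

doc = ET.fromstring(
    "<site>"
    "<page><heading>Home</heading><para>Welcome text.</para></page>"
    "<page><section><heading>About us</heading>"
    "<para>Who we are.</para></section></page>"
    "</site>"
)

headings = translate(doc, filter_table)
print(headings)
```

Only the two heading texts survive, one per line; the paragraphs and all container tags are translated to nothing.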

In practice you may need a slightly more complex system than this, because the content you want to select and process is likely to contain tagged material which also requires translation. For example, if you are compiling an index of titles of articles or essays, these titles may themselves contain markup to emphasize certain words. In this case you will need to provide a rule which describes how to translate this embedded markup. If you don’t provide such a rule, the default rule will be used, and it would translate the content within the embedded markup to nothing, i.e. eliminate it, which is not the desired result!

Hence you may find the following approach is required:

	When you perform data filtering, pass a list of three translation tables to TranslateTree().

	The first translation table should have entries for the tags or tag patterns you wish to process.

	The second translation table should have entries for the common tag patterns which always require the same translations — whether you are translating an entire document or doing data filtering. Tags for emphasis would fall into this category.

	The third translation table contains only a default translation rule which translates anything it matches to nothing.

The first translation table will select what you want to filter, the second will make sure any embedded tags are translated correctly, and the third will make sure everything else is eliminated.
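The division of labour between the three tables can be illustrated with a small Python sketch. It is not xmltr code and does not reproduce the behaviour of TranslateTree(): the helper names, the `"*"` default key, and the rule that tables are consulted in order with the first match winning are all assumptions made for illustration. As a further assumption, the discard rule drops an element's own text but still descends into its children.

```python
import xml.etree.ElementTree as ET

DEFAULT = "*"  # hypothetical key standing in for a default rule


def find_rule(tag, tables):
    # Assumption: tables are consulted in order and the first match wins.
    for table in tables:
        if tag in table:
            return table[tag]
        if DEFAULT in table:
            return table[DEFAULT]
    raise KeyError(tag)


def translate(elem, tables):
    return find_rule(elem.tag, tables)(elem, tables)


def content(elem, tables):
    # Text of the element with any embedded markup translated in place.
    out = elem.text or ""
    for child in elem:
        out += translate(child, tables) + (child.tail or "")
    return out


# First table: the tags being selected.
select_table = {"title": lambda e, t: "<li>" + content(e, t) + "</li>\n"}
# Second table: common tags that always translate the same way.
common_table = {"em": lambda e, t: "<em>" + content(e, t) + "</em>"}
# Third table: only a default rule, which translates to nothing.
discard_table = {DEFAULT: lambda e, t: "".join(translate(c, t) for c in e)}

doc = ET.fromstring(
    "<site>"
    "<page><title>The <em>quick</em> guide</title>"
    "<body>Some body text.</body></page>"
    "<page><title>Contact</title><body>Write to us.</body></page>"
    "</site>"
)

filtered = translate(doc, [select_table, common_table, discard_table])
without_common = translate(doc, [select_table, discard_table])
print(filtered)
print(without_common)
```

With all three tables the emphasized word in the first title is preserved. Leaving out the common table sends the `em` tag to the default rule, so the word "quick" disappears from the title. Because the discard rule in this sketch still descends into children, a common tag occurring outside a selected element would also be emitted; the sample document avoids that case.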

You may find that the translation table for common tag patterns which always require the same translations can be re-used for a variety of different translation processes.

Copyright The Design Group Qld 2000. This page last updated Tue, 7 Nov 2000