Paul Howson’s Website

Building a Better Structured Editor

A Closer Look at Document Structuring

Before examining how a different kind of text preparation tool could make structuring faster and easier, let’s take a closer look at the task of structuring a document.

Consider the very common case where a designer is handed a word processor document to prepare for publication. Let’s assume that the “typographical cleanup” has been done and that structuring the document is the next task.

Recall that the previous blog post explained why structuring is necessary: modern publishing tools like InDesign provide styles to ensure that all elements with the same structural role are formatted identically. When we structure a document, we identify the structural role of every element (“I’m a heading”, “I’m a sub-heading”, “I’m an element of a list”, etc). We do this by applying styles.

What might our typical word processor document received by the document designer contain in the way of structural hints? Probably a mix of the following:

  • Direct formatting. Some authors don’t use word processor styles at all. Instead they use direct formatting. In this case, every single element in the document can have “local formatting overrides”. There may well be a pattern, and even a consistent one, in the use of formatting to reflect structure, but the software doesn’t know about it because the author did not use styles.
  • Partial use of styles. Some authors are aware that styles exist: they’ve seen the drop-down menu with all those names like “heading 1”, “heading 2”, and so on, and they assume there must be something useful in this. But they don’t really understand what styles are for or why one should use them in preference to direct formatting. So you will often find documents where some elements are formatted using styles, while other elements, often with the same structural role, use direct formatting.
  • A mishmash of styles from different documents. This happens through the common practice of assembling a document from other documents using copy and paste. Each source document has its own regime of styles, and when material is copied into a new document, the word processor obligingly copies across any of the styles used by the source material. A document compiled in this way can end up with a large and incompatible set of styles, some of which are used meaningfully and others which are not. Elements which share the same structural role (e.g. a particular kind of heading) may even utilise different styles because they originated in different source documents. This kind of document can end up with a great deal of “style baggage” which just serves to confuse.
  • Downright misleading use of styles when used purely as a visual formatting device. This can occur when an author, who doesn’t understand the purpose of styles, sees them as a kind of “quick format” tool. They click on styles in the dropdown menu until they find one which gets closest to the desired visual appearance, with no regard whatsoever to the structural purpose of that style. On top of that they might then add local formatting to make it look just the way they want — e.g. to make a paragraph look like a heading — even though the underlying style they applied has nothing to do with headings! One might even call this pathological use of styles.

Documents received by document designers can and do contain any and all of these uses and misuses of styles and direct formatting.
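To make the distinction concrete, here is a minimal Python sketch that records how each paragraph got its appearance, rather than what it looks like. The `Paragraph` class, the `"Normal"` default style name and the sample paragraphs are all assumptions made for illustration; they do not correspond to any particular word processor’s file format or API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """Hypothetical, simplified view of one paragraph in a received document."""
    text: str
    style: str = "Normal"  # named style applied by the author ("Normal" = default)
    overrides: dict = field(default_factory=dict)  # direct formatting, e.g. {"bold": True}


def formatting_source(par: Paragraph, default_style: str = "Normal") -> str:
    """Report how a paragraph got its appearance, not what it looks like."""
    styled = par.style != default_style
    if styled and par.overrides:
        return "style plus overrides"
    if styled:
        return "style only"
    if par.overrides:
        return "direct formatting only"
    return "unformatted"


def audit(paragraphs: list[Paragraph]) -> Counter:
    """Count paragraphs per formatting source to show how mixed a document is."""
    return Counter(formatting_source(p) for p in paragraphs)


# Illustrative sample: three headings formatted three different ways.
doc = [
    Paragraph("Introduction", style="Heading 1"),
    Paragraph("Background", overrides={"bold": True, "size": 16}),
    Paragraph("Some body text."),
    Paragraph("Methods", style="Quote", overrides={"bold": True, "size": 16}),
]
report = audit(doc)
```

In this sample the three headings fall into three different categories, which is exactly the kind of mixture described above. Note that the audit says nothing about whether a style was used meaningfully: the “Quote” paragraph is a heading in disguise, and only its appearance reveals that.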

It is little wonder, therefore, that document designers and publishers have taken to discarding all formatting, removing all styles and converting the document to plain text. This practice “throws out the baby with the bathwater” by discarding potentially useful structural information. The plain text is then laboriously re-tagged with styles, either in a word processor or, perhaps more often, directly in the page layout software.

Do we have to take such a draconian step? Would it not be better if we could salvage useful structural information from the document while discarding that which is spurious?

After all, the documents we receive often look half decent — the authors have tried to indicate the structure of the document — they’ve just done it the wrong way.
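One way such salvage might work, sketched here in Python under stated assumptions, is to group paragraphs by their effective appearance, regardless of whether that appearance came from a style, from direct formatting, or from both. Paragraphs that look alike probably share a structural role, so a designer could assign a style once per group instead of once per paragraph. The `STYLE_LOOKS` table, the two formatting attributes and the sample document are hypothetical; a real document carries far more attributes, and this is not a description of how the structured editor project actually works.

```python
from collections import defaultdict

# Hypothetical style table: what each named style looks like on its own.
STYLE_LOOKS = {
    "Normal": {"bold": False, "size": 11},
    "Heading 1": {"bold": True, "size": 16},
    "Quote": {"bold": False, "size": 11},
}


def effective_look(style, overrides):
    """Merge a style's formatting with direct overrides into one hashable signature."""
    look = dict(STYLE_LOOKS.get(style, STYLE_LOOKS["Normal"]))
    look.update(overrides)  # direct formatting wins over the style
    return tuple(sorted(look.items()))


def group_by_look(paragraphs):
    """Group paragraph texts by what they look like, ignoring how they got that way."""
    groups = defaultdict(list)
    for text, style, overrides in paragraphs:
        groups[effective_look(style, overrides)].append(text)
    return dict(groups)


# Illustrative sample: (text, style name, direct formatting overrides).
doc = [
    ("Introduction", "Heading 1", {}),
    ("Background", "Normal", {"bold": True, "size": 16}),
    ("Methods", "Quote", {"bold": True, "size": 16}),
    ("Some body text.", "Normal", {}),
]
groups = group_by_look(doc)
heading_look = effective_look("Heading 1", {})
```

Here `groups[heading_look]` collects “Introduction”, “Background” and “Methods” into a single group even though each was formatted a different way, while the misleading “Quote” style name is simply ignored. Grouping by appearance is only a heuristic: two elements can look the same and still play different roles, so the final decision would remain with the designer.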

The strange thing is that there appear to be no tools designed for this task. Neither word processors nor page layout tools are designed to make this task easy.

This is one of the reasons for the structured editor project being written about on this blog.

In the next post we’ll look at some of the things such a tool could do to make the structuring process much easier.