What’s Wrong with Word Processors and Why We Need Structured Editors

For many years I worked in publishing and have personally typeset, designed and supervised the production of what must amount to hundreds of books, journals and brochures, using technologies ranging from manual paste-up to the latest version of InDesign, and including markup-based systems like TeX.

One goal of publication design is to help the reader understand the structure of the document they are reading. Designers use typographic and design conventions (typeface, size and weight, whitespace, layout, colour and so on) to clarify document structure.

The designer’s task is made easier when software can automatically make visual appearance a function of document structure.

Styles first appeared in Microsoft Word in the mid-1980s. They weren’t specifically a structuring mechanism, but rather a way of easily controlling the formatting of elements that served the same purpose. For example, if all headings of a particular kind should have the same format, then paragraph styles made it easy to enforce such a rule.

When Word 3 for the Macintosh was released circa 1987, it came pre-loaded with a special set of heading styles which Word used to infer a hierarchical structure for the document. An outline view displayed only paragraphs tagged with these heading styles and this special editing mode could be used to rearrange the structure of the document.

With this development, heading styles assumed a structural role in Word.

However, by this time WYSIWYG word processors, such as MacWrite and the first version of Word for the Mac, had already set the precedent of allowing direct formatting. Any portion of text in a document could be selected and assigned an arbitrary set of paragraph-level or character-level formatting commands.

So styles were a supplementary rather than an exclusive formatting mechanism. Authors could and did mix the use of styles and direct formatting, and anyone who works in publishing workflows will tell you that this practice continues to the present day.

Therein lies a major design defect that makes conventional word processors (Microsoft Word being the ubiquitous example) poorly suited to preparing text for publication: simply put, there are too many ways of implying structure. Something which looks like a heading may not be tagged with a heading style. Its appearance may instead be the result of direct formatting.

If you’ve worked in publishing, dealing with text supplied by other people, you probably know all about this. It is very rare to be given a document in which styles have been used as the exclusive method of indicating structure.
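To make the problem concrete, here is a minimal sketch of how a tool might flag paragraphs that look like headings but carry no heading style. The `Paragraph` record, the `looks_like_heading` heuristic and its thresholds are all assumptions made up for illustration; they are not taken from Word or from any particular parser.

```python
from dataclasses import dataclass

BODY_SIZE = 11.0  # assumed body text size in points (hypothetical default)


@dataclass
class Paragraph:
    text: str
    style: str = "Normal"      # paragraph style name applied by the author
    bold: bool = False         # direct formatting
    font_size: float = 11.0    # direct formatting, in points


def looks_like_heading(p: Paragraph) -> bool:
    """Crude visual heuristic: short, emphasised, and not ending in a full stop."""
    short = len(p.text.split()) <= 12
    emphasised = p.bold or p.font_size > BODY_SIZE
    return short and emphasised and not p.text.rstrip().endswith(".")


def untagged_headings(paragraphs):
    """Return paragraphs that look like headings but have no heading style."""
    return [
        p for p in paragraphs
        if looks_like_heading(p) and not p.style.startswith("Heading")
    ]


doc = [
    Paragraph("Introduction", style="Heading 1", bold=True, font_size=16),
    Paragraph("Background", bold=True, font_size=14),  # direct formatting only
    Paragraph("Styles first appeared in the mid-1980s."),
]

suspects = untagged_headings(doc)
for p in suspects:
    print("Possible untagged heading:", p.text)
```

A heuristic like this can only suggest candidates. Deciding whether "Background" really is a heading, and at what level, still needs a human or a much richer set of rules.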

This permissive formatting model migrated from Word into PageMaker and QuarkXPress and subsequently InDesign. It has also been religiously copied by most word processing programs which aspire to be a “Word clone”.

Taking a Fresh Look at Editing Structured Documents

In my view what’s needed is a fresh look at the problem of preparing structured documents for publication.

What I have in mind is a tool that gives equal importance to structure and content and that supports the kind of structure typically required for a published document (a book, a journal article, web page content and so on). What I’ve observed over many years is that most publications can be described by a reasonably simple structure.

We also need to accept that people aren’t going to stop using Word. So we need a way to take a messed-up Word file and “sanitise” its structure quickly and efficiently, then re-express the document in formats that can be used directly and unambiguously in the publishing workflow: for example, InDesign Tags for importing into InDesign, or HTML for a web page.

I’m also aiming to implement a model of structure which is independent of any particular markup system, yet can be expressed in a variety of markup systems.
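As a rough illustration of that idea, the sketch below keeps the structure in a plain data model and treats each markup system as just another renderer. The `Block` class, the renderer names and the "Heading N" / "Body" style names are hypothetical, and the tagged-text output is deliberately simplified (a real InDesign tagged-text file also needs a header line and character escaping).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    role: str       # "heading" or "paragraph" (hypothetical role names)
    text: str
    level: int = 0  # heading depth; 0 for non-headings


def to_html(blocks):
    """Render the structure as HTML, one element per block."""
    out = []
    for b in blocks:
        tag = f"h{b.level}" if b.role == "heading" else "p"
        out.append(f"<{tag}>{escape(b.text)}</{tag}>")
    return "\n".join(out)


def to_tagged_text(blocks):
    """Render the same structure as simplified paragraph-style tags."""
    out = []
    for b in blocks:
        style = f"Heading {b.level}" if b.role == "heading" else "Body"
        out.append(f"<ParaStyle:{style}>{b.text}")
    return "\n".join(out)


doc = [
    Block("heading", "Styles & Structure", level=1),
    Block("paragraph", "Styles first appeared in the mid-1980s."),
]

html_out = to_html(doc)
tagged_out = to_tagged_text(doc)
print(html_out)
print(tagged_out)
```

The point of the design is that nothing in `Block` knows about HTML or InDesign. Adding another output format means adding another renderer, not changing the model.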

As of early 2012, nearly ten years have been spent, on and off, developing a prototype of such a new kind of structuring and editing tool. Numerous ideas have been tried; some were discarded, others further refined.

This blog will discuss what’s been learned along the way.

You might be wondering about the place of XML and XML editors in all of this. That is a topic which deserves its own blog post.
