<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Paul Howson’s Website: Structured Editing</title>
    <link>https://tdgq.com.au/structured-editing/</link>
    <description>Recent content in Structured Editing on Paul Howson’s Website</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 12 Feb 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://tdgq.com.au/structured-editing/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Why Build an RTF Parser?</title>
      <link>https://tdgq.com.au/structured-editing/why-build-an-rtf-parser/</link>
      <pubDate>Wed, 12 Feb 2025 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/why-build-an-rtf-parser/</guid>
      <description>&lt;section class=&#34;abstract&#34;&gt;
&lt;p&gt;Part 1 of a series of posts on the topic of building an RTF parser in Ruby.&lt;/p&gt;
&lt;/section&gt;
&lt;p&gt;The genesis for writing an RTF parser, starting in the early 2000s, was my earlier work in computer-based graphic design and publishing.&lt;/p&gt;
&lt;p&gt;Consistently formatting a document becomes much simpler and more reliable when there is a way to “centrally control” formatting.&lt;/p&gt;
&lt;p&gt;The possibility of doing this began with the introduction of paragraph styles in Microsoft Word version 3 dating from the late 1980s &lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Lists in Word and RTF</title>
      <link>https://tdgq.com.au/structured-editing/lists-in-word-and-rtf/</link>
      <pubDate>Wed, 26 Jun 2019 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/lists-in-word-and-rtf/</guid>
      <description>&lt;p&gt;In this post we will examine how lists work in Microsoft Word and RTF.&lt;/p&gt;
&lt;p&gt;As mentioned in an earlier post, RTF development was intimately tied to the developing capabilities of Microsoft Word. Hence we find that the RTF specification provides us with a window into Word’s document model.&lt;/p&gt;
&lt;h3 id=&#34;how-lists-work-in-word-97-and-later-versions&#34;&gt;How Lists Work in Word 97 and Later Versions&lt;/h3&gt;
&lt;p&gt;Word 97 (W97) introduced the sophisticated list behaviours which have carried through to the present day, with some additional embellishments.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Lists Within Structured Documents</title>
      <link>https://tdgq.com.au/structured-editing/lists-within-structured-documents/</link>
      <pubDate>Mon, 10 Jun 2019 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/lists-within-structured-documents/</guid>
      <description>&lt;p&gt;This post begins a series on the topic of lists within structured documents. Lists are an indispensable part of most documents. Lists can take different forms — unordered lists (e.g. bullet lists), ordered lists (e.g. numbered lists) and nested combinations of lists. There are significant differences in how lists are handled across different authoring tools and how they are represented in different markup languages (using this term very broadly).&lt;/p&gt;
&lt;p&gt;We will examine lists in:&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Why Don’t All RTF Parsers Recognise Styles?</title>
      <link>https://tdgq.com.au/structured-editing/why-dont-all-rtf-parsers-recognise-styles/</link>
      <pubDate>Fri, 05 Aug 2016 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/why-dont-all-rtf-parsers-recognise-styles/</guid>
      <description>&lt;section class=&#34;abstract&#34;&gt;

&lt;p&gt;The RTF document format grew in parallel with Microsoft Word. Parsing and interpreting formatting controls when styles are mixed with direct formatting can be a challenge. Perhaps that’s why many RTF parsers ignore styles and their value as a structural device.&lt;/p&gt;

&lt;/section&gt;

&lt;p&gt;RTF is a widely supported document encoding format which grew out of and closely paralleled the evolution of Microsoft Word. According to Wikipedia:&lt;/p&gt;

&lt;blockquote&gt;

&lt;p&gt;Richard Brodie, Charles Simonyi, and David Luebbert, members of the Microsoft Word development team, developed the original RTF in the middle to late 1980s. Its syntax was influenced by the TeX typesetting language. The first RTF reader and writer shipped in 1987 as part of Microsoft Word 3.0 for Macintosh, which implemented the RTF version 1.0 specification. All subsequent releases of Microsoft Word for the Macintosh and all versions for Windows can read and write files in RTF format.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>The Difference Between Document Structure and its Representation</title>
      <link>https://tdgq.com.au/structured-editing/difference-between-document-structure-and-representation/</link>
      <pubDate>Wed, 04 May 2016 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/difference-between-document-structure-and-representation/</guid>
      <description>&lt;p&gt;People talk about documents by saying things like: “this is a plain text document”, “this is an XML document”, “this is an HTML document”, “this is a Word document”, “this is a Markdown document”.&lt;/p&gt;

&lt;p&gt;These phrases describe the way the document has been serialised as a stream of bytes in a file — i.e. they describe the file format. But a document has more abstract qualities than just its file format.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>A Closer Look at Document Structuring</title>
      <link>https://tdgq.com.au/structured-editing/a-closer-look-at-document-structuring/</link>
      <pubDate>Mon, 19 Nov 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/a-closer-look-at-document-structuring/</guid>
      <description>&lt;p&gt;Before going on to examine how a different kind of text preparation tool could make the structuring task faster and easier, let’s take a closer look at this task of &lt;em&gt;structuring a document&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Consider the very common case where a designer is handed a word processor document to prepare for publication. Let’s assume that the “typographical cleanup” has been done and that structuring the document is the next task.&lt;/p&gt;

&lt;p&gt;Recall that in the previous blog post we explained why structuring is necessary: modern publishing tools like InDesign provide the mechanism of &lt;em&gt;styles&lt;/em&gt; to ensure that all elements with the same structural role have identical formatting. When we structure a document, we identify the structural role of every element (“I’m a heading”, “I’m a sub-heading”, “I’m an element of a list”, etc.). We do this by applying styles.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Preparing Text for Publication is Different from Authoring — and Needs Special Tools</title>
      <link>https://tdgq.com.au/structured-editing/preparing-text-for-publication-needs-special-tools/</link>
      <pubDate>Tue, 18 Sep 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/preparing-text-for-publication-needs-special-tools/</guid>
      <description>&lt;p&gt;In this post I want to suggest that a publication designer needs a different kind of text preparation tool than the typical “word processor” used by authors.&lt;/p&gt;

&lt;h3&gt;Old and New Ways&lt;/h3&gt;

&lt;p&gt;In the past, the craftsman typesetter received a handwritten or typed manuscript from an author or editor and was able to exert complete control over the typesetting process — translating pencilled markup into a typeset text galley embodying the arcane rules of high quality typography.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>A Conversation with Fredrik, a Document Engineer from the Netherlands</title>
      <link>https://tdgq.com.au/structured-editing/a-conversation-with-fredrik-a-document-engineer-from-the-netherlands/</link>
      <pubDate>Tue, 31 Jul 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/a-conversation-with-fredrik-a-document-engineer-from-the-netherlands/</guid>
      <description>&lt;p&gt;Fredrik from Utrecht in the Netherlands wrote to me in 2010 about publishing workflows. His experiences are similar to mine, so I quote from his email and add a commentary. I had explained to Fredrik about my background in publishing and talked about the typical problems experienced with publishing workflows. Fredrik’s comments are shown indented below:&lt;/p&gt;

&lt;blockquote&gt;My background is also in publishing, so I indeed know the problems you are talking about. I have worked with a number of different workflows, automated formatting and manual formatting, xml as source and Word as source. With all the solutions that used Word I saw problems, and typing xml is too much to ask for. I have seen cases where the authors write their articles in Word and others are hired to manually convert it into xml. It may well be the best option currently, but it should be able to improve the situation using a better editor.&lt;/blockquote&gt;

&lt;p&gt;I can only agree regarding Word and XML; the shortcomings of both are discussed in other posts on this blog.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Scotty Interviews Me About Programming and RealBasic to MacRuby Translation</title>
      <link>https://tdgq.com.au/structured-editing/scotty-interviews-me/</link>
      <pubDate>Tue, 05 Jun 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/scotty-interviews-me/</guid>
      <description>&lt;p&gt;Sometime around May 2010 I was interviewed by British software developer “Scotty”, the genial host of the MDN Show. MDN stands for the Macintosh Developer Network and the MDN Show was a podcast about programming on the Macintosh. The show had originally started as “Late Night Cocoa” some years earlier, when each show consisted of an extended interview with one software developer. The MDN Show in contrast had more of a “magazine” format.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>What’s Wrong with Word Processors and Why We Need Structured Editors</title>
      <link>https://tdgq.com.au/structured-editing/what-s-wrong-with-word-processors-and-why-we-need-structured-editors/</link>
      <pubDate>Wed, 16 May 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/what-s-wrong-with-word-processors-and-why-we-need-structured-editors/</guid>
      <description>&lt;p&gt;For many years I worked in publishing and have personally typeset, designed and supervised the production of what must amount to hundreds of books, journals and brochures using technologies beginning with manual pasteup right through to the latest version of InDesign and including markup-based systems like TeX.&lt;/p&gt;

&lt;p&gt;One goal of publication design is to help the reader understand the structure of the document they are reading. Designers use typographic and design conventions (e.g. typeface, size and weight, use of whitespace, layout, colour, etc) to clarify document structure.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>About this Blog</title>
      <link>https://tdgq.com.au/structured-editing/about/</link>
      <pubDate>Sun, 01 Jan 2012 00:00:00 +0000</pubDate>
      
      <guid>https://tdgq.com.au/structured-editing/about/</guid>
      <description>&lt;p&gt;Having spent many years using computers to prepare publications, it became obvious that the widely used word processors were less than ideal tools for preparing text for publication. As I thought more about this, ideas for a new kind of structured document editing tool came together in my mind. This project is an attempt to embody those ideas in a piece of software.&lt;/p&gt;

&lt;h3&gt;The Concept of the Structured Editor&lt;/h3&gt;

&lt;p&gt;Text files received by print or web designers often need a lot of cleaning up. And part of that cleaning up pertains to document structure. There can be lots of spurious direct formatting overrides at both the paragraph and character level. Some of these are structurally significant, but many are junk.&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>