Paul Howson’s Website tdgq.com.au

Design and Publishing Notebook

Using TeX for Scripted Generation of Beautiful Documents

If you were a franchisor seeking a reliable way to find quality franchisees, you might turn to an online tool offered by one of my clients, the Franchise Relationships Institute.

In a nutshell, the tool collects data and prepares quite complex reports as PDF files. The system was put together by programmers. Good at programming. Not so skilled at generating good-looking reports. Very few programmers are also competent graphic designers.

When faced with the question of how to produce reports from this system, the programmers chose the same machinery that generated the screen displays for the online system. Their reports were simply web pages coded in HTML. These looked half-acceptable on screen. But printing them on paper was a disaster.

Pages on the web and pages printed on paper are constructed using very different sets of rules.

An online “page” may have a nominal width — although that can be very rubbery — but usually has no limit to its length — like an endless roll of paper, it’s as long as it needs to be.

But a printed page has to go on a real piece of paper with a finite length and width, typically A4 in this country. Laying out information on a printed page is a different art form from laying out online pages. The screen and the printed page are very different media.

Not surprisingly, the programmers’ reports, generated as web pages and printed to paper, looked horrible. Page breaks occurred in the worst possible places. The typography was abysmal.

The client asked me: “Is there a better way to generate a good-looking PDF report?”

It’s true that there are numerous PDF generation libraries, which you drive directly from software via an “API”. Some of these will do a half-decent job of page layout.

But I wanted to use something that could be driven in a more abstract way and which could produce typographically sophisticated documents.

And I knew there was one outstanding candidate for this task: the venerable TeX typesetting system, designed in the late 1970s by Donald E. Knuth of Stanford University and widely used for typesetting technical and mathematical publications. I had already typeset a number of books using TeX, so the decision was a no-brainer.

TeX turned out to be perfect for the kind of scripted report generation required.

I’ll explain briefly how it works.

The backend of the online system collects all the data for a report and then outputs one or more data files containing TeX markup. These data files are assembled, together with static portions of the reports (already set up with TeX markup), into a set of TeX source files, which are then sent to a TeX processor. We are using XeTeX, a Unicode-aware version of TeX that is tightly integrated with OpenType fonts.
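The generation code itself is not shown here, so what follows is only a minimal Python sketch of that pipeline under several assumptions: that the static portions form a LaTeX document compiled with `xelatex` (the article says only that XeTeX is used), that report data is passed in as macros defined with `\newcommand` in a generated file named `data.tex`, and that the master file is named `report.tex`. The function names `tex_escape`, `data_macros`, `write_sources` and `compile_pdf` are hypothetical. One step any such pipeline needs is escaping: characters such as `%`, `&` and `_` in client data would otherwise be read as TeX markup.

```python
import subprocess
from pathlib import Path

# LaTeX special characters mapped to text-mode replacements.
# Assumption: the static report files are LaTeX, compiled with xelatex.
_TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(text: str) -> str:
    """Escape data so TeX prints it literally instead of treating it as markup."""
    # A single pass over the input, so backslashes we insert are never escaped again.
    return "".join(_TEX_SPECIALS.get(ch, ch) for ch in text)


def data_macros(fields: dict[str, str]) -> str:
    """Render report data as \\newcommand definitions for the generated data file."""
    lines = []
    for name, value in fields.items():
        # Control-sequence names used like this must consist of letters only.
        if not (name.isascii() and name.isalpha()):
            raise ValueError(f"invalid macro name: {name!r}")
        # \newcommand stops with an error if the macro already exists,
        # which catches accidental clashes with LaTeX's own commands.
        lines.append(f"\\newcommand{{\\{name}}}{{{tex_escape(value)}}}")
    return "\n".join(lines) + "\n"


def write_sources(workdir: Path, fields: dict[str, str], static_main: str) -> Path:
    """Write the generated data file next to the static master file."""
    workdir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: the static master file pulls in the macros via \input{data}.
    # UTF-8 throughout, since XeTeX reads Unicode input natively.
    (workdir / "data.tex").write_text(data_macros(fields), encoding="utf-8")
    main = workdir / "report.tex"
    main.write_text(static_main, encoding="utf-8")
    return main


def compile_pdf(main: Path) -> Path:
    """Run xelatex on the master file and return the path of the resulting PDF."""
    subprocess.run(
        ["xelatex", "-interaction=nonstopmode", "-halt-on-error", main.name],
        cwd=main.parent,
        check=True,  # raise CalledProcessError if TeX reports an error
    )
    return main.with_suffix(".pdf")
```

Given a static master file whose preamble contains `\input{data}`, a call such as `compile_pdf(write_sources(Path("out"), {"clientname": "Smith & Co"}, static_main))` would write both files and run `xelatex` in `out/`, which requires `xelatex` to be installed and on the `PATH`. Escaping in a single pass over the characters, rather than with a chain of `str.replace` calls, avoids re-escaping the backslashes that earlier replacements insert.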

The visual quality of the reports generated by this system is outstanding. And in over six years of operation, generating reports day in and day out, the XeTeX software hasn’t missed a beat. It’s very fast and extremely reliable. (TeX was designed to be fast on 1980s hardware, so it’s super-fast on modern systems.)

Creating beautiful reports using TEX requires a somewhat unusual combination of skills.

Firstly, one requires a sound understanding of document design and typography. Secondly, one requires solid programming skills, and in particular a good knowledge of the rather arcane TeX macro language and the TeX “boxes and glue” model.
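To give a feel for the “boxes and glue” model, here is a much-simplified Python model of how TeX sets the glue in a horizontal box of a given width. It is not TeX code, and the names `Box`, `Glue` and `set_hbox` are hypothetical. Each piece of glue has a natural width plus stretchability and shrinkability; when the box must be wider or narrower than its natural width, the difference is shared among the glue items in proportion to their stretch or shrink, while the boxes themselves (characters, words) never change size. The sketch ignores infinite glue (`fil`, `fill`), badness and line breaking.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Box:
    """A rigid item (a character or word): its width never changes."""
    width: float


@dataclass
class Glue:
    """Flexible space: natural width plus how far it may stretch or shrink."""
    natural: float
    stretch: float = 0.0
    shrink: float = 0.0


Item = Union[Box, Glue]


def set_hbox(items: list[Item], target: float) -> list[float]:
    """Return the width of each item after packing the list into `target` width.

    Simplified model: finite glue only, no badness, no line breaking.
    """
    natural = sum(i.width if isinstance(i, Box) else i.natural for i in items)
    excess = target - natural
    glues = [i for i in items if isinstance(i, Glue)]

    if excess > 0:
        total = sum(g.stretch for g in glues)
        # Stretching is not capped at the stated stretch; with no stretch at all
        # the glue keeps its natural width and the box is left underfull.
        ratio = excess / total if total > 0 else 0.0
    elif excess < 0:
        total = sum(g.shrink for g in glues)
        # Glue never shrinks below natural - shrink, so the ratio is capped at 1;
        # if that is not enough, the box is overfull.
        ratio = min(1.0, -excess / total) if total > 0 else 0.0
    else:
        ratio = 0.0

    widths = []
    for i in items:
        if isinstance(i, Box):
            widths.append(i.width)
        elif excess > 0:
            widths.append(i.natural + ratio * i.stretch)  # share the excess by stretch
        else:
            widths.append(i.natural - ratio * i.shrink)  # share the deficit by shrink
    return widths
```

For the line `[Box(30), Glue(10, 5, 3), Box(40), Glue(10, 5, 3), Box(20)]`, whose natural width is 110, a target of 120 stretches each glue item to 15. A target of 100 can only shrink each glue item to its minimum of 7, leaving an overfull box 104 units wide.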