Characteristics to Look for in an XML Publishing System

A generally good checklist of characteristics to look for in an XML/DITA publishing system

I took a trip through the Wayback Machine and found an old article, originally published by Arbortext in 1999. Reading through it, I found a lot of key points that are still very relevant today. No single product is for every situation, but these guidelines are generally good advice, so we thought we’d republish it here. Enjoy your trip through the Wayback Machine, and let us know if you think this is still good advice today for anyone looking for a DITA/XML publishing system!

Characteristics to Look for in an XML/SGML Publishing System

Version published: 1999-10-11 by Arbortext, Inc.

The metalanguages XML (eXtensible Markup Language) and SGML (Standard Generalized Markup Language) are the crucial ingredients in next-generation publishing systems that meet the needs of organizations with many writers or with large amounts of information.

Publishing systems based on XML/SGML share these four common characteristics:

  • Compliance with XML/SGML standards
  • Modularity
  • Automatic composition
  • Adaptability

Compliance with XML/SGML standards

Adopting XML or SGML makes sense if you want to achieve high levels of automation and vendor independence. To truly attain those goals, however, your systems must strictly comply with all aspects of the XML/SGML standard at all times.

Some of the indications of a strictly compliant system include:

  • Information is originally authored in XML or SGML
  • XML or SGML is the default file format
  • All XML/SGML capabilities, including advanced constructs, are fully supported

The following paragraphs explain each of these indicators in detail.

Highly automated publishing systems require strict XML/SGML compliance in the same way that accounting systems require continuously valid databases. In each case, you cannot build a reliable system on top of unreliable data. To achieve the best performance, your system should maintain your information in XML/SGML both on disk (XML or SGML file format) and in memory.
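
As a rough illustration of "you cannot build a reliable system on top of unreliable data," here is a minimal Python sketch of a gate that refuses to store anything that does not parse as XML. The function names and the in-memory `store` list are hypothetical stand-ins for a real repository. Note that the standard-library parser used here only checks well-formedness; checking validity against a DTD or schema, which the article's notion of strict compliance implies, would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def accept_for_storage(xml_text: str, store: list) -> bool:
    """Append xml_text to the store only if it is well-formed."""
    if not is_well_formed(xml_text):
        return False
    store.append(xml_text)
    return True


store = []  # hypothetical stand-in for a document repository
good = "<chapter><title>Setup</title><para>Plug it in.</para></chapter>"
bad = "<chapter><title>Setup</para></chapter>"  # mismatched tags

print(accept_for_storage(good, store))  # accepted
print(accept_for_storage(bad, store))   # rejected
print(len(store))                       # only the good document was stored
```

Rejecting bad input at the door, rather than repairing it downstream, is what keeps every later automated step free of manual inspection.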

Some early attempts to build next-generation publishing systems relied on converting to SGML from word processor or desktop publishing files. These products, which were designed to provide tremendous flexibility with layout and typesetting, provide nothing to ensure valid data. As a result, conversions involved significant human intervention, including both error resolution and inspection.

Every organization that has deployed a state-of-the-art publishing system has either already implemented database storage of their documents or has definite plans to do so. Virtually all agree that the only sensible storage format for their documents is XML/SGML; storing files in proprietary unstructured formats simply costs too much and poses too many barriers to automation.

If you store your information in XML/SGML, then you must implement tools that can work directly with that XML/SGML information. Converting between XML/SGML and proprietary formats does not work for complex documents; in fact, no robust XML/SGML information can survive the round trip intact from XML/SGML to a proprietary format back to XML/SGML.


Modularity

Creating and managing a large document as a single file or series of files poses many challenges. Primary among these are the barriers to reusing the same information in multiple documents, and the inefficient workflow caused by a large project that cannot be divided into smaller tasks.

Authors seldom reuse information written by others since they either don’t know it exists or cannot easily find its location. Authors willingly reuse their own information by copying from one document and pasting into another. From then on, however, the cost of revising that information climbs along with the number of different documents in which it appears. The risk of inaccuracy also rises, since the author can easily overlook some documents during the revision process.

The process of reviewing and translating large documents costs more time and money than it should. Reviewing can’t begin until the author finishes a complete first draft, and translating can’t begin until the author finishes a complete final draft. New revisions of old documents can be extremely expensive, since the reviewers and translators work through both the changed and unchanged parts of the document.

Both of these problems can be reduced or eliminated through a modular approach to documentation, where the document consists of many separate pieces (which we call “document objects”) that can be created, reviewed and translated separately. To support reuse, the system must provide easy ways to browse existing document objects, point to the object to be reused, track revisions, and provide notification of changes.
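
To make the idea of document objects concrete, here is a small Python sketch in which two manuals point to a shared warning instead of copying it. The `objref` element, the object ids, and the `OBJECTS` dictionary are all invented for illustration; a real system would use its own reference mechanism and a managed repository rather than a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository of document objects, keyed by object id.
OBJECTS = {
    "warn-power": "<warning>Disconnect power before servicing.</warning>",
    "intro-pump": "<section><title>Pump overview</title></section>",
    "intro-valve": "<section><title>Valve overview</title></section>",
}


def assemble(doc_xml: str, objects: dict) -> ET.Element:
    """Replace every <objref id="..."/> with the object it points to.

    References nested inside a resolved object are not followed.
    """
    root = ET.fromstring(doc_xml)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "objref":
                resolved = ET.fromstring(objects[child.get("id")])
                resolved.tail = child.tail  # keep surrounding whitespace
                parent[index] = resolved
    return root


pump_manual = '<manual><objref id="intro-pump"/><objref id="warn-power"/></manual>'
valve_manual = '<manual><objref id="intro-valve"/><objref id="warn-power"/></manual>'

# Revise the shared warning once; both manuals pick up the change.
OBJECTS["warn-power"] = (
    "<warning>Disconnect and lock out power before servicing.</warning>"
)

for source in (pump_manual, valve_manual):
    print(ET.tostring(assemble(source, OBJECTS), encoding="unicode"))
```

Because each manual holds a pointer rather than a copy, the revision cost stays constant no matter how many documents reuse the warning, and only the changed object needs to go back to reviewers and translators.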

Here’s a tip: a document object should contain just enough information so that the object is worth handling as a separate piece.

Automatic Composition

Since the advent of WYSIWYG (What You See Is What You Get) desktop publishing tools, authors have gained great personal satisfaction from manually adjusting the layout of each page until it looks good to them. Because many authors continually adjust the layout as they write new copy or revise existing copy, the proportion of time they spend on adjusting the layout is difficult to measure. Companies that have measured this time, however, estimate that as much as 50% of an author’s time is spent on page layouts.

As organizations strive to produce a greater variety of publications, with greater frequency, and for both U.S. and European paper sizes, the cost of using authors to lay out pages can escalate out of control. Many managers have come to realize not only the excessive cost of laying out pages manually, but also the colossal waste of valuable talent on simple work.

The solution, of course, is to let the computer do page layouts and free the authors to spend 100% of their time creating new information. Solving this problem alone can easily justify the cost of a brand new system for creating, managing, and publishing textual information.
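
The following Python sketch hints at why layout is a job for the computer: the same content is paginated for U.S. Letter and A4 just by changing one parameter. It is a deliberately simplified model, with an assumed 1-inch margin and 14-point line spacing, and it ignores everything a real composition engine handles, such as hyphenation, tables, figures, and widow and orphan control.

```python
POINTS_PER_INCH = 72
POINTS_PER_MM = 72 / 25.4

# Page heights in points for the two paper sizes.
PAGE_HEIGHTS_PT = {
    "letter": 11 * POINTS_PER_INCH,  # 8.5 x 11 in
    "a4": 297 * POINTS_PER_MM,       # 210 x 297 mm
}


def lines_per_page(paper: str, margin_pt: float = 72, leading_pt: float = 14) -> int:
    """Number of text lines that fit between the top and bottom margins."""
    usable = PAGE_HEIGHTS_PT[paper] - 2 * margin_pt
    return int(usable // leading_pt)


def paginate(lines: list, paper: str) -> list:
    """Split a flat list of lines into pages for the given paper size."""
    per_page = lines_per_page(paper)
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]


content = [f"Line {n}" for n in range(1, 191)]  # 190 lines of sample text

for paper in ("letter", "a4"):
    pages = paginate(content, paper)
    print(paper, lines_per_page(paper), "lines per page,", len(pages), "pages")
```

Under these assumptions the same 190 lines break differently on each paper size, and no author had to touch a page to get either result.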


Adaptability

Large organizations typically select tools with tremendous adaptability, because they enjoy the maximum return on modifications that improve their processes, increase their productivity, and lower their costs.

At a minimum, software tools for next-generation publishing systems adapt to these three key needs:

  1. Author productivity: The software should be adaptable to the menu layouts and key assignments that best meet the authors’ needs. In addition, the software should support the development of powerful functions that automate routine tasks and simplify complex tasks.
  2. Additional automation: The software should provide the flexibility to perform various data processing functions, such as validating entries, calculating values, sorting, and similar operations.
  3. Integration: The software should support tight integration with other crucial components of a complete system, such as document management software, workflow software, and databases, so that operation is seamless to the users.
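
The "additional automation" item in the list above can be sketched in a few lines of Python. The example validates entries in a parts list, calculates a total, and sorts the entries by part number. The `partslist` markup, the attribute names, and the rule that quantity must be at least 1 are all assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts list; element and attribute names are invented.
PARTS_XML = """
<partslist>
  <part number="P-300" qty="2" price="4.50"/>
  <part number="P-100" qty="1" price="12.00"/>
  <part number="P-200" qty="0" price="3.25"/>
</partslist>
"""


def process_parts(xml_text: str):
    """Validate entries, total the valid ones, and sort by part number."""
    root = ET.fromstring(xml_text)
    errors = []
    total = 0.0
    for part in root.findall("part"):
        qty = int(part.get("qty", "0"))
        if qty < 1:
            # Invalid entries are reported and left out of the total.
            errors.append(f"{part.get('number')}: quantity must be at least 1")
            continue
        total += qty * float(part.get("price"))
    # Reorder the child elements in place by part number.
    root[:] = sorted(root, key=lambda p: p.get("number"))
    return root, errors, total


root, errors, total = process_parts(PARTS_XML)
print([p.get("number") for p in root])
print(errors)
print(total)
```

Checks like these are only practical because the entries are structured data; the same list typed into a page layout tool would have to be verified by eye.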

Author: Liz Fraley
