A great many major open-source projects are converging on
DocBook as a standard format for their documentation. The advocates
of XML-based markup seem to have won the theoretical argument against
presentation-level and for structural-level markup, and an effective
XML-DocBook toolchain is available in open source.
Nevertheless, a lot of confusion still surrounds DocBook and the
programs that support it. Its devotees speak an argot that is dense
and forbidding even by computer-science standards, slinging around
acronyms that have no obvious relationship to the things you need to
do to write markup and make HTML or PostScript from it. XML standards
and technical papers are notoriously obscure. In the rest of this
section, we'll try to dispel the fog of jargon.
Document Type Definitions
(Note: to keep the explanation simple, most of this section
tells some lies, mainly by omitting a lot of history. Truthfulness
will be fully restored in a following section.)
DocBook is a structural-level markup language. Specifically, it
is a dialect of XML. A DocBook document is a piece of XML that uses
XML tags for structural markup.
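To make "structural markup" concrete, here is a minimal sketch in Python. The sample document and the `outline` helper are illustrative assumptions, not part of any DocBook tool; the element names (`book`, `chapter`, `title`, `para`, `emphasis`) are real DocBook elements. The point is that the tags describe what each piece of text is, not how it should look.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-style document. The tags name structural roles;
# none of them says anything about fonts, sizes, or page layout.
SOURCE = """\
<book>
  <title>An Example Book</title>
  <chapter>
    <title>First Chapter</title>
    <para>Some text with <emphasis>emphasis</emphasis>.</para>
  </chapter>
</book>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SOURCE)
print("\n".join(outline(root)))
```

Running this prints the nesting of elements, which is the structure a formatter works from.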
For a document formatter to apply a stylesheet to your document
and make it look good, it needs to know things about the overall
structure of your document. For example, in order to physically
format chapter headers properly, it needs to know that a book
manuscript normally consists of front matter, a sequence of chapters,
and back matter. In order for it to know this sort of thing, you
need to give it a Document Type Definition, or DTD. The
DTD tells your formatter what sorts of elements can be in the document
structure, and in what order they can appear.
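The kind of rule a DTD expresses can be sketched as follows. This is a toy, assuming a much-simplified and hypothetical content model (optional preface, one or more chapters, optional appendix); the real DocBook DTD is far larger and looser, and real DTDs are written in their own declaration syntax rather than as regular expressions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified content model for <book>: optional front
# matter, one or more chapters, optional back matter. A DTD would write
# a rule of roughly this shape as (preface?, chapter+, appendix?).
BOOK_MODEL = re.compile(r"(preface )?(chapter )+(appendix )?")

def children_match(book_xml):
    """Return True if the root's children appear in an allowed order."""
    root = ET.fromstring(book_xml)
    # Encode the sequence of child tags as one space-terminated string.
    sequence = "".join(child.tag + " " for child in root)
    return BOOK_MODEL.fullmatch(sequence) is not None

good = "<book><preface/><chapter/><chapter/><appendix/></book>"
bad = "<book><chapter/><preface/></book>"

print(children_match(good))
print(children_match(bad))
```

The first document satisfies the rule; the second does not, because the preface appears after a chapter.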
What we mean by calling DocBook a ‘dialect’ of XML
is actually that DocBook is a DTD — a rather large DTD, with
somewhere around 400 tags in it.[152]
Lurking behind DocBook is a kind of program called a
validating parser. When you format a DocBook document, the
first step is to pass it through a validating parser (the front end of
the DocBook formatter). This program checks your document against the
DocBook DTD to make sure you aren't breaking any of the DTD's
structural rules (otherwise the back end of the formatter, the part
that applies your stylesheet, might become quite confused).
The validating parser will either throw an error, giving you messages
about places where the document structure is broken, or translate the
document into a stream of XML elements and text that the formatter's back
end combines with the information in your stylesheet to produce
formatted output.
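The two-stage flow described above can be sketched roughly as follows. Everything here is a stand-in chosen for illustration: `ALLOWED_TAGS` plays the role of the DTD (it checks only element names, not their order, so it is much weaker than real validation), `STYLESHEET` plays the role of a stylesheet, and `front_end`, `events`, and `back_end` are hypothetical names rather than parts of any actual DocBook toolchain.

```python
import html
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {"chapter", "title", "para"}          # stand-in for the DTD
STYLESHEET = {"title": ("<h1>", "</h1>"),            # stand-in for a stylesheet
              "para": ("<p>", "</p>")}

def events(element):
    """Yield a flat stream of (kind, value) events for an element tree."""
    yield ("start", element.tag)
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)
    yield ("end", element.tag)

def front_end(source):
    """Parse the source, reject unknown elements, return an event stream."""
    root = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    for element in root.iter():
        if element.tag not in ALLOWED_TAGS:
            raise ValueError(f"<{element.tag}> is not allowed here")
    return events(root)

def back_end(stream):
    """Combine the event stream with the stylesheet to produce HTML."""
    out = []
    for kind, value in stream:
        if kind == "text":
            out.append(html.escape(value))
        else:
            opening, closing = STYLESHEET.get(value, ("", ""))
            out.append(opening if kind == "start" else closing)
    return "".join(out)

DOC = "<chapter><title>Intro</title><para>Hello, world.</para></chapter>"
print(back_end(front_end(DOC)))
```

A document containing an element outside `ALLOWED_TAGS` never reaches the back end: `front_end` raises an error first, which mirrors the way a validating parser stops a broken document before the stylesheet is applied.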
Figure 18.1 diagrams the whole process.
The part of the diagram inside the dotted box is your formatting
software, or toolchain. Besides the obvious and
visible input to the formatter (the document source) you'll need to
keep the two hidden inputs of the formatter (DTD and stylesheet) in
mind to understand what follows.