Unix Programming - DocBook - The DocBook Toolchain
The DocBook Toolchain
Normally, what you'll do to make XHTML from your DocBook sources is use the xmlto(1) front end. Your commands will look like this:
bash$ xmlto xhtml foo.xml
bash$ ls *.html
ar01s02.html ar01s03.html ar01s04.html index.html
In this example, you converted an XML-DocBook document named foo.xml into an index page plus one page per top-level section after the first. Making one big page is just as easy:
bash$ xmlto xhtml-nochunks foo.xml
bash$ ls *.html
foo.html
Finally, here is how you make PostScript for printing:
bash$ xmlto ps foo.xml # To make PostScript
bash$ ls *.ps
foo.ps
To turn your documents into HTML or PostScript, you need an engine that can apply the combination of the DocBook DTD and a suitable stylesheet to your document. Figure 18.2 illustrates how the open-source tools for doing this fit together.
Parsing your document and applying the stylesheet transformation will be handled by one of three programs. The most likely one is xsltproc, the parser that ships with Red Hat Linux. The other possibilities are two Java programs, Saxon and Xalan.
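The xmlto front end hides this parser stage. As a minimal sketch (not xmlto's actual internals), here is roughly what driving xsltproc by hand looks like for a single-page XHTML build. It assumes the DocBook XSL stylesheets are installed and registered in your local XML catalog under their canonical URI, so that `--nonet` can resolve them without network access. The `run` helper, the `DRY_RUN` switch, and the function name `docbook_to_xhtml` are hypothetical conveniences for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: apply the DocBook XSL XHTML stylesheet with xsltproc.
# ASSUMPTION: the stylesheets are resolvable through the local XML catalog
# under this canonical URI (required because of --nonet).
XHTML_XSL="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"

# Hypothetical helper: print each command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: docbook_to_xhtml SRC OUT
# Parses SRC, applies the stylesheet, and writes one XHTML page to OUT.
docbook_to_xhtml() {
    run xsltproc --nonet -o "$2" "$XHTML_XSL" "$1"
}
```

After sourcing the file, `DRY_RUN=1 docbook_to_xhtml foo.xml foo.html` just prints the xsltproc command; dropping `DRY_RUN` actually runs it.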
It is relatively easy to generate high-quality XHTML from either flavor of DocBook (SGML or XML); the fact that XHTML is simply another XML DTD helps a lot.
Translation to HTML is done by applying a rather simple stylesheet,
and that's the end of the story. RTF is also simple to generate in
this way, and from XHTML or RTF it's easy to generate a flat ASCII
text approximation in a pinch.
The awkward case is print. Generating high-quality printed output, which in practice means Adobe's PDF (Portable Document Format), is difficult. Doing it right requires algorithmically duplicating the delicate judgments of a human typesetter moving from content to presentation level.
So, first, a stylesheet translates DocBook's structural markup
into another dialect of XML — FO (Formatting Objects). FO
markup is very much presentation-level; you can think of it as a sort
of XML functional equivalent of troff. It
has to be translated to PostScript for packaging in a PDF.
In the toolchain shipped with Red Hat Linux, this job is handled by a TeX macro package called PassiveTeX. It translates the formatting objects generated by xsltproc into Donald Knuth's TeX language. TeX's output, known as DVI (DeVice Independent) format, is then massaged into PDF.
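The chain just described can be spelled out as three commands. This is a sketch under stated assumptions, not the exact Red Hat tooling: the DocBook XSL FO stylesheet is resolvable through the local catalog, your TeX installation provides PassiveTeX through the `xmltex` format, and Ghostscript's `dvipdf` script performs the final DVI-to-PDF step (another DVI converter would serve as well). Real documents often need `xmltex` run more than once before cross-references and page numbers settle. The function name and the `run`/`DRY_RUN` helper are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: DocBook -> FO -> (PassiveTeX) DVI -> PDF.
# ASSUMPTION: FO stylesheet available via the local XML catalog.
FO_XSL="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"

# Hypothetical helper: print each command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: docbook_to_pdf_passivetex BASE
# Reads BASE.xml and leaves BASE.fo, BASE.dvi and BASE.pdf behind.
docbook_to_pdf_passivetex() {
    local base=$1
    run xsltproc --nonet -o "$base.fo" "$FO_XSL" "$base.xml" || return 1  # structure -> FO
    run xmltex "$base.fo" || return 1                                      # FO -> TeX -> DVI
    run dvipdf "$base.dvi" "$base.pdf"                                     # DVI -> PDF
}
```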
If you think this bucket chain of XML to TeX macros to DVI to
PDF sounds like an awkward kludge, you're right. It clanks, it
wheezes, and it has ugly warts. Fonts are a significant problem,
since XML and TeX and PDF have very
different models of how fonts work; also, handling
internationalization and localization is a nightmare. About the only
thing this code path has going for it is that it works.
The elegant way will be FOP, a direct FO-to-PostScript translator being developed by the Apache project. With FOP, the internationalization problem is, if not solved, at least well confined; XML tools handle Unicode all the way through to FOP. The mapping from Unicode glyphs to PostScript fonts is also strictly FOP's problem. The only trouble with this approach is that it doesn't work (yet). As of mid-2003, FOP is in an unfinished alpha state: usable, but with rough edges and missing features.
Figure 18.3 illustrates what the FOP toolchain looks like.
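By contrast with the PassiveTeX route, the FOP toolchain replaces the TeX and DVI stages with a single translator. A minimal sketch, assuming the same catalog setup and a `fop` launcher script on your PATH that accepts FOP's `-fo` and `-pdf` command-line options; as before, the function name and the `run`/`DRY_RUN` helper are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: DocBook -> FO -> PDF, with FOP doing the final rendering.
# ASSUMPTION: FO stylesheet available via the local XML catalog.
FO_XSL="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"

# Hypothetical helper: print each command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: docbook_to_pdf_fop BASE
# Reads BASE.xml, writes BASE.fo, then lets FOP render BASE.pdf.
docbook_to_pdf_fop() {
    local base=$1
    run xsltproc --nonet -o "$base.fo" "$FO_XSL" "$base.xml" || return 1  # structure -> FO
    run fop -fo "$base.fo" -pdf "$base.pdf"                                # FO -> PDF directly
}
```

The design point is visible in the command count: font mapping and Unicode handling stay inside one program instead of being negotiated between XML, TeX, and PDF.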
FOP has competition. Another project called xsl-fo-proc aims to do the same things as FOP, but in C++, which should make it both faster than the Java-based FOP and independent of the Java environment. As of mid-2003, xsl-fo-proc is in an unfinished alpha state, not as far along as FOP.