XML
XML is a very simple syntax resembling HTML —
angle-bracketed tags and ampersand-led literal sequences. It is about
as simple as a plain-text markup can be and yet express recursively
nested data structures. XML is just a low-level syntax; it requires
a document type definition (such as XHTML) and associated application
logic to give it semantics.
XML is well suited for complex data formats (the sort of things
for which the old-school Unix tradition would use an RFC-822-like stanza
format) though overkill for simpler ones. It is especially
appropriate for formats that have a complex nested or recursive
structure of the sort that the RFC 822 metaformat does not handle well.
For a good introduction to the format, see XML in a
Nutshell [Harold-Means].
Among the hardest things to get right in designing any text file
format are issues of quoting, whitespace and other low-level syntax
details. Custom file formats often suffer from slightly broken syntax
that doesn't quite match other similar formats. Using a standard format
such as XML, which is verifiable and parsed by a standard library,
eliminates most of these issues.
-- Keith Packard
Example 5.5 is a simple example of an
XML-based configuration file. It is part of the
kdeprint tool shipped with the open-source
KDE desktop environment hosted under Linux. It describes options for an
image-to-PostScript filtering operation, and how to map them into
arguments for a filter command. For another instructive example, see
the discussion of Glade in Chapter 8.
Example 5.5. An XML example.
<?xml version="1.0"?>
<kprintfilter name="imagetops">
    <filtercommand
        data="imagetops %filterargs %filterinput %filteroutput" />
    <filterargs>
        <filterarg name="center"
                   description="Image centering"
                   format="-nocenter" type="bool" default="true">
            <value name="true" description="Yes" />
            <value name="false" description="No" />
        </filterarg>
        <filterarg name="turn"
                   description="Image rotation"
                   format="-%value" type="list" default="auto">
            <value name="auto" description="Automatic" />
            <value name="noturn" description="None" />
            <value name="turn" description="90 deg" />
        </filterarg>
        <filterarg name="scale"
                   description="Image scale"
                   format="-scale %value"
                   type="float"
                   min="0.0" max="1.0" default="1.000" />
        <filterarg name="dpi"
                   description="Image resolution"
                   format="-dpi %value"
                   type="int" min="72" max="1200" default="300" />
    </filterargs>
    <filterinput>
        <filterarg name="file" format="%in" />
        <filterarg name="pipe" format="" />
    </filterinput>
    <filteroutput>
        <filterarg name="file" format="> %out" />
        <filterarg name="pipe" format="" />
    </filteroutput>
</kprintfilter>
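To suggest what the "associated application logic" for such a file might look like, here is a minimal Python sketch that reads an abbreviated copy of the example with the standard library's ElementTree parser. The function name default_args and the rule that %value is replaced by the default attribute are assumptions made for illustration; they are not taken from kdeprint itself.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Example 5.5 (two of the four options).
CONFIG = """<?xml version="1.0"?>
<kprintfilter name="imagetops">
  <filtercommand
      data="imagetops %filterargs %filterinput %filteroutput" />
  <filterargs>
    <filterarg name="scale" description="Image scale"
               format="-scale %value" type="float"
               min="0.0" max="1.0" default="1.000" />
    <filterarg name="dpi" description="Image resolution"
               format="-dpi %value" type="int"
               min="72" max="1200" default="300" />
  </filterargs>
</kprintfilter>
"""

def default_args(xml_text):
    """Hypothetical helper: expand each option's format with its default.

    Assumes (for illustration only) that %value stands for the option's
    value; kdeprint's real substitution rules may differ.
    """
    root = ET.fromstring(xml_text)
    args = []
    for arg in root.findall("./filterargs/filterarg"):
        fmt = arg.get("format", "")
        default = arg.get("default", "")
        args.append(fmt.replace("%value", default))
    return args

print(default_args(CONFIG))   # ['-scale 1.000', '-dpi 300']
```

Note that the parser handles all quoting and whitespace details; the application code only walks the element tree.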
One advantage of XML is that it is often possible to
detect ill-formed, corrupted, or incorrectly generated data through a
syntax check, without knowing the semantics of the data.
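A minimal sketch of such a syntax check, using Python's standard ElementTree parser; the helper name is_well_formed is hypothetical. It knows nothing about what the tags mean, yet it rejects a fragment that has lost its closing tags.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML; no knowledge of the tags is needed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<filterarg name="pipe" format="" />'
# Simulated corruption: the closing tags are missing.
truncated = '<filterargs><filterarg name="pipe" format="">'

print(is_well_formed(good))       # True
print(is_well_formed(truncated))  # False
```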
The most serious problem with XML is that it doesn't play well
with traditional Unix tools. Software that wants to read an XML
format needs an XML parser; this means bulky, complicated programs.
Also, XML is itself rather bulky; it can be difficult to see the
data amidst all the markup.
One application area in which XML is clearly winning is in
markup formats for document files (we'll have more to say about this
in Chapter 18).
Tagging in such documents tends to be relatively sparse among large
blocks of plain text; thus, traditional Unix tools still work fairly
well for simple text searches and transformations.
One interesting bridge between these worlds is PYX format
— a line-oriented translation of XML that can be hacked with
traditional line-oriented Unix text tools and then losslessly
translated back to XML. A Web search for “Pyxie” will
turn up resources. The xmltk toolkit takes the opposite tack,
providing stream-oriented tools analogous to
grep(1)
and
sort(1)
for filtering XML documents; a Web search for
“xmltk” will find it.
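To give a feel for the line-oriented idea, here is a rough Python sketch of the XML-to-PYX direction, built on the standard xml.sax module. It assumes the usual PYX line prefixes: "(" for a start tag, "A" for an attribute, "-" for character data, and ")" for an end tag. The names PyxWriter, escape, and xml_to_pyx are hypothetical, the escaping is simplified, and processing instructions are ignored; this is not the Pyxie implementation.

```python
import xml.sax

def escape(text):
    """Keep each PYX record on one line (simplified escaping)."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")

class PyxWriter(xml.sax.ContentHandler):
    """Collect PYX lines: '(' start tag, 'A' attribute, '-' text, ')' end tag."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A%s %s" % (key, escape(attrs.getValue(key))))

    def endElement(self, name):
        self.lines.append(")" + name)

    def characters(self, content):
        self.lines.append("-" + escape(content))

def xml_to_pyx(xml_text):
    """Return the PYX lines for a well-formed XML string."""
    handler = PyxWriter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines

for line in xml_to_pyx('<a x="1">hi</a>'):
    print(line)
```

For this input the sketch prints four lines: (a, Ax 1, -hi, and )a. Because each record sits on its own line, tools such as grep(1) can select tags, attributes, or text by their first character.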
XML can be a simplifying choice or a complicating one. There is
a lot of hype surrounding it, but don't become a fashion victim by either
adopting or rejecting it uncritically. Choose carefully and bear the KISS
principle in mind.