Unix Programming - Data-Driven Programming - Case Study: Metaclass Hacking in fetchmailconf
Case Study: Metaclass Hacking in fetchmailconf
The
fetchmailconf(1)
dotfile configurator shipped with
fetchmail(1)
contains an instructive example of advanced data-driven programming in
a very high-level, object-oriented language.
In October 1997 a series of questions on the fetchmail-friends
mailing list made it clear that end-users were having increasing
trouble generating configuration files for
fetchmail. The file uses a simple,
classically-Unixy free-format syntax, but can become forbiddingly
complicated when a user has POP3 and IMAP accounts at multiple sites.
See Example 9.1 for a somewhat simplified
version of the fetchmail author's
configuration file.
Example 9.1. Example of fetchmailrc syntax.
set postmaster "esr"
set daemon 300

poll imap.ccil.org with proto IMAP and options no dns
       aka snark.thyrsus.com locke.ccil.org ccil.org
       user esr there is esr here
       options fetchall dropstatus warnings 3600

poll imap.netaxs.com with proto IMAP
       user "esr" there is esr here options dropstatus warnings 3600
The design objective of fetchmailconf
was to completely hide the control file syntax behind a fashionable,
ergonomically-correct GUI replete with selection buttons,
slider bars and fill-out forms. But the beta design had a problem: it
could easily generate configuration files from the user's GUI actions,
but could not read and edit existing ones.
The parser for
fetchmail's
configuration file syntax is rather elaborate. It's actually written
in yacc and lex,
the two classic Unix tools for generating language-parsing code in C.
For fetchmailconf to be able to edit
existing configuration files, it at first appeared that it would be
necessary to replicate that elaborate parser in fetchmailconf's
implementation language — Python.
This tactic seemed doomed. Even leaving aside the amount of
duplicative work implied, it is notoriously hard to be certain that
two parsers in two different languages accept the same grammar.
Keeping them synchronized as the configuration language evolved bid
fair to be a maintenance nightmare. It would have violated the SPOT
rule we discussed
in Chapter 4
wholesale.
This problem stumped me for a while. The insight that cracked
it was that fetchmailconf could use
fetchmail's own parser as a filter! I
added a --configdump option to
fetchmail that would parse
.fetchmailrc and dump the result to standard
output in the format of a Python initializer. For the file above, the
result would look roughly like Example 9.2 (to save
space, some data not relevant to the example is omitted).
Example 9.2. Python structure dump of a fetchmail configuration.
fetchmailrc = {
    'poll_interval':300,
    "logfile":None,
    "postmaster":"esr",
    'bouncemail':TRUE,
    "properties":None,
    'invisible':FALSE,
    'syslog':FALSE,
    # List of server entries begins here
    'servers': [
        # Entry for site `imap.ccil.org' begins:
        {
            "pollname":"imap.ccil.org",
            'active':TRUE,
            "via":None,
            "protocol":"IMAP",
            'port':0,
            'timeout':300,
            'dns':FALSE,
            "aka":["snark.thyrsus.com","locke.ccil.org","ccil.org"],
            'users': [
                {
                    "remote":"esr",
                    "password":"masked_one",
                    'localnames':["esr"],
                    'fetchall':TRUE,
                    'keep':FALSE,
                    'flush':FALSE,
                    "mda":None,
                    'limit':0,
                    'warnings':3600,
                }
            , ]
        }
        ,
        # Entry for site `imap.netaxs.com' begins:
        {
            "pollname":"imap.netaxs.com",
            'active':TRUE,
            "via":None,
            "protocol":"IMAP",
            'port':0,
            'timeout':300,
            'dns':TRUE,
            "aka":None,
            'users': [
                {
                    "remote":"esr",
                    "password":"masked_two",
                    'localnames':["esr"],
                    'fetchall':FALSE,
                    'keep':FALSE,
                    'flush':FALSE,
                    "mda":None,
                    'limit':0,
                    'warnings':3600,
                }
            , ]
        }
        ,
    ]
}
The major hurdle had been leapt. The Python interpreter could
then evaluate the fetchmail
--configdump output, making the configuration
available to fetchmailconf as the value of
the variable ‘fetchmailrc’.
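The evaluation step can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not fetchmailconf's code: the helper name `load_configdump` is hypothetical, the dump text is assumed to have been captured already (for example from `fetchmail --configdump`), and the `TRUE`/`FALSE` names the dump uses are supplied explicitly because they are not Python built-ins. Since `exec` runs whatever it is given, this is only appropriate for output from a trusted local program.

```python
def load_configdump(dump_text):
    """Evaluate a --configdump style initializer and return its dict.

    Hypothetical helper: assumes the text assigns a single variable
    named ``fetchmailrc``, as in the dump shown above.
    """
    # The dump spells booleans as TRUE/FALSE, so define them here.
    # An empty __builtins__ keeps the text away from built-in functions;
    # it is a convenience, not a security sandbox.
    env = {"__builtins__": {}, "TRUE": True, "FALSE": False}
    exec(dump_text, env)
    return env["fetchmailrc"]


# A trimmed-down dump in the same shape as the example above.
sample = '''
fetchmailrc = {
    'poll_interval':300,
    "postmaster":"esr",
    'servers': [
        # Entry for site `imap.ccil.org' begins:
        {
            "pollname":"imap.ccil.org",
            'dns':FALSE,
            'users': [{"remote":"esr", 'fetchall':TRUE}, ],
        },
    ]
}
'''

config = load_configdump(sample)
print(config["postmaster"], config["servers"][0]["users"][0]["fetchall"])
# prints: esr True
```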
But this wasn't quite the last obstacle in the race. What was
really needed wasn't just for fetchmailconf
to have the existing configuration, but to turn it into a linked tree
of live objects. There would be three kinds of objects in this tree:
Configuration (the top-level object representing the
entire configuration), Site (representing one of the
servers to be polled), and User (representing user data
attached to a site). The example file describes two site objects, each
with one user object attached to it.
The three object classes already existed in
fetchmailconf. Each had a method that
caused it to pop up a GUI edit panel to modify its instance data. The
last remaining problem was to somehow transform the static data in this
Python initializer into live objects.
I considered writing a glue layer that would explicitly
know about the structure of all three classes and use that knowledge
to grovel through the initializer creating matching objects, but
rejected that idea because new class members were likely to be added
over time as the configuration language grew new features. If the
object-creation code were written in the obvious way, it would once
again be fragile and tend to fall out of synchronization when either
the class definitions or the initializer structure dumped by the
--configdump report generator changed. Again, a
recipe for endless bugs.
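For contrast, here is a hypothetical Python 3 sketch of the "obvious" glue for the user level alone. The `User` stub, its three fields, and the `user_from_dict` helper are invented for illustration. Each field name is written twice, once in the class and once in the glue, so a key that the dump format later gains is dropped without any complaint.

```python
class User:
    # Illustrative stub; not fetchmailconf's real class.
    def __init__(self):
        self.remote = None
        self.localnames = []
        self.fetchall = False


def user_from_dict(d):
    # Hand-written glue (hypothetical): one line per field,
    # duplicating knowledge of the class layout.
    u = User()
    u.remote = d["remote"]
    u.localnames = d["localnames"]
    u.fetchall = d["fetchall"]
    return u


# Suppose the dump grows a 'keep' flag the glue has never heard of.
dumped = {"remote": "esr", "localnames": ["esr"], "fetchall": True, "keep": False}
u = user_from_dict(dumped)
print(hasattr(u, "keep"))  # prints: False (the new field was silently lost)
```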
The better way would be data-driven programming — code
that would analyze the shape and members of the initializer, query the
class definitions themselves about their members, and then
impedance-match the two sets.
Lisp and
Java
programmers call this introspection; in
some other object-oriented languages it's called
metaclass
hacking
and is generally considered fearsomely esoteric,
deep black magic. Most object-oriented languages don't support it at
all; in those that do (Perl and Java among them), it tends to be a
complicated and fragile undertaking. But Python's facilities for
introspection and metaclass hacking are unusually accessible.
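The accessible facilities are ordinary built-in functions. In the small Python 3 sketch below (the `Site` stub and its fields are illustrative assumptions), `vars()` asks an instance which members it actually has, and `setattr()`/`getattr()` write and read members whose names arrive as strings at run time. These are the two halves of the impedance match described above.

```python
class Site:
    # Illustrative stub; not fetchmailconf's real class.
    def __init__(self):
        self.pollname = None
        self.protocol = "auto"
        self.port = 0


site = Site()

# Ask the object itself which members it has.
print(sorted(vars(site)))  # prints: ['pollname', 'port', 'protocol']

# Set members by name, with the names coming from data rather than code.
for name, value in {"pollname": "imap.ccil.org", "protocol": "IMAP"}.items():
    setattr(site, name, value)

print(site.pollname, site.protocol, getattr(site, "port"))
# prints: imap.ccil.org IMAP 0
```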
See Example 9.3 for the solution code, from
near line 1895 of the 1.43 version.
Example 9.3. copy_instance metaclass code.
def copy_instance(toclass, fromdict):
    # Make a class object of given type from a conformant dictionary.
    class_sig = toclass.__dict__.keys(); class_sig.sort()
    dict_keys = fromdict.keys(); dict_keys.sort()
    common = set_intersection(class_sig, dict_keys)
    if 'typemap' in class_sig:
        class_sig.remove('typemap')
    if tuple(class_sig) != tuple(dict_keys):
        print "Conformability error"
        # print "Class signature: " + `class_sig`
        # print "Dictionary keys: " + `dict_keys`
        print "Not matched in class signature: "+ \
              `set_diff(class_sig, common)`
        print "Not matched in dictionary keys: "+ \
              `set_diff(dict_keys, common)`
        sys.exit(1)
    else:
        for x in dict_keys:
            setattr(toclass, x, fromdict[x])
Most of this code is error-checking against the possibility that
the class members and --configdump report generation
have drifted out of synchronization. It ensures that if the code
breaks, the breakage will be detected early — an implementation
of the Rule of Repair. The heart of this function is the last
two lines, which set attributes in the class from corresponding
members in the dictionary. They're equivalent to this:
def copy_instance(toclass, fromdict):
    for x in fromdict.keys():
        setattr(toclass, x, fromdict[x])
When your code is this simple, it is far more likely to be right.
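Example 9.3 is Python 2 code (print statements, backquote repr, and `keys()` returning a sortable list), and it relies on `set_intersection` and `set_diff` helpers defined elsewhere in fetchmailconf. A sketch of the same conformability check in Python 3, using built-in sets in place of those helpers, might look like this. Two departures are deliberate assumptions: it raises a `ConformabilityError` (a name invented here) instead of calling `sys.exit(1)`, and the `User` stub exists only to exercise it. Like the original, it exempts a member named `typemap` from the comparison.

```python
class ConformabilityError(Exception):
    """Raised when an object's members and a dictionary's keys disagree."""


def copy_instance(target, fromdict):
    """Copy fromdict onto target, refusing if the two have drifted apart."""
    # The object's current members, minus the exempt 'typemap' entry.
    members = set(vars(target)) - {"typemap"}
    keys = set(fromdict)
    if members != keys:
        # Report both directions of the mismatch, as the original does.
        raise ConformabilityError(
            "only in object: %s; only in dictionary: %s"
            % (sorted(members - keys), sorted(keys - members)))
    for name, value in fromdict.items():
        setattr(target, name, value)


class User:
    # Illustrative stub with two members.
    def __init__(self):
        self.remote = None
        self.fetchall = False


u = User()
copy_instance(u, {"remote": "esr", "fetchall": True})
print(u.remote, u.fetchall)  # prints: esr True

try:
    copy_instance(User(), {"remote": "esr", "fetchall": True, "keep": False})
except ConformabilityError as err:
    print(err)  # prints: only in object: []; only in dictionary: ['keep']
```

Raising rather than exiting leaves the reporting decision to the caller, which could matter if a GUI wanted to show the mismatch in a dialog instead of terminating.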
See Example 9.4 for the code that calls it.
Example 9.4. Calling context for
copy_instance.
# The tricky part - initializing objects from the `configuration'
# global. `Configuration' is the top level of the object tree
# we're going to mung
Configuration = Controls()
copy_instance(Configuration, configuration)
Configuration.servers = []
for server in configuration['servers']:
    Newsite = Server()
    copy_instance(Newsite, server)
    Configuration.servers.append(Newsite)
    Newsite.users = []
    for user in server['users']:
        Newuser = User()
        copy_instance(Newuser, user)
        Newsite.users.append(Newuser)
The key point to extract from this code is that it traverses the
three levels of the initializer (configuration/server/user),
instantiating the correct objects at each level into lists contained
in the next object up. Because copy_instance is
data-driven and completely generic, it can be used on all three levels
for three different object types.
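To watch the three-level traversal run, the Python 3 sketch below reuses the loop from Example 9.4 against minimal stand-ins for `Controls`, `Server` and `User`, and against a trimmed-down dictionary shaped like Example 9.2. The stub fields are assumptions, not fetchmailconf's real members, and `copy_instance` here is the simplified form without the conformability check.

```python
def copy_instance(target, fromdict):
    # Simplified form: copy every dictionary entry onto the object.
    for name, value in fromdict.items():
        setattr(target, name, value)


# Illustrative stubs; the real classes carry many more members.
class Controls:
    def __init__(self):
        self.poll_interval = 0
        self.servers = []


class Server:
    def __init__(self):
        self.pollname = None
        self.users = []


class User:
    def __init__(self):
        self.remote = None


configuration = {
    "poll_interval": 300,
    "servers": [
        {"pollname": "imap.ccil.org", "users": [{"remote": "esr"}]},
        {"pollname": "imap.netaxs.com", "users": [{"remote": "esr"}]},
    ],
}

# Same walk as the calling context: copy each level, then replace the
# copied list of dictionaries with a list of live objects.
Configuration = Controls()
copy_instance(Configuration, configuration)
Configuration.servers = []
for server in configuration["servers"]:
    Newsite = Server()
    copy_instance(Newsite, server)
    Configuration.servers.append(Newsite)
    Newsite.users = []
    for user in server["users"]:
        Newuser = User()
        copy_instance(Newuser, user)
        Newsite.users.append(Newuser)

print([s.pollname for s in Configuration.servers])
# prints: ['imap.ccil.org', 'imap.netaxs.com']
print(type(Configuration.servers[0].users[0]).__name__)  # prints: User
```

One detail is easy to miss: `copy_instance` first copies the raw `servers` and `users` lists of dictionaries onto each object, which is also why, under the full conformability check, the classes would need to declare `servers` and `users` members. The loops then overwrite those attributes with lists of live objects.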
This is a new-school sort of example; Python was not even
invented until 1990. But it reflects themes that go back to 1969 in
the Unix tradition. If meditating on Unix programming as practiced by
my predecessors had not taught me constructive laziness
(insisting on reuse, and refusing to write duplicative glue
code in accordance with the SPOT rule), I might have rushed into coding a
parser in Python. The first key insight that
fetchmail itself could be made into
fetchmailconf's configuration parser might
never have happened.
The second insight (that copy_instance could be generic) proceeded from
the Unix tradition of looking assiduously for ways to avoid
hand-hacking. But more specifically, Unix programmers are very used to
writing parser specifications to generate parsers for processing
language-like markups; from there it was a short step to believing
that the rest of the job could be done by some kind of generic
tree-walk of the configuration structure. Two separate stages of
data-driven programming, one building on the other, were needed to
solve the design problem cleanly.
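Taken one step further, the nested loops themselves can become data. The Python 3 sketch below is a hypothetical illustration, not what fetchmailconf does: a small table (`CHILDREN`, a name made up here) records which key holds each class's children and which class those children become, and one recursive `build` function walks the tree to any depth. The stub classes are again assumptions.

```python
# Illustrative stubs; not fetchmailconf's real classes.
class Controls:
    def __init__(self):
        self.poll_interval = 0
        self.servers = []


class Server:
    def __init__(self):
        self.pollname = None
        self.users = []


class User:
    def __init__(self):
        self.remote = None


# Hypothetical table: for each class, the key holding its children and
# the class those children should become. Leaf classes have no entry.
CHILDREN = {
    Controls: ("servers", Server),
    Server: ("users", User),
}


def build(cls, data):
    """Recursively turn nested dictionaries into linked objects."""
    obj = cls()
    for name, value in data.items():
        setattr(obj, name, value)
    if cls in CHILDREN:
        key, child_cls = CHILDREN[cls]
        setattr(obj, key, [build(child_cls, item) for item in data[key]])
    return obj


tree = build(Controls, {
    "poll_interval": 300,
    "servers": [{"pollname": "imap.ccil.org", "users": [{"remote": "esr"}]}],
})
print(tree.servers[0].users[0].remote)  # prints: esr
```

With this arrangement, adding a fourth level of nesting would mean adding a table entry rather than another loop.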
Insights like this can be extraordinarily powerful. The code we
have been looking at was written in about ninety minutes, worked the
first time it was run, and has been stable in the years since (the
only time it has ever broken was when it threw an exception in the
presence of genuine version skew). It's fewer than forty lines and
beautifully simple. There is no way that the naïve approach of
building an entire second parser could possibly have produced this
kind of maintainability, reliability or
compactness.
Reuse, simplification, generalization,
orthogonality:
this is the Zen of Unix
in action.
In Chapter 10,
we'll examine the run-control syntax of
fetchmail
as an example of the standard shell-like metaformat for run-control
files. In Chapter 14
we'll use fetchmailconf as an example of
Python's strength in rapidly building GUIs.