Unix Programming - Language Evaluations

The Art of Unix Programming

A Small Perl Case Study: blq

The blq script is a tool for querying block lists (lists of Internet sites that have been identified as habitual sources of unsolicited bulk email, aka spam). You can find current sources at the blq project page.

blq is a good example of a small Perl script, illustrating both the strengths and weaknesses of the language. It makes intensive use of regular-expression matching. On the other hand, the Net::DNS Perl extension module it uses has to be conditionally included, because it is not guaranteed to be present in any given Perl installation.
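The conditional inclusion mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not blq's actual code: the helper name have_module and the fallback messages are hypothetical, and only the module name Net::DNS comes from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time and report
# whether it succeeded, instead of failing at compile time as a bare
# "use Net::DNS;" would on an installation that lacks the module.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Net::DNS -> Net/DNS.pm
    return eval { require $file; 1 } ? 1 : 0;  # eval traps the load failure
}

if (have_module('Net::DNS')) {
    print "Net::DNS is available\n";
} else {
    print "Net::DNS is missing; a fallback would be needed here\n";
}
```

The cost is that the script must carry a second code path for installations without the extension, which is the weakness the paragraph above points to.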

blq is exceptionally clean and disciplined as Perl code goes, and I recommend it as an example of good style (the other Perl tools referenced from the blq project page are good examples as well). But parts of the code are unreadable unless you are familiar with very specific Perl idioms — the very first line of code, $0 =~ s!.*/!!;, is an example. While all languages have some of this kind of opacity, Perl has it worse than most.
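For readers who do not know the idiom, here is a short sketch of what $0 =~ s!.*/!!; does. $0 holds the name the script was invoked by; the substitution uses ! rather than / as its delimiter so that the slash in the pattern needs no escaping, and the greedy .* consumes everything up to the last slash. The function strip_dir is a hypothetical wrapper added only so that the behavior can be shown on sample strings.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the same substitution blq applies to $0.
sub strip_dir {
    my ($path) = @_;
    $path =~ s!.*/!!;    # delete everything through the LAST slash
    return $path;
}

print strip_dir('/usr/local/bin/blq'), "\n";    # prints "blq"
print strip_dir('blq'), "\n";                   # no slash, so unchanged: "blq"
```

The effect is that later messages can refer to the program by its bare name, whatever path it was invoked with.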

Tcl and Python are both good for small scripts of this type, but both lack the Perl convenience features for regular-expression matching that blq uses heavily; an implementation in either would have been reasonable, but probably less compact and expressive. An Emacs Lisp implementation would have been even faster to write and more compact than the Perl one, but probably painfully slow to use.

A Large Perl Case Study: keeper

keeper is the tool used to file incoming packages and maintain both FTP and WWW index files for the huge Linux free-software archives at ibiblio. You can find sources and documentation in the search tools subdirectory of the ibiblio archive.

keeper is a good example of a medium-to-large interactive Perl application. The command-line interface is line-oriented and patterned after a specialized shell or directory editor; note the embedded help facilities. The working parts make heavy use of file and directory handling, pattern matching, and pattern-directed editing. Note the ease with which keeper generates Web pages and electronic-mail notifications from programmatic templates. Note also the use of a canned Perl module to automate walking various functions over directory trees.
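Walking a function over a directory tree with a canned module might look like the sketch below. The text does not name the module keeper uses; the standard File::Find module is assumed here, and the function matching_files and the .html pattern are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the sorted paths of regular files under
# $root whose base names match $pattern.
sub matching_files {
    my ($root, $pattern) = @_;
    my @hits;
    # find() calls the anonymous sub once per entry, with $_ set to the
    # base name and $File::Find::name set to the full path.
    find(sub { push @hits, $File::Find::name if -f $_ && /$pattern/ }, $root);
    my @sorted = sort @hits;
    return @sorted;
}

print "$_\n" for matching_files('.', qr/\.html$/);
```

The traversal logic lives entirely in the module; the caller supplies only the per-file action, which is what makes this style compact.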

At about 3300 lines, this application is probably pushing the size and complexity limit of what one should attempt in a single Perl program. Nevertheless, most of it was written in a period of six days. In C, C++, or Java it would have taken a minimum of six weeks and been extremely difficult to debug or modify after the fact. It is far too large for pure Tcl. A Python version would probably be structurally cleaner, more readable, and more maintainable, but also more verbose (especially near the pattern-matching parts). An Emacs Lisp mode could readily do the job, but Emacs is not well suited for use over a telnet link that is often slowed to a crawl by server congestion.
