Perl
Perl is shell on steroids. It was specifically designed to replace
awk(1), and expanded to replace shell as the ‘glue’ for
mixed-language script programming. It was first released in 1987.
Perl's strongest point is its extremely powerful built-in
facilities for pattern-directed processing of textual, line-oriented
data formats; it is unsurpassed at this. It also includes far stronger
data structures than shell, including dynamic arrays of mixed element
types and a ‘hash’ or ‘dictionary’ type that
supports convenient and fast lookup of name-value pairs.
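As a minimal sketch of both strengths, the following parses line-oriented name=value text into a hash with two regular expressions. The sample input and all variable names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: one name=value pair per line, plus a comment.
my @lines = (
    "# mail settings",
    "host = mail.example.com",
    "port = 25",
    "retries=3",
);

my %config;    # hash: name => value
my @order;     # dynamic array recording names in the order seen

for my $line (@lines) {
    next if $line =~ /^\s*(#|$)/;                  # skip comments and blank lines
    if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {    # capture name and value
        $config{$1} = $2;
        push @order, $1;
    }
}

# Arrays may mix element types: a string, a number, and a hash reference.
my @mixed = ("port", $config{port} + 0, \%config);

print "$_ => $config{$_}\n" for @order;
```

The hash gives direct lookup by name, while the separate array preserves input order, which a Perl hash does not guarantee.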
Additionally, Perl includes a rather complete and
well-thought-out internal binding of virtually the entire Unix API,
drastically reducing the need for C and making it suitable for jobs like
simple TCP/IP clients
and even servers. Another strong advantage of Perl is that a large and
vigorous open-source community has grown up around it. Its home on
the net is the Comprehensive Perl
Archive Network. Dedicated Perl hackers have written hundreds
of freely reusable Perl modules for many different programming
tasks. These include everything from structure-walking of directory
trees, through X toolkits for GUI building, to excellent canned
facilities for supporting HTTP robots and CGI programming.
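To illustrate the Unix API binding, here is a short sketch in which pipe(2), fork(2), and waitpid(2) appear as ordinary Perl built-ins. The helper name ask_child and the message text are assumptions made for the example; it presumes a Unix-like system.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: start a child process and read one line from it
# over a pipe, using Perl's built-in pipe, fork, and waitpid.
sub ask_child {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                 # child: write one line and leave
        close $reader;
        print {$writer} "hello from the child\n";
        close $writer;               # closing flushes the buffered line
        POSIX::_exit(0);             # exit without running the parent's cleanup
    }

    close $writer;                   # parent: read the child's line
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);                # reap the child; its status lands in $?
    chomp $line;
    return ($line, $? >> 8);
}

my ($message, $status) = ask_child();
print "child said '$message' and exited with status $status\n";
```

The same logic in C would need explicit buffers, error-code checks, and several more declarations, which is the kind of saving the paragraph above describes.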
Perl's main drawback is that parts of it are irredeemably ugly and
complicated, and must be used with caution and in stereotyped ways
lest they bite (its argument-passing conventions for functions are a
good example of all three problems). It is harder to get started in
Perl than it is in shell. Though small programs in Perl can
be extremely powerful, careful discipline is required to maintain
modularity and keep a design under control as program size
increases. Because some limiting design decisions early in Perl's
history could not be reversed, many of the more advanced features have
a fragile, klugey feel about them.
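The argument-passing problem can be shown in a few lines. This is an illustrative sketch with hypothetical function names: every call flattens its arguments into the single list @_, so arrays lose their boundaries unless passed by reference, and the elements of @_ alias the caller's variables.

```perl
use strict;
use warnings;

# Arguments arrive as one flat list in @_, so two arrays passed
# directly lose their boundary.
sub count_flat {
    my (@first, @second) = @_;    # @first swallows everything
    return (scalar @first, scalar @second);
}

# The stereotyped fix: pass references and dereference inside.
sub count_refs {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

# Elements of @_ alias the caller's variables, so assigning to
# $_[0] changes the caller's data.
sub clobber {
    $_[0] = "changed";
    return;
}

my @a = (1, 2, 3);
my @b = (4, 5);

my @flat = count_flat(@a, @b);      # (5, 0)
my @refs = count_refs(\@a, \@b);    # (3, 2)

my $name = "original";
clobber($name);                     # $name is now "changed"

print "flat: @flat\nrefs: @refs\nname: $name\n";
```

Copying @_ into lexical variables on the first line of every function, and passing aggregates by reference, is the usual discipline for avoiding both surprises.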
The definitive reference on Perl is Programming
Perl [Wall2000]. This book has nearly
everything you will ever need to know in it, but is notoriously badly
organized; you will have to dig to find what you want. A more
introductory and narrative treatment is available in
Learning Perl [Schwartz-Christiansen].
Perl is universal on Unix systems. Perl scripts at the same
major release level tend to be readily portable between Unixes
(provided they don't use extension modules). Perl implementations are
available (and even well documented) for the Microsoft family of
operating systems and for MacOS as well. Perl/Tk provides
cross-platform GUI capability.
Summing up: Perl's best side is as a power tool for small glue
scripts involving a lot of regular-expression grinding. Its worst
side is that it is ugly, spiky, and nigh-unmaintainable in large
volumes.
A Small Perl Case Study: blq
The blq script is a tool for querying block lists (lists of Internet
sites that have been identified as habitual sources of unsolicited
bulk email, aka spam). You can find current sources at the blq
project page.
blq is a good example of
a small Perl script, illustrating both the strengths and weaknesses of
the language. It makes intensive use of regular-expression matching.
On the other hand, the Net::DNS Perl extension module it uses has to
be conditionally included, because it is not guaranteed to be present
in any given Perl installation.
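One common way to include a module conditionally is to attempt the load at run time inside eval and branch on the result. The sketch below shows that general technique; it is not taken from the blq source, and have_module is a hypothetical helper name.

```perl
use strict;
use warnings;

# Try to load a module at run time; return 1 if it is available, else 0.
# have_module is a hypothetical helper name, not taken from blq.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Net::DNS -> Net/DNS.pm
    return eval { require $file; 1 } ? 1 : 0;
}

if (have_module('Net::DNS')) {
    print "Net::DNS found: DNS lookups can use the module\n";
} else {
    print "Net::DNS missing: fall back or report the problem\n";
}
```

A plain use Net::DNS; at the top of the script would instead abort at compile time on installations that lack the module.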
blq is exceptionally
clean and disciplined as Perl code goes, and I recommend it as an
example of good style (the other Perl tools referenced from the
blq project page are good examples as
well). But parts of the code are unreadable unless you are familiar
with very specific Perl idioms — the very first line of code,
$0 =~ s!.*/!!;, is an example. While all languages
have some of this kind of opacity, Perl has it worse than most.
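For readers who do not know the idiom, the following sketch unpacks it. The helper name strip_dir and the usage message are hypothetical, added only to make the behavior visible.

```perl
use strict;
use warnings;

# s!PATTERN!REPLACEMENT! is ordinary substitution with "!" as the
# delimiter, chosen so the "/" in the pattern needs no escaping.
# The greedy .*/ matches everything up to the last slash, and the
# empty replacement deletes it, leaving only the final path component.
sub strip_dir {
    my ($path) = @_;
    $path =~ s!.*/!!;
    return $path;
}

print strip_dir("/usr/local/bin/blq"), "\n";    # prints "blq"
print strip_dir("blq"), "\n";                   # no slash: unchanged

# Applied to $0 (the name the script was invoked as), it makes
# later messages show the bare program name.
(my $program = $0) =~ s!.*/!!;
print "usage: $program [options]\n";
```

In other words, the line does the job of basename(1) on the script's own name, in thirteen characters.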
Tcl and Python are both good for small scripts of this type, but both
lack the Perl convenience features for regular-expression matching
that blq uses heavily; an implementation in either would have been
reasonable, but probably less compact and expressive. An Emacs Lisp
implementation would have been even faster to write and more compact
than the Perl one, but probably painfully slow to use.
A Large Perl Case Study: keeper
keeper is the tool used to file incoming packages and maintain both
FTP and WWW index files for the huge Linux free-software archives at
ibiblio. You can find sources and documentation in the search tools
subdirectory of the ibiblio archive.
keeper is a good example of a
medium-to-large interactive Perl application. The command-line
interface is line-oriented and patterned after a specialized shell or
directory editor; note the embedded help facilities. The working parts
make heavy use of file and directory handling, pattern matching, and
pattern-directed editing. Note the ease with which
keeper generates Web pages and
electronic-mail notifications from programmatic templates. Note also
the use of a canned Perl module to automate walking various functions
over directory trees.
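The text does not name the module keeper uses. As an illustration of the general technique, the sketch below assumes the standard File::Find module, which calls a function once for every entry under a directory. The helper matching_files and the throwaway file tree are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: collect plain files under $root whose names
# match $pattern, as sorted paths relative to $root.
sub matching_files {
    my ($root, $pattern) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the entry's name and
        # $File::Find::name is its path including $root.
        return unless -f $_ && $_ =~ $pattern;
        (my $relative = $File::Find::name) =~ s!^\Q$root\E/!!;
        push @found, $relative;
    }, $root);
    return sort @found;
}

# Build a small throwaway tree so the sketch runs anywhere.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/docs" or die "mkdir: $!";
for my $path ("README", "docs/index.html", "docs/notes.txt") {
    open my $fh, '>', "$root/$path" or die "open $path: $!";
    close $fh;
}

my @pages = matching_files($root, qr/\.html$/);
print "$_\n" for @pages;    # prints "docs/index.html"
```

The traversal logic lives entirely in the module; the application supplies only the function to apply at each node.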
At about 3300 lines, this application is probably pushing the
size and complexity limit of what one should attempt in a single Perl
program. Nevertheless, most of it was written in a period of six
days. In C, C++, or Java it would have taken a minimum of six weeks
and been extremely difficult to debug or modify after the fact. It is
way too large for pure Tcl. A Python version would probably be
structurally cleaner, more readable, and more maintainable, but also
more verbose (especially near the pattern-matching parts). An Emacs
Lisp mode could readily do the job, but Emacs is not well suited for
use over a telnet link that is often slowed to a crawl by server
congestion.