Case Study: .newsrc Format
Usenet news is
a worldwide distributed bulletin-board system that anticipated today's
P2P networking by two decades. It uses a message format very similar
to that of RFC822 electronic-mail messages, except that instead of
being directed to personal recipients, messages are sent to topic
groups. Articles posted at any participating site are broadcast to
each site that it has registered as a neighbor, and eventually
flood-fill to all news sites.
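The flood-fill propagation can be illustrated with a minimal sketch. It assumes a simplified, static topology in which each site forwards a new article to every registered neighbor and discards copies it has already seen. The site names and the `flood_fill` function are hypothetical. Real news servers recognize duplicates by the article's Message-ID, and articles arrive in whatever order the links deliver them, so the breadth-first order below is only one possible arrival order.

```python
from collections import deque


def flood_fill(neighbors, origin):
    """Return the order in which sites first receive an article posted at origin.

    neighbors maps each (hypothetical) site name to the sites it has
    registered as neighbors. A site that already holds the article drops
    any further copy instead of forwarding it again.
    """
    seen = {origin}
    order = [origin]
    queue = deque([origin])
    while queue:
        site = queue.popleft()
        for peer in neighbors.get(site, []):
            if peer not in seen:  # duplicate suppression stops endless re-flooding
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


# A small made-up topology with redundant links (beta and gamma both feed delta).
topology = {
    "alpha": ["beta", "gamma"],
    "beta": ["alpha", "delta"],
    "gamma": ["alpha", "delta"],
    "delta": ["beta", "gamma", "epsilon"],
    "epsilon": ["delta"],
}
print(flood_fill(topology, "alpha"))  # → ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
```

Every site connected to the origin receives exactly one accepted copy, even though delta is offered the article twice.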
Almost all Usenet news readers understand the
.newsrc file, which records which Usenet messages
have been seen by the calling user. Though it is named like a
run-control file, it is not only read at startup but typically updated
at the end of the newsreader run. The .newsrc
format has been fixed since the first newsreaders around 1980. Example 5.2 is a representative section from a
.newsrc file.
Example 5.2. A .newsrc example.
rec.arts.sf.misc! 1-14774,14786,14789
rec.arts.sf.reviews! 1-2534
rec.arts.sf.written: 1-876513
news.answers! 1-199359,213516,215735
news.announce.newusers! 1-4399
news.newusers.questions! 1-645661
news.groups.questions! 1-32676
news.software.readers! 1-95504,137265,137274,140059,140091,140117
alt.test! 1-1441498
Each line sets properties for the newsgroup named in the first
field. The name is immediately followed by a character that indicates
whether the owning user is currently subscribed to the group or not; a colon
indicates subscription, and an exclamation mark indicates
nonsubscription. The remainder of the line is a sequence of
comma-separated article numbers or ranges of article numbers,
indicating which articles the user has seen.
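A reader for this format needs little more than string splitting. The following Python sketch is illustrative only: the names `parse_newsrc_line` and `has_seen` are not taken from any real newsreader. It assumes group names contain no colon, exclamation mark, or whitespace, and it represents each seen-article item as an inclusive (low, high) pair.

```python
import re

# Group name, then ':' (subscribed) or '!' (unsubscribed), then the
# possibly empty list of seen-article numbers and ranges.
NEWSRC_LINE = re.compile(r"^([^:!\s]+)([:!])\s*(.*)$")


def parse_newsrc_line(line):
    """Split one .newsrc line into (group, subscribed, ranges)."""
    m = NEWSRC_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a .newsrc line: {line!r}")
    group, mark, rest = m.groups()
    ranges = []
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a group with no seen articles
        low, sep, high = item.partition("-")
        # A single number n is stored as the one-article range (n, n).
        ranges.append((int(low), int(high) if sep else int(low)))
    return group, mark == ":", ranges


def has_seen(ranges, article):
    """True if the article number falls inside any seen range."""
    return any(low <= article <= high for low, high in ranges)


group, subscribed, ranges = parse_newsrc_line("rec.arts.sf.misc! 1-14774,14786,14789")
print(group, subscribed, ranges)
print(has_seen(ranges, 14780), has_seen(ranges, 14786))  # → False True
```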
Non-Unix programmers might have automatically tried to design a
fast binary format in which each newsgroup status was described by
either a long but fixed-length binary record, or a sequence of
self-describing binary packets with internal length fields. The main
point of such a binary representation would be to express ranges with
binary data in paired word-length fields, in order to avoid the
overhead of parsing all the range expressions at startup.
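For comparison, here is a sketch of what such a self-describing binary packet might look like, using Python's `struct` module. The layout is a hypothetical design invented for illustration, not any real newsreader's format: a 16-bit name length, the name bytes, a one-byte subscription flag, a 32-bit range count, then 32-bit low/high pairs. Decoding reads the ranges directly instead of parsing range expressions, which is the speed advantage the binary design is after.

```python
import struct

# Hypothetical packet layout (not any real newsreader's format):
#   uint16 name length | name bytes | uint8 subscribed | uint32 range count
#   followed by that many (uint32 low, uint32 high) pairs.
# "!" selects network (big-endian) byte order with no padding.
HEADER = struct.Struct("!H")
BODY = struct.Struct("!BI")
PAIR = struct.Struct("!II")


def pack_group(name, subscribed, ranges):
    """Encode one newsgroup's status as a length-prefixed binary packet."""
    raw = name.encode("ascii")
    out = [HEADER.pack(len(raw)), raw, BODY.pack(int(subscribed), len(ranges))]
    out.extend(PAIR.pack(low, high) for low, high in ranges)
    return b"".join(out)


def unpack_group(buf, offset=0):
    """Decode one packet starting at offset; return (record, next_offset)."""
    (name_len,) = HEADER.unpack_from(buf, offset)
    offset += HEADER.size
    name = buf[offset:offset + name_len].decode("ascii")
    offset += name_len
    subscribed, count = BODY.unpack_from(buf, offset)
    offset += BODY.size
    ranges = []
    for _ in range(count):
        ranges.append(PAIR.unpack_from(buf, offset))  # (low, high) tuple
        offset += PAIR.size
    return (name, bool(subscribed), ranges), offset


packet = pack_group("rec.arts.sf.reviews", False, [(1, 2534)])
print(len(packet), unpack_group(packet))
```

The `!` prefix pins the byte order and disables padding. A naive design that used the machine's native layout would produce files that could not be moved between big-endian and little-endian machines. Even with that fixed, the 32-bit fields put an upper bound on article numbers, and the packet can no longer be inspected or repaired in a text editor.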
Such a layout could be read and written faster than a textual
format, but it would have other problems. A naïve implementation
in fixed-length records would have placed artificial length limits on
newsgroup names and (more seriously) on the maximum number of ranges
of seen-article numbers. A more sophisticated binary-packet format
would avoid the length limits, but could not be edited with the user's
eyeballs and fingers — a capability that can be quite useful
when you want to reset just some of the read bits in an individual
newsgroup. Also, it would not necessarily be portable to different
machine types.
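Resetting read bits by hand is ordinary text editing. For example, changing `rec.arts.sf.reviews! 1-2534` to `rec.arts.sf.reviews! 1-2499` marks the last 35 articles unread. That hand edit corresponds to a simple range subtraction. The sketch below does the same thing programmatically. The helper names are hypothetical, and it assumes the ranges on a line are sorted and non-overlapping, as they are in Example 5.2.

```python
def parse_ranges(text):
    """'1-10,15' -> [(1, 10), (15, 15)]"""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        if item:
            low, sep, high = item.partition("-")
            ranges.append((int(low), int(high) if sep else int(low)))
    return ranges


def format_ranges(ranges):
    """[(1, 10), (15, 15)] -> '1-10,15'"""
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


def mark_unread(ranges, first, last):
    """Remove articles first..last (inclusive) from the seen ranges."""
    result = []
    for lo, hi in ranges:
        if hi < first or lo > last:  # no overlap: keep the range unchanged
            result.append((lo, hi))
            continue
        if lo < first:  # keep the part below the unread hole
            result.append((lo, first - 1))
        if hi > last:  # keep the part above the unread hole
            result.append((last + 1, hi))
    return result


line = "news.answers! 1-199359,213516,215735"
group_part, seen = line.split(" ", 1)
print(group_part, format_ranges(mark_unread(parse_ranges(seen), 199000, 213516)))
# → news.answers! 1-198999,215735
```

Because the result is plain text again, the program's edit and the user's manual edit are interchangeable, and either can be checked by simply reading the file.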
The designers of the original newsreader chose
transparency and interoperability over
economy. The case for going in the other direction was not completely
ridiculous; .newsrc files can get very large, and
one modern reader (GNOME's Pan) uses a speed-optimized private format
to avoid startup lag. But to other implementers, textual
representation looked like a good tradeoff in 1980, and has looked
better as machines increased in speed and storage dropped in
price.