Plan 9 cleans up Unix, but only really adds one new concept
(private namespaces) to its basic set of design ideas. But are there
serious problems with those basic design ideas? In Chapter 1 we touched on
several issues that Unix arguably got wrong. Now that the open-source
movement has put the design future of Unix back in the hands of
programmers and technical people, these are no longer decisions we
have to live with forever. We'll reexamine them in order to get a
better handle on how Unix might evolve in the future.
A Unix File Is Just a Big Bag of Bytes
A Unix file is just a big bag of bytes, with no other
attributes. In particular, there is no capability to store
information about the file type or a pointer to an associated
application program outside the file's actual data.
More generally, everything is a byte stream; even hardware devices are byte streams.
This metaphor was a tremendous success of early Unix, and a real
advance over a world in which (for example) compiled programs could
not produce output that could be fed back to the compiler.
Pipes and shell programming sprang from this metaphor.
But Unix's byte-stream metaphor is so central that Unix has trouble
integrating software objects with operations that don't fit neatly
into the byte stream or file repertoire of operations (create, open,
read, write, delete). This is
especially a problem for GUI objects such as icons, windows, and
‘live’ documents. Within a classical Unix model of the
world, the only way to extend the everything-is-a-byte-stream metaphor
is through ioctl calls, a notoriously ugly
collection of back doors into kernel space.
Fans of the Macintosh family of operating systems tend to be
vociferous about this. They advocate a model in which a single
filename may have both data and resource ‘forks’, the data
fork corresponding to the Unix byte stream and the resource fork being
a collection of name/value pairs. Unix partisans prefer approaches
that make file data self-describing so that effectively the same sort
of metadata is stored within the file.
The problem with the Unix approach is that every program
that writes the file has to know about it. Thus, for example, if we
want the file to carry type information inside it, every tool that
touches it has to take care to either preserve the type field
unaltered or interpret and then rewrite it. While this would be
theoretically possible to arrange, in practice it would be far too
fragile.
On the other hand, supporting file attributes raises awkward
questions about which file operations should preserve them. It's
clear that a copy of a named file to another name should copy
the source file's attributes as well as its data — but suppose
we cat(1) the file, redirecting the output of cat(1) to a new name?
The answer to this question depends on whether the attributes
are actually properties of filenames or are in some magical way
bundled with the file's data as a sort of invisible preamble or
postamble. Then the question becomes: Which operations make the
properties visible?
Xerox PARC
file-system designs grappled with this problem as far back as the
1970s. They had an ‘open serialized’ call that returned a
byte stream containing both attributes and content. If applied to a
directory, it returned a serialization of the directory's attributes
plus the serialization of all the files in it. It is not clear that
this approach has ever been bettered.
Linux 2.5 already supports attaching arbitrary
name/value pairs as properties of a filename, but at the time of writing
this capability is not yet much used by
applications. Recent versions of Solaris have a roughly equivalent
feature.