Unix Programming - How to Choose among the Methods

How to Choose among the Methods

We've looked in turn at system and user run-control files, at environment variables, and at command-line arguments. Observe the progression from least easily changed to most easily changed. There is a strong convention that well-behaved Unix programs that use more than one of these places should look at them in the order given, allowing later settings to override earlier ones (there are specific exceptions, such as command-line options that specify where a dotfile should be found).

In particular, environment settings usually override dotfile settings, but can be overridden by command-line options. It is good practice to provide a command-line option like the -e of make(1) that can override environment settings or declarations in run-control files; that way the program can be scripted with well-defined behavior regardless of the way the run-control files look or environment variables are set.

Which of these places you choose to look at depends on how much persistent configuration state your program needs to keep around between invocations. Programs designed mainly to be used in a batch mode (as generators or filters in pipelines, for example) are usually completely configured with command-line options. Good examples of this pattern include ls(1), grep(1) and sort(1). At the other extreme, large programs with complicated interactive behavior may rely entirely on run-control files and environment variables, and normal use involves few command-line options or none at all. Most X window managers are a good example of this pattern.

(Unix has the capability for the same file to have multiple names or ‘links’. At startup time, every program has available to it the filename through which it was called. One other way to signal to a program that has several modes of operation which one it should come up in is to give it a link for each mode, have it find out which link it was called through, and change its behavior accordingly. But this technique is generally considered unclean and seldom used.)

Let's look at a couple of programs that gather configuration data from all three places. It will be instructive to consider why, for each given piece of configuration data, it is collected as it is.

Case Study: fetchmail

The fetchmail program uses only two environment variables, USER and HOME. These variables are in the predefined set initialized by the system; many programs use them.

The value of HOME is used to find the dotfile .fetchmailrc, which contains configuration information in a fairly elaborate syntax obeying the shell-like lexical rules described above. This is appropriate because, once it has been initially set up, Fetchmail's configuration will change only infrequently.

There is neither an /etc/fetchmailrc nor any other systemwide file specific to fetchmail. Normally such files hold configuration that's not specific to an individual user. fetchmail does use a small set of properties with this kind of scope — specifically, the name of the local postmaster, and a few switches and values describing the local mail transport setup (such as the port number of the local SMTP listener). In practice, however, these are seldom changed from their compiled-in default values. When they are changed, they tend to be modified in user-specific ways. Thus, there has been no demand for a systemwide fetchmail run-control file.

Fetchmail can retrieve host/login/password triples from a .netrc file. Thus, it gets authenticator information in the least surprising way.

Fetchmail has an elaborate set of command-line options, which nearly but do not entirely replicate what the .fetchmailrc can express. The set was not originally large, but grew over time as new constructs were added to the .fetchmailrc minilanguage and parallel command-line options for them were added more or less reflexively.

The intent of supporting all these options was to make fetchmail easier to script by allowing users to override bits of its run control from the command line. But it turns out that outside of a few options like --fetchall and --verbose there is little demand for this — and none that can't be satisfied with a shellscript that creates a temporary run-control file on the fly and then feeds it to fetchmail using the -f option.

Thus, most of the command-line options are never used, and in retrospect including them was probably a mistake; they bulk up the fetchmail code a bit without accomplishing anything very useful.

	If bulking up the code were the only problem, nobody would care, except for a couple of maintainers. However, options increase the chances of error in code, particularly due to unforeseen interactions among rarely used options. Worse, they bulk up the manual, which is a burden on everybody.
-- Doug McIlroy

There is a lesson here; had I thought carefully enough about fetchmail's usage pattern and been a little less ad-hoc about adding features, the extra complexity might have been avoided.

	An alternative way of dealing with such situations, which doesn't clutter up either the code or the manual much, is to have a “set option variable” option, such as the `-O` option of sendmail, which lets you specify an option name and value, and sets that name to that value as if such a setting had been given in a configuration file. A more powerful variant of this is what ssh does with its `-o` option: the argument to `-o` is treated as if it were a line appended to the configuration file, with the full config-file syntax available. Either of these approaches gives people with unusual requirements a way to override configuration from the command line, without requiring you to provide a separate option for each bit of configuration that might be overridden.
-- Henry Spencer

[an error occurred while processing this directive]

The Art of Unix Programming
Prev	Home	Next