Unix Programming - Application Protocol Metaformats - HTTP as a Universal Application Protocol

Follow Techotopia on Twitter

On-line Guides

Linux for Beginners

Office Productivity

Linux Installation

Linux Utilities

Linux Virtualization

System/Network Admin

Scripting Languages

Development Tools

Web Development

GUI Toolkits/Desktop

Eclipse Documentation

Virtuatopia.com

Answertopia.com

How To Guides

General System Admin

Linux Filesystems

Graphics & Desktop

Problem Solutions

The Art of Unix Programming
Prev	Home	Next

Unix Programming - Application Protocol Metaformats - HTTP as a Universal Application Protocol

HTTP as a Universal Application Protocol

Ever since the World Wide Web reached critical mass around 1993, application protocol designers have shown an increasing tendency to layer their special-purpose protocols on top of HTTP, using web servers as generic service platforms.

This is a viable option because, at the transaction layer, HTTP is very simple and general. An HTTP request is a message in an RFC-822/MIME-like format ; typically, the headers contain identification and authentication information, and the first line is a method call on some resource specified by a Universal Resource Indicator (URI). The most important methods are GET (fetch the resource), PUT (modify the resource) and POST (ship data to a form or back-end process). The most important form of URI is a URL or Uniform Resource Locator, which identifies the resource by service type, host name, and a location on the host. An HTTP response is simply an RFC-822/MIME message and can contain arbitrary content to be interpreted by the client.

Web servers handle the transport and request-multiplexing layers of HTTP, as well as standard service types like http and ftp. It is relatively easy to write web server plugins that will handle custom service types, and to dispatch on other elements of the URI format.

Besides avoiding a lot of lower-level details, this method means the application protocol will tunnel through the standard HTTP service port and not need a TCP/IP service port of its own. This can be a distinct advantage; most firewalls leave port 80 open, but trying to punch another hole through can be fraught with both technical and political difficulties.

With this advantage comes a risk. It means that your web server and its plugins grow more complex, and cracks in any of that code can have large security implications. It may become more difficult to isolate and shut down problem services. The usual tradeoffs between security and convenience apply.

RFC 3205, On the Use of HTTP As a Substrate,^[56] has good design advice for anyone considering using HTTP as the underlayer of an application protocol, including a summary of the tradeoffs and problems involved.

Case Study: The `CDDB/freedb.org` Database

Audio CDs consist of a sequence of music tracks in a digital format called CDDA-WAV. They were designed to be played by very simple consumer-electronics devices a few years before general-purpose computers developed enough raw speed and sound capability to decode them on the fly. Because of this, there is no provision in the format for even simple metainformation such as the album and track titles. But modern computer-hosted CD players want this information so the user can assemble and edit play lists.

Enter the Internet. There are (at least two) repositories that provide a mapping between a hash code computed from the track-length table on a CD and artist/album-title/track-title records. The original was cddb.org, but another site called freedb.org which is probably now more complete and widely used. Both sites rely on their users for the enormous task of keeping the database current as new CDs come out; freedb.org arose from a developer revolt after CDDB elected to take all that user-contributed information proprietary .

Queries to these services could have been implemented as a custom application protocol on top of TCP/IP, but that would have required steps such as getting a new TCP/IP port number assigned and fighting to get a hole for it punched through thousands of firewalls. Instead, the service is implemented over HTTP as a simple CGI query (as if the CD's hash code had been supplied by a user filling in a Web form).

This choice makes all the existing infrastructure of HTTP and Web-access libraries in various programming languages available to support programs for querying and updating this database. As a result, adding such support to a software CD player is nearly trivial, and effectively every software CD player knows how to use them.

Case Study: Internet Printing Protocol

Internet Printing Protocol (IPP) is a successful, widely implemented standard for the control of network-accessible printers. Pointers to RFCs, implementations, and much other related material are available at the IETF's Printer Working Group site.

IPP uses HTTP 1.1 as a transport layer. All IPP requests are passed via an HTTP POST method call; responses are ordinary HTTP responses. (Section 4.2 of RFC2568, Rationale for the Structure of the Model and Protocol for the Internet Printing Protocol, does an excellent job of explaining this choice; it repays study by anyone considering writing a new application protocol.)

From the software side, HTTP 1.1 is widely deployed. It already solves many of the transport-level problems that would otherwise distract protocol developers and implementers from concentrating on the domain semantics of printing. It is cleanly extensible, so there is room for IPP to grow. The CGI programming model for handling the POST requests is well understood and development tools are widely available.

Most network-aware printers already embed a web server, because that's the natural way to make the status of the printer remotely queryable by human beings. Thus, the incremental cost of adding IPP service to the printer firmware is not large. (This is an argument that could be applied to a remarkably wide range of other network-aware hardware, including vending machines and coffee makers^[57] and hot tubs!)

About the only serious drawback of layering IPP over HTTP is that the protocol is completely driven by client requests. Thus there is no space in the model for printers to ship asynchronous alert messages back to clients. (However, smarter clients could run a trivial HTTP server to receive such alerts formatted as HTTP requests from the printer.)

[an error occurred while processing this directive]

The Art of Unix Programming
Prev	Home	Next

Published under free license.

Design by Interspire