The Common Gateway Interface
The Common Gateway Interface (CGI) is a standard for interfacing external (gateway) applications with information servers (primarily HTTP servers). CGI has been in use with the World Wide Web since 1993, and the current version is CGI/1.1. Whereas many of the requests sent to a web server simply retrieve the contents of a file stored on the server, those directed to a gateway program (a CGI script) cause the program to be executed. The resulting output varies, depending on the parameters passed to the server-side executable, but normally takes the form of a dynamically generated response that is sent to the client application that initiated the request. CGI allows an HTTP server and a CGI script to share responsibility for responding to client requests, and defines a standard way of handling such transactions, including how information is passed to the script, and how the output is used by the server to generate a response.
CGI programs can be written in a variety of different scripting or programming languages, but are most often created as executable scripts using a language such as Perl, and stored in a specific directory on the server named "cgi-bin". The script itself usually consists of a set of program statements, stored in an ASCII text file, that are interpreted at run-time. The client request consists of a Uniform Resource Identifier (URI), a request method, a set of headers that convey information about the client request, and an optional message-body that contains user data.
The client request is received by the HTTP server application, which carries out any necessary decoding and invokes the CGI script identified by the request's URI. The server software converts the client request into a CGI request before passing it to the script. Relevant information about both the request and the HTTP server is passed to the script as a set of named parameters known as meta-variables (these are usually, though not always, operating system environment variables), while the contents of the message-body, if any, are supplied via the standard input file handle (stdin). Once the CGI script has executed, the response it generates is forwarded to the client after any necessary encoding has been applied. The server application is responsible for any client authentication required, and for implementing security.
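As a rough illustration of this hand-off, the sketch below shows how a script written in Python might collect a few meta-variables and the message-body. It assumes that the server exposes the meta-variables as environment variables. The function name read_cgi_request, the simulated request values and the script path /cgi-bin/demo.py are hypothetical and exist only for this example.

```python
import io
import os
import sys


def read_cgi_request(environ=None, stdin=None):
    """Collect some meta-variables and the message-body of one CGI request."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin.buffer if stdin is None else stdin

    # CONTENT_LENGTH is absent or empty when the request has no message-body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read exactly CONTENT_LENGTH bytes; anything beyond that is not ours.
    body = stdin.read(length) if length > 0 else b""

    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "script": environ.get("SCRIPT_NAME", ""),
        "path_info": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
        "body": body,
    }


# Simulated request (hypothetical values), so no web server is needed.
fake_env = {
    "REQUEST_METHOD": "POST",
    "SCRIPT_NAME": "/cgi-bin/demo.py",
    "CONTENT_LENGTH": "9",
}
request = read_cgi_request(fake_env, io.BytesIO(b"name=Anne&extra"))
print(request["method"], request["body"])  # POST b'name=Anne'
```

When run under a real server, the function would be called with no arguments so that it falls back to the process environment and stdin.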
Request methods
The request method is supplied to the script using the REQUEST_METHOD meta-variable, and identifies the processing method to be employed by the script when creating a response. The methods commonly supported include:
- GET - indicates that the script should produce a document based on the meta-variable values received.
- POST - requires the script to perform processing and produce a document based on the data in the request message-body, as well as the meta-variable values received. Typical uses include processing HTML form data, which often initiates processing by the script that makes changes to a database.
- HEAD - requires the script to do enough processing to return the response header fields (but not a response message-body).
The script may also support protocol-specific methods, such as PUT and DELETE (HTTP/1.1). Some systems support a method for supplying an array of strings to the CGI script as arguments. This is only used in the case of an ISINDEX HTTP query, which is identified by a GET or HEAD request accompanied by a URI query string that does not contain any unencoded "=" characters. The query string will be parsed into words, which are then URL-decoded, optionally encoded in a system-defined manner, and added to the command line argument list.
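The ISINDEX rule above can be sketched as follows. This is only an approximation of what a server might do: it assumes that the words of the query string are separated by "+" characters (as in the CGI/1.1 specification), it omits the optional system-defined encoding step, and the function name isindex_arguments is hypothetical.

```python
from urllib.parse import unquote


def isindex_arguments(method, query_string):
    """Return the command line words for an ISINDEX-style query, else []."""
    # Only GET and HEAD requests can carry an ISINDEX query.
    if method not in ("GET", "HEAD"):
        return []
    # An unencoded "=" means the query holds name=value pairs instead.
    # An encoded one appears as %3D, so it does not trigger this check.
    if not query_string or "=" in query_string:
        return []
    # Split into words first, then decode each word separately.
    return [unquote(word) for word in query_string.split("+")]


print(isindex_arguments("GET", "red+apple%3Dpie"))  # ['red', 'apple=pie']
print(isindex_arguments("GET", "colour=red"))       # []
```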
The meta-variables that may be included in a CGI request are described in the table below.
Meta-variable | Description |
---|---|
AUTH_TYPE | Identifies a mechanism used by the server to authenticate the user. |
CONTENT_LENGTH | Contains the size in bytes of the message-body attached to the request, if it exists. |
CONTENT_TYPE | Specifies the Internet Media Type of the message-body, if it exists. |
GATEWAY_INTERFACE | Identifies the version of CGI implemented by the server (e.g. 1.1). |
PATH_INFO | Specifies a path to be interpreted by the CGI script. It identifies the resource or sub-resource to be returned by the CGI script, and is derived from the portion of the URI path hierarchy following the part that identifies the script itself. |
PATH_TRANSLATED | Derived by taking the PATH_INFO value, parsing it as a local URI in its own right, and performing any virtual-to-physical translation appropriate to map it onto the server's directory structure. |
QUERY_STRING | A URL-encoded search or parameter string that provides information to the CGI script about the client request. |
REMOTE_ADDR | The IP address of the client sending the request to the server. |
REMOTE_HOST | The fully qualified domain name of the client sending the request to the server, if available. |
REMOTE_IDENT | May be used to provide identity information reported about the connection by an RFC 1413 request to the remote agent. |
REMOTE_USER | A user identification string supplied by the client as part of user authentication. |
REQUEST_METHOD | The method that should be used by the script to process the request (e.g. GET, POST, HEAD, etc.). |
SCRIPT_NAME | A URI path that could identify the CGI script. |
SERVER_NAME | The name of the server to which the client request is directed (may be either a hostname or IP address). |
SERVER_PORT | The port number to which the request was sent. |
SERVER_PROTOCOL | The name and version of the application protocol used for this CGI request (e.g. HTTP/1.1). |
SERVER_SOFTWARE | The name and version of the information server software making the CGI request. |
HTTP_ACCEPT | The Internet Media Types that the client will accept. |
HTTP_USER_AGENT | Identifies the client software (typically a web browser) used to send the request. |
N.B. Meta-variables with names beginning with "HTTP_" contain values read from the client request header fields.
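The naming of the "HTTP_" meta-variables follows a simple rule in the CGI/1.1 specification (RFC 3875): the header field name is converted to upper case, each "-" is replaced with "_", and "HTTP_" is prepended. The sketch below applies that rule. The function name is hypothetical, and a real server may choose not to pass on some header fields (for example, those carrying authentication data).

```python
def header_to_meta_variable(header_name):
    """Map an HTTP request header field name to its CGI meta-variable name."""
    name = header_name.upper().replace("-", "_")
    # Content-Type and Content-Length have dedicated meta-variables
    # and are therefore not given the HTTP_ prefix.
    if name in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return name
    return "HTTP_" + name


print(header_to_meta_variable("User-Agent"))    # HTTP_USER_AGENT
print(header_to_meta_variable("Accept"))        # HTTP_ACCEPT
print(header_to_meta_variable("Content-Type"))  # CONTENT_TYPE
```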
The CGI Response
The CGI response is passed to the server via the standard output file handle (stdout), and will consist of a message-header and a message-body, separated by a blank line. The message-header contains one or more header fields. The body may be empty. The script will return a document response, a local redirect response, or a client redirect (with optional document) response. The response types are described below.
- Document response - the CGI script returns a document to the user, with an optional Status header field indicating the status of the response (status 200 'OK' is assumed if this header field is omitted). The script must return a Content-Type header field. The server will carry out any required modification to the script's output to ensure that it complies with the response protocol used by the server.
- Local redirect response - the script returns a URI path and query-string ('local-pathquery') for a local resource in a Location header field. This tells the server to re-process the request using the specified path information.
- Client redirect response - the script returns an absolute URI in a Location header field to indicate to the client that it should reprocess the request using the specified URI.
- Client redirect response with document - the script returns an absolute URI in a Location header field, together with a Status header field (302 'Found'), a Content-Type header field and an attached document, to indicate to the client that it should reprocess the request using the specified URI.
The CGI response message-body follows the CGI response headers, and is a document to be returned to the client by the server. The server reads all of the data provided by the script until it encounters the end of the message-body (indicated by an end-of-file condition). The message-body should be sent to the client without modification apart from any necessary encoding (unless the request used the HEAD method, in which case it will not be sent).
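To make the layout of these responses concrete, the sketch below builds a document response and a redirect response as plain strings. The helper names, the default media type and the example URIs are assumptions made for illustration. A real script would write the result to stdout rather than return it.

```python
def document_response(body, content_type="text/html", status=None):
    """Build a document response: header fields, a blank line, then the body."""
    lines = ["Content-Type: " + content_type]
    if status is not None:
        # Optional; the server assumes 200 'OK' when Status is omitted.
        lines.append("Status: " + status)
    return "\n".join(lines) + "\n\n" + body


def redirect_response(location):
    """Build a redirect response, which has a Location field and no body.

    A local path such as /index.html asks the server to re-process the
    request; an absolute URI asks the client to fetch the resource itself.
    """
    return "Location: " + location + "\n\n"


print(document_response("<p>Hello</p>\n"), end="")
print(redirect_response("/index.html"), end="")              # local redirect
print(redirect_response("http://www.example.com/"), end="")  # client redirect
```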
CGI response header fields
The response header fields are either CGI or extension header fields (which will be interpreted by the server), or protocol-specific header fields (which will be included in the response returned to the client). At least one CGI field must be supplied. The response header fields are described below.
- Content-Type - identifies the Internet Media Type of the entity body.
- Location - used to specify to the server that the script is returning a reference to a document rather than an actual document. It is either an absolute URI, indicating that the client is to fetch the referenced document, or a local URI path (optionally with a query string), indicating that the server is to fetch the referenced document and return it to the client as the response.
- Status - a 3-digit code, followed by a reason-phrase, that indicates the success or failure of the script's attempt to handle the request (if omitted, status 200 'OK' is assumed). Status code 302 'Found' is used with a Location header field and response message-body. Status code 400 'Bad Request' may be used for an unrecognised request format. Status code 501 'Not Implemented' or 405 'Method Not Allowed' may be returned if the script receives an unsupported request method. The reason-phrase is a brief description of the response status intended for human consumption.
The script may also return other header fields relating to the response message that are specific to the server protocol (i.e. HTTP/1.0 or HTTP/1.1).
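As an example of the Status field in use, the sketch below rejects unsupported request methods with status 405. It relies on Python's standard http.HTTPStatus enumeration for the reason-phrases. The set of supported methods and both function names are assumptions made for this example.

```python
from http import HTTPStatus

# Assumption: the methods that this hypothetical script handles.
SUPPORTED_METHODS = ("GET", "HEAD", "POST")


def status_field(code):
    """Format a Status header field: 3-digit code plus reason-phrase."""
    status = HTTPStatus(code)
    return "Status: {} {}".format(status.value, status.phrase)


def error_response(method):
    """Return a CGI error response for an unsupported method, else None."""
    if method in SUPPORTED_METHODS:
        return None
    header = status_field(405) + "\nContent-Type: text/plain\n\n"
    return header + "Method not supported: " + method + "\n"


print(status_field(302))  # Status: 302 Found
print(error_response("PUT"), end="")
```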
Processing HTML forms
Forms in web pages allow users to enter data. Once all of the required information has been entered into the form, the user can submit the contents of the form to the server by clicking on the form's Submit button. The way in which the server passes the data to the gateway program (usually a CGI script) depends on the HTTP method specified for the form, which will be either GET or POST. If the form has METHOD=GET in its FORM tag, the form input is passed to the CGI script in an environment variable called QUERY_STRING. If METHOD=POST is used, form input is passed to the CGI script via stdin (the standard input). The environment variable CONTENT_LENGTH is used to inform the script how much data to read from stdin.
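The two delivery routes might be handled in a Python script along the following lines. This is a sketch rather than a complete form handler: it assumes URL-encoded form data, the function name read_form is hypothetical, and the sample field names are invented for the demonstration.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=None, stdin=None):
    """Return submitted form fields as a dict mapping name to list of values."""
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()

    if method == "POST":
        # POST: read exactly CONTENT_LENGTH bytes of form data from stdin.
        stdin = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the form data arrives in the QUERY_STRING meta-variable.
        raw = environ.get("QUERY_STRING", "")

    return parse_qs(raw, keep_blank_values=True)


# Simulated GET and POST submissions (hypothetical field names).
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "lastname=Smith&city=New+York"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "14"}
print(read_form(get_env))
print(read_form(post_env, io.BytesIO(b"lastname=Smith")))
```

Each value is returned as a list because a form may submit several fields with the same name.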
When a form is created in an HTML document, each input box in the form is given a unique name using the NAME attribute (e.g. NAME="lastname"). The information typed into the input box by the user becomes the value associated with that input box. When the user clicks on the Submit button, the form data is sent to the server as a URL-encoded string consisting of name=value pairs separated by the ampersand character (&). The URL-encoding replaces each space in the user data with a plus sign ("+"), and replaces any reserved characters that form part of the user data (e.g. "@", "&", "$") with an escape sequence consisting of the percent character ("%") followed by two hexadecimal digits. The escape sequence represents the character to which it corresponds in the ASCII character set.
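The encoding can be reproduced with Python's standard urllib.parse module, as in the sketch below. The field names and values are invented for the demonstration. Note that form encoding also represents each space as a plus sign ("+").

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form fields, as a browser might collect them.
fields = {"lastname": "Smith & Jones", "email": "user@example.com"}

# Encode as name=value pairs joined by "&". Reserved characters become
# %XX escape sequences, where XX is the character's code in hexadecimal.
encoded = urlencode(fields)
print(encoded)  # lastname=Smith+%26+Jones&email=user%40example.com

# Decoding reverses the process and recovers the original pairs.
decoded = parse_qsl(encoded)
print(decoded)  # [('lastname', 'Smith & Jones'), ('email', 'user@example.com')]
```

Here "%26" stands for "&" and "%40" for "@", which are the ASCII codes 0x26 and 0x40 respectively.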