Finding, Using and Decoding CGI Data

In the first article in this series we looked at the HTTP transaction that is the key to understanding how data is passed around on the World Wide Web. We took a quick look at the mechanism of CGI: a gateway interface between a web server and an external program that accepts, processes and returns data that the server then sends back to the client. We also looked at getting set up to program Perl CGI and the basic ways to send data back to a client.

So, you can send output from a CGI process to the server (and therefore the client), but to go further you need to know how to find and decode client-supplied data within your script; you need to be able to figure out what the client is saying.

When it spawns the Perl process, the web server puts all information into environmental variables, except in the case of a POST request, where the server feeds some of the information (name/value pairs from your HTML form) into the STDIN handle. More on that in a moment, but first let's look at environmental variables.

Getting Environmental Data from the Script

Perl can access environmental data about its own process by looking in a special hash array called %ENV (think environment). Hash arrays in Perl are keyed by name, rather than number, and are indicated with the % sign. You can find the names of the elements (also called keys) of a hash array by using the keys operator, if you don't know them already.
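
As a minimal sketch of how a hash array behaves, the following script builds a small hash of its own and lists its keys. The %book hash and its contents are made up for illustration; %ENV is read in the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up hash for illustration; it is not part of CGI.
my %book = (
    'title'  => 'midsummer',
    'author' => 'shakespeare',
);

# keys returns the names in no particular order, so sort them
# to get predictable output.
my @names = sort keys %book;

foreach my $name (@names) {
    # $book{$name} looks up a single element by its key.
    print "$name = $book{$name}\n";
}
```

Note that the hash as a whole is written with %, while a single element is written with $ because one element is a scalar value.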

The server places the values of most of the header lines from the HTTP transaction in the environment, and Perl makes them available in the %ENV hash. The hash also holds any other values your particular server might think are important, like the server's name or the name of your operating system. To get the value of a single element of a hash you write the hash name with the scalar sign $ and give the name of the element you want in braces. For instance, to get the value of the Host HTTP header you could print it:

print $ENV{'HTTP_HOST'};

or assign it to a variable:

$host = $ENV{'HTTP_HOST'};

Don't assume the names of these values; the server can assign any name it pleases. It's true that there is a great deal of standardization across servers in this: for instance, most headers that come from the client have HTTP_ prepended to their names, with the name in upper case and any hyphens changed to underscores. It would be a seriously buggy server that altered the names of the most important values for CGI, but it's not something you'd want to leave unchecked if your script is behaving oddly.
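
One way to guard against a missing or renamed variable is to test for it before using it. The sketch below relies on a hypothetical helper, env_or_default, which is an assumption made for this example and is not part of Perl or of any CGI library.

```perl
use strict;
use warnings;

# Hypothetical helper: return the named environment variable, or a
# fallback value when the server did not set it.
sub env_or_default {
    my ($name, $default) = @_;
    if (exists $ENV{$name} && defined $ENV{$name}) {
        return $ENV{$name};
    }
    return $default;
}

# Falls back to 'unknown' if this server does not supply HTTP_HOST.
my $host = env_or_default('HTTP_HOST', 'unknown');
```

If a value you rely on keeps coming back as the fallback, dump the whole of %ENV and look for the name your server actually uses.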

Exploring your Environment

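One of the first things you'll want to do is output a record of all the values in %ENV for your particular server. The following small CGI script prints every name and value from a given %ENV hash.

#!/usr/bin/perl

# print_head() is assumed to be defined elsewhere in the script (it is not
# shown here); it sends the HTTP headers and the opening HTML.
print_head("Variables", 200, "OK");

# Print each environment variable name with its value.
foreach $name (keys %ENV) {
    print "the value of ", $name, " is ", $ENV{$name}, "<br>\n";
}

print "</body></html>\n";

exit;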



The foreach keyword loops through a list and assigns each value in turn to the scalar indicated, in this case $name. The keys function returns a list consisting of the keys of the hash %ENV.

For ease of reference the following chart shows the values of most of the environmental variables you will see in a typical client to web server transaction. However, you should always run your own tests to be sure what information you have available.

| Environmental Variable | Value Meaning | Example |
|---|---|---|
| SCRIPT_NAME | The name and path to your script from the web root. | /scripts/myscript.cgi |
| SERVER_NAME | Sometimes set to the domain of your server. | www.mydomain.com |
| SERVER_SOFTWARE | The name and version of your server software. | MyServer/2.4 |
| REMOTE_ADDR | The IP number of the client making the request. | 207.240.80.129 |
| SERVER_PROTOCOL | The version of HTTP the server is using. | HTTP/1.0 |
| REQUEST_METHOD | The method used by the client. | GET |
| HTTP_USER_AGENT | The identification of the client software, compatibility and version. | Mozilla/2.0 (compatible; MyBrowser/1.0) |
| QUERY_STRING | The user supplied name/value pairs from your form. | title=midsummer+night%27s+dream&author=shakespeare |
| PATH | The system paths of your script's process (where it searches for required libraries). | /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin |
| HTTP_ACCEPT | The MIME content-types the client can accept. | text/html, image/jpeg, image/gif |
| HTTP_CACHE_CONTROL | An HTTP 1.1 header set by the Squid caching proxy. | max-age=259200 |
| HTTP_X_FORWARDED_FOR | The address of the client a proxy such as Squid is forwarding the request for. | 152.67.177.122 |
| SERVER_PORT | The port your server is monitoring for HTTP requests. | 80 |
| HTTP_HOST | Identification of your server, as named in the client's Host header. | www.myserver.com |

For a complete list of all possible HTTP headers (which you might see in your environmental variables, depending on your server) consult the HTTP documentation.

If you need to convert IP numbers to domain names or vice versa there are a number of tools you can use. nslookup is included in many operating systems (including Linux) to query a domain name server. On Linux simply type:

% nslookup mydomainnameoripnumber

You can also look at the output of the ping tool for this information.
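
The same conversions can be made from inside a Perl script. The following is a rough sketch using Perl's built-in gethostbyaddr and the standard Socket module; the subroutine names ip_to_name and name_to_ip are made up for this example, and the results depend entirely on how name resolution is set up on your machine.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Hypothetical helper: return the domain name for a dotted IP number,
# or undef if no name can be found.
sub ip_to_name {
    my ($ip) = @_;
    my $packed = inet_aton($ip);
    return undef unless defined $packed;
    # In scalar context gethostbyaddr returns just the host name.
    my $name = gethostbyaddr($packed, AF_INET);
    return $name;
}

# Hypothetical helper: return the dotted IP number for a domain name,
# or undef if the lookup fails.
sub name_to_ip {
    my ($name) = @_;
    my $packed = inet_aton($name);
    return undef unless defined $packed;
    return inet_ntoa($packed);
}

# Example calls (commented out because they query your name server):
#   print ip_to_name($ENV{'REMOTE_ADDR'}), "\n";
#   print name_to_ip('www.mydomain.com'), "\n";
```

Keep in mind that a lookup on every request can slow a CGI script down noticeably if the name server is slow to answer.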

Finding and Decoding User Data

When a client makes a request of a web server it supplies data from the key/value pairs you specify in your HTML form. For instance, when you create a field in a form and name that field "title", and the user fills out that field with the value "midsummer", the key/value pair sent is title=midsummer. If multiple key/value pairs are sent, they are divided by the & character.

If the request method is GET then these key/value pairs are appended to the URL and the server passes them to the CGI script in the QUERY_STRING environmental variable. If the request method is POST the client includes the key/value pairs in the body of its request, and the server passes those values along to the script by feeding them in a data stream to STDIN of the script's process. In this case the server also sets CONTENT_LENGTH to be the length in bytes of that data.

You can find out which method the client requested by looking at the value of the REQUEST_METHOD environmental variable.

In the following Perl code we create a scalar called $form_data and fill it with the encoded query string depending on the request method.

if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $form_data = $ENV{'QUERY_STRING'};
} elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN, $form_data, $ENV{'CONTENT_LENGTH'}) or print_head("error", 500, "Read failed $!");
} else {
    # 405 is the HTTP status for "Method Not Allowed"
    print_head("error", 405, "Client is using unsupported method");
}



In this case we have extended our if/else construct with elsif. (Watch out: it is spelled elsif, with no e after the s!) Since the environmental variables we are checking here are text strings, we use the eq operator to ask, "is it true that the REQUEST_METHOD equals GET?". If they had been numbers, we would have used == to make the same test.

If the request method is POST we use a Perl read command and the length (in bytes) of the data in order to fill the scalar $form_data with exactly that many bytes from the STDIN file handle. If neither method is found, or if the read fails, we return an error and abort our script.
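
The same logic can be wrapped in a subroutine so that it is easy to try out without a web server. This is only a sketch: get_form_data is a hypothetical name, and instead of calling print_head it returns undef on failure, leaving the error page to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data, or undef
# if the method is unsupported or the read comes up short.
# $env is a reference to a hash like %ENV, $in is a file handle like STDIN.
sub get_form_data {
    my ($env, $in) = @_;
    my $method = $env->{'REQUEST_METHOD'} || '';

    if ($method eq 'GET') {
        return defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
    }
    elsif ($method eq 'POST') {
        my $length = $env->{'CONTENT_LENGTH'} || 0;
        my $data   = '';
        # read returns the number of bytes actually read, or undef on error.
        my $got = read($in, $data, $length);
        return undef unless defined $got && $got == $length;
        return $data;
    }
    return undef;    # neither GET nor POST
}

# In a real CGI script you might call it with the live environment:
#   my $form_data = get_form_data(\%ENV, \*STDIN);
```

Passing the environment and the file handle in as arguments means the subroutine can be fed a made-up hash and a test file while you are debugging.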

Next, we'll want to create our own array containing this query string, by splitting it on the & character which divides the name/value pairs:

@PAIRS = split (/&/, $form_data);



The split function returns sub-strings from a longer string by splitting the target string on a pattern you specify in a regular expression. Split discards the character it matched. For instance, if our $form_data scalar had contained the string:

title=midsummer&author=shakespeare&results=10



then our @PAIRS array would contain 3 items: title=midsummer, author=shakespeare and results=10.
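
To see this for yourself, here is a short stand-alone sketch that splits the example string and prints each item. The query string is hard-coded purely for illustration; in a CGI script it would come from the request as described above.

```perl
use strict;
use warnings;

# The example query string from the text, hard-coded for illustration.
my $form_data = 'title=midsummer&author=shakespeare&results=10';

# Split on the & character; the & itself is discarded.
my @pairs = split(/&/, $form_data);

# An array used in scalar context gives the number of items it holds.
print scalar(@pairs), " pairs found\n";

foreach my $pair (@pairs) {
    print "$pair\n";
}
```

Each item is still an encoded name=value string at this point; nothing has been decoded yet.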