Finding, Using and Decoding CGI Data
In the first article in this series we looked at the HTTP transaction that is the key to understanding how data is passed around on the World Wide Web. We took a quick look at the mechanism of CGI: a gateway interface between a web server and an external program that accepts, processes and returns data, which the server then sends back to the client. We also looked at getting set up to program Perl CGI and the basic ways to send data back to a client.
So, you can send output from a CGI process to the server (and therefore the client), but to go further you need to know how to find and decode client-supplied data within your script; you need to be able to figure out what the client is saying.
When it spawns the Perl process, the web server puts all of this information into environmental variables, except in the case of a POST request, where the server feeds some of the information (the name/value pairs from your HTML form) into the STDIN handle. More on that in a moment, but first let's look at environmental variables.
Getting Environmental Data from the Script
Perl can access environmental data about its own process by looking in a
special array called %ENV (think environment). Hash arrays in
Perl are keyed by name, rather than number, and are indicated with the % sign.
You can find the names of the elements (also called keys) of a hash array by
using the keys operator, if you don't know them already.
The server places the values of most of the header lines from the HTTP
transaction in the environment, and Perl makes them available in the %ENV
hash. The hash also holds any other values your particular server might think
are important, like the server's name or the name of your operating system.
To get the value of a single element of a hash you write the hash name with
the scalar $ sign and put the name of the element you want in braces. For
instance, to get the value of the Host HTTP header you could print it:
print $ENV{'HTTP_HOST'};
or assign it to a variable:
$http_host = $ENV{'HTTP_HOST'};
Don't assume the names of these values; the server can assign any name it
pleases. It's true that there is a great deal of standardization across
servers here. For instance, most headers that come from the client have HTTP_
prepended to their names. It would be a seriously buggy server that altered
the names of the most important values for CGI, but it's not something you'd
want to leave unchecked if your script is behaving oddly.
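A missing variable simply shows up as an undefined value, so one defensive habit is to look names up through a small helper that supplies a fallback. The following is only a sketch; env_or_default is a hypothetical helper name invented for this illustration, not part of Perl or of any server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the named environment value, or the
# fallback when the server did not set it (or set it to an empty string).
sub env_or_default {
    my ($name, $default) = @_;
    my $value = $ENV{$name};
    return (defined $value && length $value) ? $value : $default;
}

my $host = env_or_default('HTTP_HOST', 'unknown host');
```

A script that relies on a particular variable can then log or report the fallback instead of silently working with an empty string.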
Exploring your Environment
One of the first things you'll want to do is output a record of all the values in %ENV for your particular server. The following small CGI
script prints every name and value from a given %ENV hash.
#!/usr/bin/perl
print_head("Variables", 200, "OK");
foreach $name (keys %ENV) {
    print "the value of ", $name, " is ", $ENV{$name}, "<br>\n";
}
print "</body></html>\n";
exit;
The foreach keyword loops through a list and assigns each value in turn
to the scalar indicated, in this case $name. The keys
function returns a list of the keys of the hash %ENV, in no particular order.
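Because keys returns names in no particular order, and because values such as the query string can contain characters that are special in HTML, a slightly more careful listing sorts the names and escapes the output. This is a sketch under those assumptions; escape_html and env_listing are hypothetical helper names, not functions from the previous article.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or we would re-escape our own output
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one line of HTML per variable, in alphabetical order.
sub env_listing {
    my (%env) = @_;
    my $html = '';
    foreach my $name (sort keys %env) {
        $html .= "the value of " . escape_html($name)
               . " is " . escape_html($env{$name}) . "<br>\n";
    }
    return $html;
}

print env_listing(%ENV);
```

Returning the listing as a string, rather than printing inside the loop, makes it easy to try the routine on a small hash of your own before pointing it at %ENV.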
For ease of reference the following chart shows the values of most of the environmental variables you will see in a typical client to web server transaction. However, you should always run your own tests to be sure what information you have available.
| Environmental Variable | Value Meaning | Example |
| --- | --- | --- |
| SCRIPT_NAME | The name and path to your script from the web root. | /scripts/myscript.cgi |
| SERVER_NAME | Sometimes set to the domain of your server. | www.mydomain.com |
| SERVER_SOFTWARE | The name and version of your server software. | MyServer/2.4 |
| REMOTE_ADDR | The IP number of the client making the request. | 207.240.80.129 |
| SERVER_PROTOCOL | The version of HTTP the server is using. | HTTP/1.0 |
| REQUEST_METHOD | The method used by the client. | GET |
| HTTP_USER_AGENT | The identification of the client software, compatibility and version. | Mozilla/2.0 (compatible; MyBrowser/1.0) |
| QUERY_STRING | The user supplied name/value pairs from your form. | title=midsummer+night%27s+dream&author=shakespeare |
| PATH | The system paths of your script's process (where it searches for programs to run). | /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin |
| HTTP_ACCEPT | The MIME content-types the client can accept. | text/html, image/jpeg, image/gif |
| HTTP_CACHE_CONTROL | An HTTP 1.1 header set by the Squid caching proxy. | max-age=259200 |
| HTTP_X_FORWARDED_FOR | Identification of the client that the Squid proxy forwarded the request for. | 152.67.177.122 |
| SERVER_PORT | The port your server is monitoring for HTTP requests. | 80 |
| HTTP_HOST | Identification of your server. | www.myserver.com |
For a complete list of all possible HTTP headers (which you might see in your environmental variables, depending on your server) consult the HTTP documentation.
If you need to convert IP numbers to domain names or vice versa there are a number of tools you can use. nslookup is included in many operating systems (including Linux) to query a domain name server. On Linux simply type:
% nslookup mydomainnameoripnumber
You can also look at the output of the ping tool for this information.
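The same conversion can also be done from inside a script. The sketch below uses the Socket module that ships with Perl and the built-in gethostbyaddr function; ip_to_name and name_to_ip are hypothetical helper names chosen for this illustration, and both return nothing when the lookup fails.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Hypothetical helper: IP number to domain name (a reverse lookup).
sub ip_to_name {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Hypothetical helper: domain name to IP number.
# inet_aton accepts a host name as well as a dotted IP number.
sub name_to_ip {
    my ($name) = @_;
    my $packed = inet_aton($name) or return;
    return inet_ntoa($packed);
}

# Example use inside a CGI script (needs a working name server):
#   my $client_name = ip_to_name($ENV{'REMOTE_ADDR'});
```

Keep in mind that a lookup can be slow, so it is usually better to do it only when you actually need the name.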
Finding and Decoding User Data
When a client makes a request of a web server it supplies data from the key/value pairs you specify in your HTML form. For instance, when you create a field in a form and name that field "title", and the user fills out that field with the value "midsummer", the key/value pair sent is title=midsummer.
If multiple key/value pairs are sent, they are divided by the &
character.
If the request method is GET then these key/value pairs are appended to the
URL and the server passes them to the CGI script in the QUERY_STRING
environmental variable. If the request method is POST the client includes the
key/value pairs in the body of its request, and the server passes those values
along to the script by feeding them in a data stream to STDIN of the script's
process. In this case the server also sets CONTENT_LENGTH to be
the length in bytes of that data.
You can find out which method the client requested by looking at the value of the REQUEST_METHOD environmental variable.
In the following Perl code we create a scalar called $form_data
and fill it with the encoded query string depending on the request method.
if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $form_data = $ENV{'QUERY_STRING'};
} elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    # read exactly CONTENT_LENGTH bytes of the request body from STDIN
    read(STDIN, $form_data, $ENV{'CONTENT_LENGTH'})
        or print_head("error", 500, "Read failed $!");
} else {
    # 405 is the HTTP status for a method that is not allowed
    print_head("error", 405, "Client is using unsupported method");
}
In this case we have extended our if/else construct with elsif. (Watch out:
it is spelled elsif, with no e after the s!) Since the environmental
variables we are checking here are text strings, we use the eq operator to ask,
"is it true that the REQUEST_METHOD equals GET?". If
they had been numbers, we would have used == to make the same
test.
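The difference between the two comparisons is easiest to see with values that are equal as numbers but not as strings. A minimal illustration, using made-up values:

```perl
use strict;
use warnings;

my $method = "GET";

# eq compares its two operands as strings
my $is_get = ($method eq "GET") ? 1 : 0;

# "10" and "10.0" are different strings but the same number
my $same_string = ("10" eq "10.0") ? 1 : 0;
my $same_number = ("10" == "10.0") ? 1 : 0;

print "is_get=$is_get same_string=$same_string same_number=$same_number\n";
# prints: is_get=1 same_string=0 same_number=1
```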
If the request method is POST we use a Perl read command and
the length (in bytes) of the data in order to fill the scalar $form_data
with exactly that many bytes from the STDIN file handle. If neither method is
found, or if the read fails, we return an error and abort our script.
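One way to make this logic easier to test is to wrap it in a routine that takes the environment and the input handle as arguments. The following is a sketch, not the only way to do it: read_form_data is a hypothetical name, it returns undef instead of printing an error page, and it treats a read shorter than CONTENT_LENGTH as a failure.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data.
# $env is a reference to a hash shaped like %ENV; $in is the handle
# to read a POST body from. Returns undef for an unsupported method
# or a short read.
sub read_form_data {
    my ($env, $in) = @_;
    my $method = $env->{'REQUEST_METHOD'} || '';

    if ($method eq 'GET') {
        return defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
    }
    if ($method eq 'POST') {
        my $length = $env->{'CONTENT_LENGTH'} || 0;
        return '' if $length == 0;    # an empty body is not an error
        my $data = '';
        my $got = read($in, $data, $length);
        return undef unless defined $got && $got == $length;
        return $data;
    }
    return undef;
}

my $form_data = read_form_data(\%ENV, \*STDIN);
```

Because nothing inside the routine refers to %ENV or STDIN directly, you can try it from the command line with a hand-made hash and an in-memory file handle before installing the script on a server.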
Next, we'll want to create our own array containing this query string, by
splitting it on the & character which divides the name/value
pairs:
@PAIRS = split (/&/, $form_data);
The split function returns sub-strings from a longer string by splitting the
target string on a pattern you specify in a regular expression. Split discards
the character it matched. For instance, if our $form_data scalar
had contained the string:
title=midsummer&author=shakespeare&results=10
then our @PAIRS array would contain 3 items: title=midsummer,
author=shakespeare and results=10.
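Each pair can then be split again on the = character and stored in a hash keyed by field name. The sketch below also undoes the URL encoding seen in the QUERY_STRING example earlier (+ for a space, %27 for an apostrophe). The names url_decode and parse_form_data are hypothetical, and a repeated field name simply overwrites the earlier value here.

```perl
use strict;
use warnings;

# Hypothetical helper: undo the URL encoding of one name or value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                                # + stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %27 becomes '
    return $text;
}

# Split the query string into pairs, then each pair into name and value.
sub parse_form_data {
    my ($form_data) = @_;
    my %form;
    foreach my $pair (split /&/, $form_data) {
        next unless length $pair;                    # skip empty pairs such as "a=1&&b=2"
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

my %form = parse_form_data('title=midsummer+night%27s+dream&author=shakespeare');
print "$form{'title'} by $form{'author'}\n";
# prints: midsummer night's dream by shakespeare
```

The + signs are translated before the %XX sequences so that an encoded plus sign (%2B) survives as a literal + in the decoded value.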









