CGI Programming on the World Wide Web
By Shishir Gundavaram

Chapter 5: Server Side Includes




Introduction

You're starting to get the hang of CGI, but aren't too thrilled with the fact that you have to write full-fledged CGI programs even when you want to output a document with only a minimum amount of dynamic information, right? For example, say you want to display the current date and time, or a certain CGI environment variable in your otherwise static document. You can go through the trouble of writing a CGI program that outputs this small amount of virtual data, or better yet, you can use a powerful feature called Server Side Includes (or SSI).

Server Side Includes are directives which you can place into your HTML documents to execute other programs or output such data as environment variables and file statistics. Unfortunately, not all servers support these directives; the CERN server cannot handle SSI, but the servers from NCSA and Netscape can. However, there is a CGI program called fakessi.pl that you can use to emulate Server Side Includes if your server does not support them.

While Server Side Includes technically are not really CGI, they can become an important tool for incorporating CGI-like information, as well as output from CGI programs, into documents on the Web.

How do Server Side Includes work? When the client requests a document from the SSI-enabled server, the server parses the specified document and returns the evaluated document (see Figure 5-1). The server does not automatically parse all files looking for SSI directives, but only ones that are configured as such. We will look at how to configure documents in the next section.

Figure 5-1:
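
To make the parsing step concrete, here is a minimal Perl sketch of what a server might do when it meets echo directives in a document. It is an illustration only, not the code of any actual server; the subroutine name expand_echo and the "(none)" text printed for unset variables are assumptions made for this example.

```perl
use strict;
use warnings;

# Replace each <!--#echo var="NAME"--> directive in $text with the value
# of NAME taken from the hash referenced by $vars.
# Hypothetical helper: a real server handles many more directives.
sub expand_echo {
    my ($text, $vars) = @_;

    $text =~ s{<!--#echo\s+var="([^"]*)"\s*-->}
              {defined $vars->{$1} ? $vars->{$1} : "(none)"}ge;

    return $text;
}
```

You could call it as expand_echo($document, \%ENV) to fill in the directives from the current environment. Everything outside the directives is passed through untouched, which is why the server has to read the whole document before returning it.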

SSI sounds like a great feature, but it does have its disadvantages. First, it can be quite costly for a server to continually parse documents before sending them to the client. And second, enabling SSI creates a security risk. Novice users could possibly embed directives to execute system commands that output confidential information. Despite these shortcomings, SSI can be a very powerful tool if used cautiously.

Table 5-1 lists all the SSI directives. In this chapter, I'll discuss each of these directives in detail.

Table 5-1 -- SSI Directives

Command    Parameter   Description

echo       var         Inserts value of special SSI variables as well as other environment variables
include                Inserts text of document into current file
           file        Pathname relative to current directory
           virtual     Virtual path to a document on the server
fsize      file        Inserts the size of a specified file
flastmod   file        Inserts the last modification date and time for a specified file
exec                   Executes external programs and inserts output in current document
           cmd         Any application on the host
           cgi         CGI program
config                 Modifies various aspects of SSI
           errmsg      Default error message
           sizefmt     Format for size of the file
           timefmt     Format for dates

Configuration

How does the server know which files to parse, and which ones to return without parsing? From the information in the server configuration files, of course. Let's look at how we can configure SSI on the NCSA server.

The first thing you need to set is the extension(s) for the files that the server should parse in the server configuration file (srm.conf). For example, the following line will force the server to parse all files that end in .shtml:

AddType text/x-server-parsed-html .shtml

Internally, the server uses the text/x-server-parsed-html MIME content type to identify parsed documents. An important thing to note here is that you cannot have SSI directives within your CGI program, because the server does not parse the output generated by the program.

Alternatively, you can set the configuration so that the server parses all HTML documents:

AddType text/x-server-parsed-html .html

However, this is not a good idea! It will severely degrade system performance because the server has to parse all the HTML documents that it returns.

Now let's look at the two configuration options that you must set in the access configuration file (access.conf). They dictate what type of SSI directives you can place in your HTML document.

Here is how you would enable both Includes and Exec:

Options Includes ExecCGI

To enable Includes without Exec, you need to add the following instead:

Options IncludesNoExec

Before enabling either of these features, you should think about system security and performance.

Configuring SSI for the CERN Server

As we mentioned at the beginning of this chapter, not all servers support SSI. However, you can use a Perl program called fakessi.pl to emulate SSI behavior.

For example, on the CERN server, all you need to do is:

  1. Install fakessi.pl into the cgi-bin directory.
  2. Add the following directive to httpd.conf (assuming that /usr/local/etc/httpd/cgi-bin is the directory that fakessi.pl was installed into):

    Exec /*.shtml /usr/local/etc/httpd/cgi-bin/fakessi.pl

This tells the server to execute fakessi.pl whenever a client requests a file ending in .shtml.

You can get fakessi.pl from http://sw.cse.bris.ac.uk/WebTools/fakessi.html.

Environment Variables

As I mentioned before, you can insert the values of environment variables in an otherwise static HTML document. Here is an example of a document that contains a few SSI directives:

<HTML>
<HEAD><TITLE>Welcome!</TITLE></HEAD>
<BODY>
<H1>Welcome to my server at <!--#echo var="SERVER_NAME"-->...</H1>
<HR>
Dear user from <!--#echo var="REMOTE_HOST"-->,
<P>
There are many links to various CGI documents throughout the Web, so feel free to explore.
.
.
.
<HR>
<ADDRESS>Shishir Gundavaram (<!--#echo var="DATE_LOCAL"-->)</ADDRESS>
</BODY></HTML>

SSI directives have the following format:

<!--#command parameter="argument"-->
In this example, the echo SSI command with the var parameter is used to display the IP name or address of the serving machine, the remote host name, and the local time. Of course, we could have written a CGI program to perform the same function, but this approach is much quicker and easier, as you can see.
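
For comparison, here is a rough sketch of the CGI program you would otherwise have to write to produce the same page. The subroutine name welcome_page and the "unknown" fallback text are assumptions made for this example; the point is only to show how much more work it is than three echo directives.

```perl
use strict;
use warnings;

# Build the same welcome page as a CGI response. The values come from the
# hash referenced by $env (normally \%ENV when run under a Web server).
sub welcome_page {
    my ($env) = @_;

    my $server = defined $env->{SERVER_NAME} ? $env->{SERVER_NAME} : "unknown";
    my $remote = defined $env->{REMOTE_HOST} ? $env->{REMOTE_HOST} : "unknown";
    my $date   = scalar localtime;    # plays the role of DATE_LOCAL

    return "Content-type: text/html\n\n"
         . "<HTML>\n<HEAD><TITLE>Welcome!</TITLE></HEAD>\n<BODY>\n"
         . "<H1>Welcome to my server at $server...</H1>\n<HR>\n"
         . "Dear user from $remote,\n<P>\n"
         . "<HR>\n<ADDRESS>Shishir Gundavaram ($date)</ADDRESS>\n"
         . "</BODY></HTML>\n";
}
```

A CGI program built on this would simply end with print welcome_page(\%ENV). Note that the whole document now lives inside the program, so every change to the static text means editing code.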

All environment variables that are available to CGI programs are also available to SSI directives. There are also a few variables that are exclusively available for use in SSI directives, such as DATE_LOCAL, which contains the current local time. Another is DATE_GMT, which contains the current time in Greenwich Mean Time:

The current GMT time is: <!--#echo var="DATE_GMT"-->

Here is another example that uses some of these exclusive SSI environment variables to output information about the current document:

<H2>File Summary</H2>
<HR>
The document you are viewing is titled: <!--#echo var="DOCUMENT_NAME"-->,
and you can access it at a later time by opening the URL:
<!--#echo var="DOCUMENT_URI"-->. Please add this to your bookmark list.
<HR>
Document last modified on <!--#echo var="LAST_MODIFIED"-->.

This will display the name, URL (although the variable is titled DOCUMENT_URI), and modification time for the current HTML document.

For a listing of CGI environment variables, see Table 2-1. Table 5-2 shows additional SSI environment variables.

Table 5-2 -- Additional SSI Environment Variables

Environment Variable     Description

DOCUMENT_NAME            The current file
DOCUMENT_URI             Virtual path to the file
QUERY_STRING_UNESCAPED   Undecoded query string with all shell metacharacters escaped with "\"
DATE_LOCAL               Current date and time in the local time zone
DATE_GMT                 Current date and time in GMT
LAST_MODIFIED            Last modification date and time for current file

Including Boilerplates

There are times when you will have certain information that you repeat in numerous documents on the server, like your signature, or a thank-you note. In cases like this, it's efficient to have that information stored in a file, and insert that file into your various HTML documents with the SSI include command. Suppose you have a signature file like the following stored in address.html:

<HR>
<ADDRESS>
<PRE>
Shishir Gundavaram               WWW Software, Inc.
White Street                     90 Sherman Street
Boston, Massachusetts 02115      Cambridge, Massachusetts 02140

shishir@bu.edu

The address information was last modified Friday, 22-Dec-95 12:43:00 EST.
</PRE>
</ADDRESS>

You can include the contents of this file in any other HTML document with the following command:

<!--#include file="address.html"-->
This will include address.html located in the current directory into another document. You can also use the virtual parameter with the include command to insert a file from a directory relative to the server root:
<!--#include virtual="/public/address.html"-->
For our final example, let's include a boilerplate file that contains embedded SSI directives. Here is the address file (address.shtml) with an embedded echo command (note the .shtml extension):

<HR>
<ADDRESS>
<PRE>
Shishir Gundavaram               WWW Software, Inc.
White Street                     90 Sherman Street
Boston, Massachusetts 02115      Cambridge, Massachusetts 02140

shishir@bu.edu

The address information was last modified on <!--#echo var="LAST_MODIFIED"-->.
</PRE>
</ADDRESS>

When you include this address file into an HTML document, it will contain your signature along with the date the file was last modified.

File Statistics

There are SSI directives that allow you to retrieve certain information about files located on your server. For example, say you have a hypertext link in one of your documents that points to a manual describing your software that users can download. In such a case, you should include the size and modification date of that manual so users can decide whether it's worth their effort to download a document; it could be outdated or just too large for them to download. Here's an example:

Here is the latest reference guide on CGI. You can download it by clicking
<A HREF="/cgi-refguide.ps">here</A>. The size of the file is
<!--#fsize virtual="/cgi-refguide.ps"--> bytes and it was last modified
on <!--#flastmod virtual="/cgi-refguide.ps"-->.

The fsize command displays the size of the specified file in bytes. Like include, it accepts either the file parameter (a pathname relative to the current directory) or the virtual parameter (a virtual path on the server, as used here). You can use the flastmod command to insert the modification date for a certain file. The difference between the SSI variable LAST_MODIFIED and this command is that flastmod allows you to choose any file, while LAST_MODIFIED displays the information for the current file. You have the option of tailoring the output from these commands with the config command. We will look at this later in the chapter.
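
If you are curious where this information comes from, the following Perl sketch retrieves the same two facts about a file with the built-in stat function. The subroutine name file_stats is an assumption made for this example, and the sketch makes no claim about how any particular server implements fsize and flastmod.

```perl
use strict;
use warnings;

# Return the size in bytes and the last-modification time (seconds since
# 1970) of $path, or an empty list if the file cannot be examined.
sub file_stats {
    my ($path) = @_;

    my @info = stat($path);
    return unless @info;

    return ($info[7], $info[9]);    # element 7 is size, element 9 is mtime
}
```

The modification time comes back as a number of seconds, so it still has to be turned into a readable date before it is displayed.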

Executing External Programs

Wouldn't it be great if we could execute either a CGI or a system program and place its output in our HTML document? With the SSI command exec, we can do just that using the exec cmd directive:

Welcome <!--#echo var="REMOTE_USER"-->. Here is some information about you:
<PRE>
<!--#exec cmd="/usr/ucb/finger $REMOTE_USER@$REMOTE_HOST"-->
</PRE>

In this example, we use the UNIX finger command to retrieve some information about the user. SSI allows us to pass command-line arguments to the external programs. If you plan to use environment variables as part of an argument, you have to precede them with a dollar sign. The reason for this is that the server spawns a shell to execute the command, and that's how you would access the environment variables if you were programming in a shell. Here is what the output will look like, assuming REMOTE_USER and REMOTE_HOST are "shishir" and "bu.edu", respectively:

Welcome shishir. Here is some information about you:
<PRE>
[bu.edu]
Trying 128.197.154.10...
Login name: shishir                     In real life: Shishir Gundavaram
Directory: /usr3/shishir                Shell: /usr/local/bin/tcsh
Last login Thu Jun 23 08:18 on ttyq1 from nmrc.bu.edu:0.
New mail received Fri Dec 22 01:51:00 1995;
  unread since Thu Dec 21 17:38:02 1995
Plan:
Common, aren't you done with the book yet?
</PRE>

You should enclose the output from an external command in a <PRE>..</PRE> block, so that whitespace is preserved. Also, if there is any HTML code within the data output by the external program, the browser will interpret it!

(To use the exec directive, remember that you need to enable Exec in the Options line of the access.conf file, as described in the "Configuration" section earlier in this chapter.)
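
The way a shell expands environment variables in a command string can be reproduced in a few lines of Perl. This is only a sketch under the assumption that /bin/sh is available; the subroutine name run_with_env is made up for the example.

```perl
use strict;
use warnings;

# Run $command through the shell with the given variables placed in the
# environment, and return whatever the command prints.
sub run_with_env {
    my ($command, %vars) = @_;

    local @ENV{keys %vars} = values %vars;    # restored when the sub returns
    my $output = `$command`;                  # backticks hand the string to the shell

    return defined $output ? $output : "";
}
```

For instance, run_with_env('echo $REMOTE_USER@$REMOTE_HOST', REMOTE_USER => "shishir", REMOTE_HOST => "bu.edu") lets the shell, not Perl, substitute the two variables. The same mechanism means that any shell metacharacters in a variable's value are also interpreted, so be careful about which variables you place in a command.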

Having the ability to execute external programs makes things easier, but it also poses a major security risk. Say you have a "guestbook" (a CGI application that allows visitors to leave messages for everyone to see) on a server that has SSI enabled. Most such guestbooks around the Net actually allow visitors to enter HTML code as part of their comments. Now, what happens if a malicious visitor decides to do some damage by entering the following:

<!--#exec cmd="/bin/rm -fr /"-->
If the guestbook CGI program was designed carefully, to strip SSI commands from the input, then there is no problem. But, if it was not, there exists the potential for a major headache!
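
As an illustration of the kind of filtering a guestbook might do, here is a minimal sketch of a subroutine that removes anything resembling an SSI directive from a visitor's comments. The name strip_ssi is an assumption made for this example, and a pattern like this is only a starting point: escaping every "<" in the input would be a more cautious design.

```perl
use strict;
use warnings;

# Remove anything that looks like an SSI directive from visitor input.
# Hypothetical helper for a guestbook; the name is an assumption.
sub strip_ssi {
    my ($input) = @_;

    # Repeat until nothing changes, so that pieces left behind by one
    # removal cannot join up into a new directive.
    1 while $input =~ s/<!--\s*#.*?-->//gs;

    return $input;
}
```

The guestbook program would pass each form field through strip_ssi before writing it to the file that the server later parses.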

Executing CGI Programs

You can use Server Side Includes to embed the results of an entire CGI program into a static HTML document, using the exec cgi directive. Why would you want to do this? There are many times when you want to display just one piece of dynamic data, such as: This page has been accessed 4883 times since December 10, 1995. Surely, you've seen this type of information in many documents around the Web. Obviously, this information is being generated dynamically (since it changes every time you access the document). We'll show you a few examples of embedded CGI programs using SSI.

User Access Counter

Suppose you have a simple CGI program that keeps track of the number of visitors, called by the exec SSI command in an HTML document:
This page has been accessed <!--#exec cgi="/cgi-bin/counter.pl"--> times.
The idea behind an access counter is simple. A data file on the server contains a count of the number of visitors that have accessed a particular document. Whenever a user visits the document, the SSI command in that document calls a CGI program that reads the numerical value stored in the file, increments it, and writes the new information back to the file and outputs it. Let's look at the program:
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

$count_file = "/usr/local/bin/httpd_1.4.2/count.txt";

if (open (FILE, "<" . $count_file)) {
    $no_accesses = <FILE>;
    close (FILE);

    if (open (FILE, ">" . $count_file)) {
        $no_accesses++;

        print FILE $no_accesses;
        close (FILE);

        print $no_accesses;
    } else {
        print "[ Can't write to the data file! Counter not incremented! ]", "\n";
    }
} else {
    print "[ Sorry! Can't read from the counter data file ]", "\n";
}

exit (0);
Since we are opening the data file from this program, we need the full path to the file. We can then proceed to try to read from the file. If the file cannot be opened, an error message is returned. Otherwise, we read one line from the file using the <FILE> notation, and store it in the variable $no_accesses. Then, the file is closed. This is very important because you cannot write to the file that was opened for reading.

Once that's done, the file is opened again, but this time in write mode, which creates a new file with no data. If that's not successful, probably due to permission problems, an error message stating that information cannot be written to the file is output. If there are no problems, we increment the value stored in $no_accesses. This new value is written to the file and printed to standard output.

Notice how this program, like other CGI programs we've covered up to this point, also outputs a Content-type HTTP header. In this case, a text/plain MIME content type is output by the program.

An important thing to note is that a CGI program called by an SSI directive cannot output anything other than text because this data is embedded within an HTML or plain document that invoked the directive. As a result, it doesn't matter whether you output a content type of text/plain or text/html, as the browser will interpret the data within the scope of the calling document. Needless to say, your CGI program cannot output graphic images or other binary data.

This CGI program is not as sophisticated as it should be. First, if the file does not exist, you will get an error if you open it in read mode. So, you must put some initial value in the file manually, and set permissions on the file so that the CGI program can write to it:

% echo "0" > /usr/local/bin/httpd_1.4.2/count.txt
% chmod 777 /usr/local/bin/httpd_1.4.2/count.txt
These shell commands write an initial value of "0" to the count.txt file, and set the permissions so that all processes can read, write, and execute the file. Remember, the HTTP server is usually run by a process with minimal privileges (e.g., "nobody" or "www"), so the permissions on the data file have to be set so that this process can read and write to it.

The other major problem with this CGI program is that it does not lock and unlock the counter data file. This is extremely important when you are dealing with concurrent users accessing your document at the same time. A good CGI program must try to lock a data file when in use, and unlock it after it is done with processing. A more advanced CGI program that outputs a graphic counter is presented in Chapter 6, Hypermedia Documents.
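
Here is one way the locking could be added, as a sketch rather than a finished program. It assumes a system where Perl's flock function is supported, and the subroutine name increment_counter is made up for this example. The file is opened once for both reading and writing so that the lock is held across the whole read-increment-write sequence.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Read, increment, and rewrite the count stored in $count_file while
# holding an exclusive lock. Returns the new count, or nothing on failure.
sub increment_counter {
    my ($count_file) = @_;

    open(my $fh, "+<", $count_file) or return;    # read/write, file must exist
    flock($fh, LOCK_EX)             or return;    # wait for exclusive access

    my $no_accesses = <$fh>;
    $no_accesses = 0 unless defined $no_accesses && $no_accesses =~ /^\d+/;
    $no_accesses = int($no_accesses) + 1;

    seek($fh, 0, SEEK_SET);        # go back to the start of the file
    truncate($fh, 0);              # discard the old value
    print $fh $no_accesses;

    close($fh);                    # closing the file releases the lock
    return $no_accesses;
}
```

A second request that arrives while the first holds the lock simply waits at the flock call, so no increment is lost.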

Random Links

You can use the following CGI program to create a "random" hypertext link. In other words, the link points to a different WWW site every time you reload.

Why would you want to do this? Well, for kicks. Also, the sites might actually be mirrors of each other, so it doesn't matter which one you refer people to. By changing the link each time, you're helping to spread out the traffic generated from your site.

Place the following line in your HTML document:

<!--#exec cgi="/cgi-bin/random.pl"-->
Here's the program:
#!/usr/local/bin/perl

@URL = ("http://www.ora.com",
        "http://www.digital.com",
        "http://www.ibm.com",
        "http://www.radius.com");

srand (time | $$);
The @URL array (or table) contains a list of the sites that the program will choose from. The srand function seeds the random number generator with a value based on the current time and the process identification number, so that successive requests are unlikely to produce the same sequence of choices.
$number_of_URL = $#URL;
$random = int (rand ($number_of_URL + 1));
The $number_of_URL contains the index (or position) of the last URL in the array. In Perl, arrays are zero-based, meaning that the first element has an index of zero. The rand function returns a random number from 0 up to, but not including, its argument, so we pass it one more than the last index and truncate the result with int. In this case, the variable $random will contain a random integer from 0 to 3.
$random_URL = $URL[$random];

print "Content-type: text/html", "\n\n";
print qq|<A HREF="$random_URL">Click here for a random Web site!</A>|, "\n";

exit (0);
A random URL is retrieved from the array and displayed as a hypertext link. Users can simply click on the link to travel to a random location.
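
The selection step is easy to get wrong by one, so here it is on its own as a small sketch. The subroutine name pick_random is an assumption made for this example.

```perl
use strict;
use warnings;

# Return one element of the list chosen at random. rand(N) returns a
# number from 0 up to (but not including) N, so int() yields 0 .. N-1.
sub pick_random {
    my (@list) = @_;
    return $list[ int(rand(scalar @list)) ];
}
```

Passing the number of elements, rather than the index of the last one, is what gives every entry in the list a chance of being chosen.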

Before we finish, let's look at one final example: a CGI program that calculates the number of days until a certain event.

Counting Days Until . . .

Remember we talked about query strings as a way of passing information to a CGI program in Chapter 2? Unfortunately, you cannot pass query information as part of an SSI exec cgi directive. For example, you cannot do the following:
<!--#exec cgi="/cgi-bin/count_days.pl?4/1/96"-->
The server will return an error.

However, we can create a regular Perl program (not a CGI program) that takes a date as an argument, and calculates the number of days until/since that date:

<!--#exec cmd="/usr/local/bin/httpd_1.4.2/count_days.pl 4/1/96"-->
In the Perl script, we can access this command-line data (i.e., "4/1/96") through the @ARGV array. Now, the script:
#!/usr/local/bin/perl

require "timelocal.pl";
require "bigint.pl";

The require command makes the functions within these two default Perl libraries available to our program.
($chosen_date = $ARGV[0]) =~ s/\s*//g;

The variable $chosen_date contains the date passed to this program, minus any whitespace that may have been inserted accidentally.
if ($chosen_date =~ m|^(\d+)/(\d+)/(\d+)$|) {

    ($month, $day, $year) = ($1, $2, $3);
This is another example of a regular expression, or regexp. We use the regexp to make sure that the date passed to the program is in a valid format (i.e., mm/dd/yyyy). If it is valid, then $month, $day, and $year will contain the separated month, day, and year from the initial date.
    $month -= 1;

    if ($year > 1900) {
        $year -= 1900;
    }

    $chosen_secs = &timelocal (undef, undef, undef, $day, $month, $year);
We will use the timelocal subroutine (notice the & in front) to convert the specified date to the number of seconds since 1970. This subroutine expects month numbers to be in the range of 0-11 and years to be from 00-99. This conversion makes it easy for us to subtract dates. An important thing to remember is that this program will not calculate dates correctly if you pass in a date before 1970.
    $seconds_in_day = 60 * 60 * 24;
    $difference = &bsub ($chosen_secs, time);
    $no_days = &bdiv ($difference, $seconds_in_day);
    $no_days =~ s/^(\+|-)//;
The bsub subroutine subtracts the current time (in seconds since 1970) from the specified time. We used this subroutine because we are dealing with very large numbers, and a regular subtraction will give incorrect results. Then, we call the bdiv subroutine to calculate the number of days until/since the specified date by dividing the previously calculated difference by the number of seconds in a day. The bdiv subroutine prefixes the values with either a "+" or a "-" to indicate positive or negative values, respectively, so we remove the extra character.
    print $no_days;

    exit(0);

Once we're done with the calculations, we output the calculated value and exit.
} else {

    print " [Error in date format] ";

    exit(1);

}
If the date is not in a valid format, an error message is returned.
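
For readers working with Perl 5, the same calculation can be sketched with the standard Time::Local module and ordinary arithmetic. This is an alternative written for illustration, not the program above: the subroutine name days_between is made up, the year is assumed to be given as four digits, and timegm is used in place of timelocal so that daylight saving time changes cannot shift the result.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Return the whole number of days between the date in $date (mm/dd/yyyy)
# and the time $now (seconds since 1970), or undef for a malformed date.
sub days_between {
    my ($date, $now) = @_;

    (my $chosen = $date) =~ s/\s+//g;
    return undef unless $chosen =~ m|^(\d+)/(\d+)/(\d{4})$|;
    my ($month, $day, $year) = ($1, $2, $3);

    # timegm expects months numbered 0-11; it dies on impossible dates,
    # so the call is wrapped in eval.
    my $chosen_secs = eval { timegm(0, 0, 0, $day, $month - 1, $year) };
    return undef unless defined $chosen_secs;

    return int(abs($chosen_secs - $now) / (60 * 60 * 24));
}
```

A command-line script would call days_between($ARGV[0], time) and print the result, or the error text if the value is undefined.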

Tailoring SSI Output

The config SSI command allows you to select the way error messages, file size information, and date and time are displayed. For example, if you use the include command to insert a non-existing file, the server will output a default error message like the following:
[an error occurred while processing this directive]
By using the config command, you can modify the default error message. If you want to set the message to "Error, contact shishir@bu.edu" you can use the following:
<!--#config errmsg="Error, contact shishir@bu.edu"-->
You can also set the file size format that the server uses when displaying information with the fsize command. For example, this command:
<!--#config sizefmt="abbrev"-->
will force the server to display the file size rounded to the nearest kilobyte (K). You can use the argument "bytes" to set the display as a byte count:
<!--#config sizefmt="bytes"-->
Here is how you can change the time format:
<!--#config timefmt="%D %r"-->
The file address.html was last modified on: <!--#flastmod file="address.html"-->.

The output will look like this:

The file address.html was last modified on: 12/23/95 07:17:39 PM
The %D format specifies that the date should be in mm/dd/yy format, while the %r format specifies "hh:mm:ss AM|PM" format. Table 5-3 lists all the date and time formats you can use.

Table 5-3 -- SSI Time Formats

Format   Value                               Example

%a       Day of the week abbreviation        Sun
%A       Day of the week                     Sunday
%b       Month name abbreviation (see %h)    Jan
%B       Month name                          January
%d       Date                                01 (and not 1)
%D       Date as "%m/%d/%y"                  06/23/95
%e       Date                                1 (and not 01)
%H       24-hour clock hour                  13
%I       12-hour clock hour                  1
%j       Decimal day of the year             360
%m       Month number                        11
%M       Minutes                             08
%p       am | pm                             a.m.
%r       Time as "%I:%M:%S AM | PM"          07:17:39 PM
%S       Seconds                             09
%T       24-hour time as "%H:%M:%S"          16:55:15
%U       Week of the year (also %W)          49
%w       Day of the week number              05
%y       Year of the century                 95
%Y       Year                                1995
%Z       Time zone                           EST
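
These format codes are the same ones understood by the strftime routine, so you can try out a format string from Perl before putting it in a config directive. The sketch below assumes a UNIX system whose strftime supports the codes you use, and the subroutine name format_time is made up for the example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format $epoch (seconds since 1970) with a strftime-style format string,
# using GMT so the result does not depend on the local time zone.
sub format_time {
    my ($format, $epoch) = @_;
    return strftime($format, gmtime($epoch));
}
```

For example, print format_time("%D %r", time) shows how the current time would appear with the timefmt setting used earlier, apart from the time zone.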

Common Errors

There are two common errors that you can make when using Server Side Includes. First, you should not forget the "#" sign:
<!--echo var="REMOTE_USER"-->
Second, do not add extra spaces between the "-" sign and the "#" character:
<!-- #echo var="REMOTE_USER"-->
If you make either of these two mistakes, the server will not give you an error; rather it will treat the whole expression as an HTML comment.
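
Because the server stays silent about these mistakes, it can help to scan your documents for them yourself. Here is a minimal sketch of such a check; the subroutine name find_suspect_directives is an assumption, and the pattern will also flag an ordinary comment that happens to begin with one of the command names.

```perl
use strict;
use warnings;

# Return the comments in $text that look like mistyped SSI directives:
# either the "#" is missing or whitespace separates "<!--" from "#".
# Call it in list context to get the matching comments back.
sub find_suspect_directives {
    my ($text) = @_;
    my $commands = "echo|include|fsize|flastmod|exec|config";

    return $text =~ /(<!--(?:\s+\#|\s*)(?:$commands)\b.*?-->)/gs;
}
```

A correctly written directive such as <!--#echo var="REMOTE_USER"--> is not reported, since the "#" follows the opening "<!--" directly.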

Copyright 1996, O'Reilly & Associates. All rights reserved.
