Apache: The Definitive Guide, Second Edition
Vital Information for Apache Programmers and Administrators
By Ben Laurie, Peter Laurie
February 1999
Pages: 388

Table of Contents

Chapter 1: Getting Started
When you connect to the URL of someone's home page—say the notional http://www.butterthlies.com/ we shall meet later on—you send a message across the Internet to the machine at that address. That machine, you hope, is up and running, its Internet connection is working, and it is ready to receive and act on your message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:
<method>://<host>/<absolute path URL (apURL)>
So, in our example, <method> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <apURL> is "/", meaning the top directory of the host. Using HTTP/1.1, your browser might send the following request:
GET / HTTP/1.1
Host: www.butterthlies.com
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is again in three parts: a method (an HTTP method, not a URL method), which in this case is GET, but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) "/"; and the version of the protocol we are using. It is then up to the web server running on that host to make something of this message.
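As a rough illustration of the two three-part structures just described, the following Python sketch splits the example URL into its parts and assembles the matching request. The function name build_get_request is ours, not part of any library, and the sketch ignores query strings and nonstandard ports.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the HTTP/1.1 request text a client might send for `url`."""
    parts = urlsplit(url)
    # An empty path means the top directory of the host, "/".
    path = parts.path or "/"
    # HTTP ends each header line with CRLF and closes the header block
    # with an empty line.
    return "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, parts.hostname)


parts = urlsplit("http://www.butterthlies.com/")
print(parts.scheme)    # the <method> part of the URL
print(parts.hostname)  # the <host> part of the URL
print(parts.path)      # the <apURL> part of the URL
print(build_get_request("http://www.butterthlies.com/"))
```

The request line carries the HTTP method, the URI, and the protocol version, in that order; the Host header repeats the host part of the URL.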
It is worth saying here—and we will say it again—that the whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
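To make the file half of that translation concrete, here is a minimal Python sketch. It is not how Apache itself is implemented. The document root path, the index.html default, and the name url_to_filename are all assumptions made for illustration, and the program-running case is left out.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Assumed document root, for illustration only.
DOCUMENT_ROOT = "/usr/www/site.for_instance/htdocs"


def url_to_filename(url, document_root=DOCUMENT_ROOT):
    """Map the path part of `url` to a filename under `document_root`."""
    path = unquote(urlsplit(url).path) or "/"
    # Normalizing an absolute path collapses "." and ".." segments, so the
    # result cannot climb above the document root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    if clean == "/":
        clean = "/index.html"  # assumed default document for the top directory
    return document_root + clean


print(url_to_filename("http://www.butterthlies.com/some/where/foo.html"))
```

A request for "/../../etc/passwd" is mapped to a name inside the document root rather than to the real /etc/passwd, which is the kind of care any such translation has to take.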
The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom, or a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
What do we want a web server to do? It should:
  • Run fast, so it can cope with a lot of inquiries using a minimum of hardware.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
How Does Apache Work?
Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/NT, which we call Win32. The binary is called httpd under Unix and apache.exe under Win32 and normally runs in the background. Each copy of httpd/apache that is started has its attention directed at a web site, which is, for practical purposes, a directory. For an example, look at site.toddle on the demonstration CD-ROM. Regardless of operating system, a site directory typically contains four subdirectories:
conf
Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file.
htdocs
Contains the HTML scripts to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
logs
Contains the log data, both of accesses and errors.
cgi-bin
Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space.
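The four-directory layout above can be sketched in a few lines of Python. The helper name create_site_skeleton is hypothetical. Note that cgi-bin is created beside htdocs rather than inside it, which keeps it out of the web space.

```python
from pathlib import Path

# The four subdirectories a site directory typically contains.
SITE_SUBDIRECTORIES = ("conf", "htdocs", "logs", "cgi-bin")


def create_site_skeleton(site_root):
    """Create the standard subdirectories under `site_root` and return them."""
    root = Path(site_root)
    created = []
    for name in SITE_SUBDIRECTORIES:
        directory = root / name
        # parents=True also creates the site directory itself if needed;
        # exist_ok=True makes the call safe to repeat.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling create_site_skeleton with a path of your choice gives you an empty site directory ready for a Config file and some documents.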
In its idling state, Apache does nothing but listen to the IP addresses and TCP port or ports specified in its Config file. When a request appears on a valid port, Apache receives the HTTP request and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.
What to Know About TCP/IP
To understand the substance of this book, you need a modest knowledge of what TCP/IP is and what it does. You'll find more than enough information in Craig Hunt and Robert Bruce Thompson's books on TCP/IP, but what follows is, we think, what is necessary to know for our book's purposes.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling computers to talk to each other over networks. The two protocols that give the suite its name are among the most important, but there are many others, and we shall meet some of them later. These protocols are embodied in programs on your computer written by someone or other; it doesn't much matter who. TCP/IP seems unusual among computer standards in that the programs that implement it actually work, and their authors have not tried too much to improve on the original conceptions.
TCP/IP only applies where there is a network. Each computer on a network that wants to use TCP/IP has an IP address, for example, 192.168.123.1.
There are four parts in the address, separated by periods. Each part corresponds to a byte, so the whole address is four bytes long. You will, in consequence, seldom see any of the parts outside the range 0-255.
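A short Python sketch shows the four-byte structure, using the standard ipaddress module. The helper name address_bytes is ours.

```python
import ipaddress


def address_bytes(dotted):
    """Return the four byte values of a dotted IPv4 address as a list."""
    # IPv4Address rejects any part outside 0-255 with a ValueError subclass.
    return list(ipaddress.IPv4Address(dotted).packed)


print(address_bytes("192.168.123.1"))
```

Each element of the returned list is one byte of the address, in the same left-to-right order as the dotted form.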
Although not required by protocol, by convention there is a dividing line somewhere inside this number: to the left is the network number and to the right, the host number. Two machines on the same physical network—usually a local area network (LAN)—normally have the same network number and communicate directly using TCP/IP.
How do we know where the dividing line is between network number and host number? The default dividing line is determined by the first of the four numbers: if the value of the first number is:
  • 0-127 (first byte is 0xxxxxxx binary), the dividing line is after the first number, and it is a Class A network. There are few Class A networks (125 usable ones), but each one supports up to 16,777,214 hosts.
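The default dividing line can be worked out mechanically from the first number. The sketch below is ours: the Class A rule comes from the text above, while the Class B (128-191) and Class C (192-223) ranges are the standard classful conventions and are filled in here as an assumption, since the rest of the list is not shown.

```python
def default_network_split(dotted):
    """Return (class letter, network bytes, host bytes) for a dotted address."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= value <= 255 for value in octets):
        raise ValueError("not a four-byte IP address: " + dotted)
    first = octets[0]
    if first <= 127:      # first byte is 0xxxxxxx
        letter, network_bytes = "A", 1
    elif first <= 191:    # first byte is 10xxxxxx
        letter, network_bytes = "B", 2
    elif first <= 223:    # first byte is 110xxxxx
        letter, network_bytes = "C", 3
    else:
        raise ValueError("no default network/host split for " + dotted)
    return letter, octets[:network_bytes], octets[network_bytes:]


def hosts_per_network(host_bytes):
    """Usable host numbers: all-zeros and all-ones are excluded."""
    return 2 ** (8 * host_bytes) - 2


print(default_network_split("192.168.123.1"))
print(hosts_per_network(3))
```

With three host bytes, hosts_per_network gives the 16,777,214 figure quoted for a Class A network.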
How Does Apache Use TCP/IP?
Let's look at a server from the outside. We have a box in which there is a computer, software, and a connection to the outside world—a piece of Ethernet or a serial line to a modem, for example. This connection is known as an interface and is known to the world by its IP address. If the box had two interfaces, they would each have an IP address, and these addresses would normally be different. One interface, on the other hand, may have more than one IP address (see Chapter 3).
Requests arrive on an interface for a number of different services offered by the server using different protocols:
  • Network News Transfer Protocol (NNTP): news
  • Simple Mail Transfer Protocol (SMTP): mail
  • Domain Name Service (DNS)
  • HTTP: World Wide Web
The server can decide how to handle these different requests because the four-byte IP address that leads the request to its interface is followed by a two-byte port number. Different services attach to different ports:
  • NNTP: port number 119
  • SMTP: port number 25
  • DNS: port number 53
  • HTTP: port number 80
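To show the sizes involved, this Python sketch packs an address and a port into six bytes. It is only an illustration: in real packets the address and the port travel in separate protocol headers, and the names WELL_KNOWN_PORTS and pack_endpoint are ours.

```python
import socket
import struct

# The conventional ports listed above.
WELL_KNOWN_PORTS = {"NNTP": 119, "SMTP": 25, "DNS": 53, "HTTP": 80}


def pack_endpoint(dotted, port):
    """Return the 4 address bytes followed by the 2 port bytes."""
    # "!H" packs an unsigned 16-bit integer in network (big-endian) byte order.
    return socket.inet_aton(dotted) + struct.pack("!H", port)


endpoint = pack_endpoint("192.168.123.2", WELL_KNOWN_PORTS["HTTP"])
print(len(endpoint), list(endpoint))
```

Because the port occupies two bytes, port numbers run from 0 to 65535.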
As the local administrator or webmaster, you can (if you really want) decide to attach any service to any port. Of course, if you decide to step outside convention, you need to make sure that your clients share your thinking. Our concern here is just with WWW and Apache. Apache, by default, listens to port number 80 because it deals in WWW business.
   Port numbers below 1024 can only be used by the superuser (
What the Client Does
Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular URL such as http://www.apache.org/. What happens?
The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The "//" says that the URL is absolute, that is, not relative to some other URL. The next part must be the name of the server, www.apache.org. The client then contacts a name server, which uses DNS to resolve this name to an IP address. At the time of writing, this address was 204.152.144.38. One way to check the validity of a hostname is to go to the operating-system prompt and type:
            > ping -c 5 www.apache.org
         
or:
% ping -c 5 www.apache.org
         
If that host is connected to the Internet, a response is returned:
PING www.apache.org (204.152.144.38): 56 data bytes
64 bytes from taz.apache.org (204.152.144.38): icmp_seq=0 ttl=247 time=1380 ms
64 bytes from taz.apache.org (204.152.144.38): icmp_seq=1 ttl=247 time=1930 ms
64 bytes from taz.apache.org (204.152.144.38): icmp_seq=2 ttl=247 time=1380 ms
64 bytes from taz.apache.org (204.152.144.38): icmp_seq=3 ttl=247 time=1230 ms
64 bytes from taz.apache.org (204.152.144.38): icmp_seq=4 ttl=247 time=1360 ms
--- www.apache.org ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1230/1456/1930 ms
The web address http://www.apache.org doesn't include a port because it is port 80, the default, and the browser takes it for granted. If some other port is wanted, it is included in the URL after a colon, for example, http://www.apache.org:8000/. The URL always includes a path, even if it is only "/". If the path is left out by the careless user, most browsers put it back in. If the path were /some/where/foo.html on port 8000, the URL would be http://www.apache.org:8000/some/where/foo.html.
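The defaulting rules for port and path are easy to sketch in Python. The function name connection_target is ours, and the sketch handles only http URLs.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80


def connection_target(url):
    """Return (host, port, path) with the usual defaults filled in."""
    parts = urlsplit(url)
    # No explicit port in the URL means the default HTTP port.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # A missing path is put back as "/".
    path = parts.path or "/"
    return parts.hostname, port, path


print(connection_target("http://www.apache.org"))
print(connection_target("http://www.apache.org:8000/some/where/foo.html"))
```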
The client now makes a TCP connection to port number 8000 on IP 204.152.144.38, and sends the following message down the connection (if it is using HTTP/1.0):
What Happens at the Server End?
We assume that the server is well set up and running Apache. What does Apache do? In the simplest terms, it gets a URL from the Internet, turns it into a filename, and sends the file (or its output) back down the Internet. That's all it does, and that's all this book is about!
Three main cases arise:
  • The Unix server has a standalone Apache that listens to one or more ports (port 80 by default) on one or more IP addresses mapped onto the interfaces of its machine. In this mode (known as standalone mode), Apache actually runs several copies of itself to handle multiple connections simultaneously.
  • The server is configured to use the Unix utility inetd, which listens on all ports it is configured to handle. When a connection comes in, it determines from its configuration file, /etc/inetd.conf, which service that port corresponds to and runs the configured program, which can be an Apache in inetd mode. It is worth noting that some of the more advanced features of Apache are not supported in this mode, so it should only be used in very simple cases. Support for this mode may well be removed in future releases of Apache.
  • On Windows, there is a single process with multiple threads. Each thread services a single connection. This currently limits Apache to 64 simultaneous connections, because there's a system limit of 64 objects for which you can wait at once. This is something of a disadvantage because a busy site can have several hundred simultaneous connections. It will probably be improved in Apache 2.0.
Which Unix?
We experimented with SCO Unix and QNX, which both support Apache, before settling on FreeBSD as the best environment for this exercise. The whole of FreeBSD is available—free—from http://www.freebsd.org, but sending $69.95 (plus shipping) to Walnut Creek (at http://www.cdrom.com) gets you four CD-ROMs with more software on them than you can shake a stick at, including all the source code, plus a 1750-page manual that should just about get you going. Without Walnut Creek's manual, we think FreeBSD would cost a lot more than $69.95 in spiritual self-improvement.
If you use FreeBSD, you will find (we hope) that it installs from the CD-ROM easily enough, but that it initially lacks several things you will need later. Among these are Perl, Emacs, and some better shell than sh (we like bash and ksh), so it might be sensible to install them straightaway from their lurking places on the CD-ROM.
Linux supports Apache, and most of the standard distributions include it. However, the default position of the Config files may vary from platform to platform, though usually on Linux they are to be found in /etc.
Which Apache?
Apache 1.3 was released, although in rather a partial form, in July 1998. The Unix version was in good shape; the Win32 version of 1.3 was regarded by the Apache Group as essentially beta software.
The main problem with the Win32 version of Apache lies in its security, which must depend, in turn, on the security of the underlying operating system. Unfortunately, Win95 and its successors have no effective security worth mentioning. Windows NT has a large number of security features, but they are poorly documented, hard to understand, and have not been subjected to the decades of discussion, testing, and hacking that have forged Unix security into a fortress that can pretty well be relied upon.
In the view of the Apache development group, the Win32 version is useful for easy testing of a proposed web site. But if money is involved, you would be foolish not to transfer the site to Unix before exposure to the public and the Bad Guys.
We suggest that if you are working under Unix you go for Version 1.3.1 or later; if under Win32, go for the latest beta release and expect to ride some bumps.
Making Apache Under Unix
Download the most recent Apache source code from a suitable mirror site: a list can be found at http://www.apache.org/. You can also load an older version from the enclosed CD-ROM. You will get a compressed file, with the extension .gz if it has been gzipped, or .Z if it has been compressed. Most Unix software available on the Web (including the Apache source code) is compressed using gzip, a GNU compression tool. If you don't have a copy, you will find one on our CD, or you can get it from the Web.
When expanded, the Apache .tar file creates a tree of subdirectories. Each new release does the same, so you need to create a directory on your FreeBSD machine where all this can live sensibly. We put all our source directories in /usr/local/etc/apache. Go there, copy the <apachename>.tar.gz or <apachename>.tar.Z file, and uncompress the .Z version or gunzip (or gzip -d) the .gz version:
uncompress <apachename>.tar.Z
or:
gzip -d <apachename>.tar.gz
Make sure that the resulting file is called <apachename>.tar, or tar may turn up its nose. If not, type:
mv <apachename> <apachename>.tar
Now unpack it:
% tar xvf <apachename>.tar
The file will make itself a subdirectory, such as apache_1.3.1. Keep the .tar file because you will need to start fresh to make the SSL version. Get into the src directory. There are a number of files with names in capital letters, like README, that look as if you ought to read them. The KEYS file contains the PGP keys of various Apache Group members. It is more useful for checking future downloads of Apache than the current one (since a Bad Guy will obviously have replaced the KEYS file with his own). The distribution may have been signed by one or more Apache Group members.
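If you would rather script the gzip -d and tar xvf steps, Python's standard tarfile module can do both in one pass for a .tar.gz file. This is a sketch under that assumption: the function name unpack_source is ours, and a .Z file would still need uncompress first, since tarfile does not read that format.

```python
import os
import tarfile


def unpack_source(archive_path, destination):
    """Gunzip and unpack `archive_path` into `destination`; return member names."""
    os.makedirs(destination, exist_ok=True)
    # Mode "r:gz" reads a gzip-compressed tar file directly.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can refuse members that would be
            # written outside the destination directory.
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
    return names
```

The returned names correspond to what tar prints when given the v flag.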
Apache Under Windows
In our view, Win32 currently comprises Windows 95, Windows 98, and NT. As far as we know, these different versions are the same as far as Apache is concerned, except that under NT, Apache can also be run as a service. Performance under Win32 may not be as good as under Unix, but this will probably improve over coming months.
Since Win32 is considerably more consistent than the sprawling family of Unices, and since it loads extra modules as DLLs at runtime, rather than compiling them at make time, it is practical for the Apache Group to offer a precompiled binary executable as the standard distribution. Go to http://www.apache.org/dist and click on the version you want, which will be in the form of a self-installing .exe file (the .exe extension is how you tell which one is the Win32 Apache). Download it into, say, c:\temp and then run it from the Win32 Start menu's Run option.
The executable will create an Apache directory, C:\Program Files\Apache, by default. Everything to do with Win32 Apache happens in an MS-DOS window, so get into a window and type:
> cd c:\<apache directory>
> dir
and you should see something like this:
Volume in drive C has no label
 Volume Serial Number is 294C-14EE
 Directory of C:\apache
.              <DIR>        21/05/98   7:27 .
..             <DIR>        21/05/98   7:27 ..
DEISL1   ISU        12,818  29/07/98  15:12 DeIsL1.isu
HTDOCS         <DIR>        29/07/98  15:12 htdocs
MODULES        <DIR>        29/07/98  15:12 modules
ICONS          <DIR>        29/07/98  15:12 icons
LOGS           <DIR>        29/07/98  15:12 logs
CONF           <DIR>        29/07/98  15:12 conf
CGI-BIN        <DIR>        29/07/98  15:12 cgi-bin
ABOUT_~1            12,921  15/07/98  13:31 ABOUT_APACHE
ANNOUN~1             3,090  18/07/98  23:50 Announcement
KEYS                22,763  15/07/98  13:31 KEYS
LICENSE              2,907  31/03/98  13:52 LICENSE
APACHE   EXE         3,072  19/07/98  11:47 Apache.exe
APACHE~1 DLL       247,808  19/07/98  12:11 ApacheCore.dll
MAKEFI~1 TMP        21,025  15/07/98  18:03 Makefile.tmpl
README               2,109  01/04/98  13:59 README
README~1 TXT         2,985  30/05/98  13:57 README-NT.TXT
INSTALL  DLL        54,784  19/07/98  11:44 install.dll
_DEISREG ISR           147  29/07/98  15:12 _DEISREG.ISR
_ISREG32 DLL        40,960  23/04/97   1:16 _ISREG32.DLL
        13 file(s)        427,389 bytes
         8 dir(s)     520,835,072 bytes free
Apache Under BS2000/OSD and AS/400
As we were writing this edition, the Apache Group announced ports to Siemens Nixdorf mainframes running BS2000/OSD on an IBM 390-compatible processor and also to IBM's AS/400. We imagine that few readers of this book will be interested, but those who are should see the Apache documentation for details.
Chapter 2: Our First Web Site
We now have a shiny bright apache/httpd, ready for anything. As we shall see, we will be creating a number of demonstration web sites.
What Is a Web Site?
It might be a good idea to get a firm idea of what, in the Apache business, a web site is: It is a directory somewhere on the server, say, /usr/www/site.for_instance. It contains at least three essential subdirectories:
conf
Contains the Config file, which tells Apache how to respond to different kinds of requests
htdocs
Contains the documents, images, data, and so forth that you want to serve up to your clients
logs
Contains the log files that record what happened
Most of this book is about writing the Config file, using Apache's 150 or so directives. Nothing happens until you start Apache. If the conf subdirectory is not in the default location (it usually isn't), you need a flag that tells Apache where it is.
   
Unix:
httpd -d /usr/www/site.for_instance
Win32:
apache -d c:/usr/www/site.for_instance
Notice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server, and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward compatibility issues on Unix, and so the new name is implemented only on Win32.
Also note that the Win32 version still uses forward slashes rather than backslashes. This is because Apache internally uses forward slashes on all platforms; therefore, you should never use a backslash in an Apache Config file, regardless of the operating system.
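If you generate Config files or command lines from a script on Windows, it is easy to convert paths to the forward-slash form first. A minimal Python sketch, with the helper name config_path being ours:

```python
from pathlib import PureWindowsPath


def config_path(windows_path):
    """Return `windows_path` written with forward slashes."""
    # PureWindowsPath understands both separators and works on any
    # operating system, since it never touches the filesystem.
    return PureWindowsPath(windows_path).as_posix()


print(config_path(r"c:\usr\www\site.for_instance"))
```

A path that already uses forward slashes passes through unchanged.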
Apache's Flags
httpd (or apache) takes the following flags:
-D name
Defines a name for <IfDefine> directives.
-d directory
Specifies an alternate initial ServerRoot directory.
-f filename
Specifies an alternate ServerConfig file.
-C "directive"
Processes the given directive before reading Config file(s).
-c "directive"
Processes the given directive after reading Config file(s).
-v
Shows version number.
-V
Shows compile settings.
-h
Lists available Config directives.
-l
Lists compiled modules.
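When Apache is started from a script, the flags can be assembled as an argument list. The sketch below uses only the -d, -f, and -D flags described above. The function name httpd_command and the define name in the example are hypothetical.

```python
def httpd_command(server_root, config_file=None, defines=(), executable="httpd"):
    """Build an argument list for starting Apache with the given flags."""
    # -d sets the initial ServerRoot directory.
    command = [executable, "-d", server_root]
    if config_file is not None:
        # -f names an alternate Config file.
        command += ["-f", config_file]
    for name in defines:
        # -D defines a name for <IfDefine> directives.
        command += ["-D", name]
    return command


print(httpd_command("/usr/www/site.toddle"))
print(httpd_command("c:/usr/www/site.toddle", defines=["EXAMPLE"], executable="apache"))
```

The resulting list can be handed to subprocess.run, which passes each element as a separate argument without involving a shell.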
site.toddle
You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/site.toddle. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as .../. So we will talk about .../site.toddle (Windows users, please read this as ...\site.toddle).
In .../site.toddle, we created the three subdirectories Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states:
The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist.
As a legacy from NCSA, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf, and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist.
The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet. We will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs.
Setting Up a Unix Server
We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory):
% httpd -d /usr/www/site.toddle
         
Since you will be typing this a lot, it's sensible to copy it into a script called go in /usr/local/bin by typing:
% cat > /usr/local/bin/go
httpd -d `pwd`
^d
^d is shorthand for CTRL-D, which ends the input and gets your prompt back. This go will work on every site.
Make go runnable and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):
% chmod +x /usr/local/bin/go
% go
         
This launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):
% ps -aux
         
This Unix utility lists all the processes running, among which you should find several httpds.
Sooner or later, you have finished testing and want to stop Apache. In order to do this, you have to get the process identity (PID) using ps -aux and execute the Unix utility kill:
% kill PID
Alternatively, since Apache writes its PID in the file .../logs/httpd.pid (by default; see the PidFile directive), you can write yourself a little script, as follows:
kill `cat /usr/www/site.toddle/logs/httpd.pid`
You may prefer to put more generalized versions of these scripts somewhere on your path. For example, the following scripts will start and stop a server based in your current directory. go looks like this:
httpd -d `pwd`
and stop looks like this:
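# Save the current directory first; "pwd | read path" leaves $path empty in
# shells such as bash that run each pipeline stage in a subshell.
path=`pwd`
kill `cat $path/logs/httpd.pid`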
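The same stop logic can be sketched in Python for those who prefer it to shell. It assumes the default PID file location, logs/httpd.pid under the site directory. The function names read_pid and stop_server are ours.

```python
import os
import signal


def read_pid(server_root):
    """Return the process ID recorded in logs/httpd.pid under `server_root`."""
    pid_file = os.path.join(server_root, "logs", "httpd.pid")
    with open(pid_file) as handle:
        return int(handle.read().strip())


def stop_server(server_root=None):
    """Send SIGTERM, the signal kill sends by default, to the recorded process."""
    if server_root is None:
        # Like the stop script, default to the current directory.
        server_root = os.getcwd()
    os.kill(read_pid(server_root), signal.SIGTERM)
```

Calling stop_server() from inside a site directory does what the stop script does.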
Setting Up a Win32 Server
There is no point trying to run Apache unless TCP/IP is set up and running on your machine. In our experience, if it isn't, Apache will crash Windows 95. A quick test is to ping some IP—and if you can't think of a real one, ping yourself:
>ping 127.0.0.1
         
If TCP/IP is working, you should see some collaborative message like:
Pinging 127.0.0.1 with 32 bytes of data: 
Reply from 127.0.0.1: bytes=32 time<10ms TTL=32
....
If you don't see something along these lines, defer further operations until TCP/IP is working.
It is important to remember that internally, Windows Apache is essentially the same as the Unix version and that it uses Unix-style forward slashes ("/") rather than MS-DOS- and Windows-style backslashes ("\") in its file and directory names as specified in various files.
There are several ways of running Apache under Win32. Under NT, you can run it as a service, operating in the background. First you have to install it as a service by running the "Install Apache as a Service" option from the Start menu. Alternatively, click on the MS-DOS prompt to get a DOS session window. Go to the /Program Files/Apache directory (or wherever else you installed Apache) with:
>cd "\Program Files\apache"
         
Apache can be installed as an NT service with:
>apache -i
         
and uninstalled with:
>apache -u
         
Once this is done, you can open the Services window in the Control Panel, select Apache, and click on Start. Apache then runs in the background until you click on Stop. Alternatively, you can open a console window and type:
>net start apache
>net stop apache
         
To run Apache from a console window, select the Apache server option from the Start menu.
Alternatively—and under Win95, this is all you can do—click on the MS-DOS prompt to get a DOS session window. Go to the
Chapter 3: Toward a Real Web Site
More and Better Web Sites: site.simple
We are now in a position to start creating real(ish) web sites, which can be found on the accompanying CD-ROM. For the sake of a little extra realism, we will base them loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID so that all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Win95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:
127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com
localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.
You probably need to consult your network manager to make similar arrangements.
Butterthlies, Inc., Gets Going
The httpd.conf file (to be found in ... /site.first) contains the following:
User webuser
Group webgroup
ServerName localhost
DocumentRoot /usr/www/site.first/htdocs
TransferLog logs/access_log
In the first edition of this book we mentioned the directives AccessConfig and ResourceConfig here. If set with /dev/null (NUL under Win32), they disable the srm.conf and access.conf files, and were formerly required if those files were absent. However, new versions of Apache ignore these files if they are not present, so the directives are no longer required.
If you are using Win32, note that the User and Group directives are not supported, so these can be removed.
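For readers who also have to look after an older Apache, the earlier form of this file might have looked something like the sketch below. It is simply the file shown above with the two directives just described added at the end; whether a given version still needs them is something to check against its own documentation.

```apache
# Sketch only: the httpd.conf shown above, plus the two directives
# that older versions needed when access.conf and srm.conf were absent.
User webuser
Group webgroup
ServerName localhost
DocumentRoot /usr/www/site.first/htdocs
TransferLog logs/access_log

# Tell Apache not to look for access.conf and srm.conf.
# Under Win32, use NUL in place of /dev/null.
AccessConfig /dev/null
ResourceConfig /dev/null
```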
Apache's role in life is delivering documents, and so far we have not done much of that. We therefore begin in a modest way with a little HTML script that lists our cards, gives their prices, and tells interested parties how to get them.
We can look at the Netscape Help item "Creating Net Sites" and download "A Beginners Guide to HTML" as well as the next web person, then rough out a little brochure in no time flat:
<html>
<h1> Welcome to Butterthlies Inc</h1>
<h2>Summer Catalog</h2>
<p> All our cards are available in packs of 20 at $2 a pack.
There is a 10% discount if you order more than 100.
</p>
<hr>
<p>
Style 2315
<p align=center>
<img src="bench.jpg" alt="Picture of a bench">
<p align=center>
Be BOLD on the bench
<hr>
<p>
Style 2316
<p align=center>
<img src="hen.jpg" alt="Picture of a hencoop like a pagoda">
<p align=center>
Get SCRAMBLED in the henhouse
<hr>
<p>
Style 2317
<p align=center>
<img src="tree.jpg" alt="Very nice picture of tree">
<p align=center>
Get HIGH in the treehouse
<hr>
<p>
Style 2318
<p align=center>
<img src="bath.jpg" alt="Rather puzzling picture of a bathtub">
<p align=center>
Get DIRTY in the bath
<hr>
<p align=right>
Postcards designed by Harriet@alart.demon.co.uk
<hr>
<br>
Butterthlies Inc, Hopeful City, Nevada 99999
</html>
Block Directives
Apache has a number of block directives that limit the application of other directives within them to operations on particular virtual hosts, directories, or files. These are extremely important to the operation of a real web site because within these blocks—particularly <VirtualHost>—the webmaster can, in effect, set up a large number of individual servers run by a single invocation of Apache. This will make more sense when you get to Section 3.5, further on in this chapter.
The syntax of the block directives is detailed next.
<VirtualHost host[:port]>
...
</VirtualHost>
Server config
The <VirtualHost> directive within a Config file acts like a tag in HTML: it introduces a block of text containing directives referring to one host; when we're finished with it, we stop with </VirtualHost>. For example:
....
<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.virtual/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/site.virtual/name-based/logs/error_log
TransferLog /usr/www/site.virtual/name-based/logs/access_log
</VirtualHost>
...
<VirtualHost> also specifies which IP address we're hosting and, optionally, the port. If port is not specified, the default port is used, which is either the standard HTTP port, 80, or the port specified in a Port directive. host can also be _default_ , in which case it matches anything no other <VirtualHost> section matches.
In a real system, this address would be the hostname of our server. The <VirtualHost> directive has three analogues that also limit the application of other directives:
  • <Directory>
  • <Files>
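Returning to the host and port arguments of <VirtualHost> described above, a catch-all block using _default_ might be sketched as follows. The directory and log paths are invented for the example and are not part of the demonstration sites.

```apache
# Catch-all sketch: handles requests arriving on port 80 at any
# address that no other <VirtualHost> section matches.
# The paths below are hypothetical.
<VirtualHost _default_:80>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.virtual/htdocs/default
ErrorLog /usr/www/site.virtual/logs/default_error_log
</VirtualHost>
```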
Other Directives
Other housekeeping directives are listed here.
ServerName hostname 
Server config, virtual host
ServerName gives the hostname of the server to use when creating redirection URLs, that is, if you use a <Location> directive or access a directory without a trailing "/".
UseCanonicalName on|off
Default: on 
Server config, virtual host, directory, .htaccess
This directive controls how Apache forms URLs that refer to itself, for example, when redirecting a request for http://www.domain.com/some/directory to the correct http://www.domain.com/some/directory/ (note the trailing "/"). If UseCanonicalName is on (the default), then the hostname and port used in the redirect will be those set by ServerName and Port. If it is off, then the name and port used will be the ones in the original request.
One instance where this directive may be useful is when users are in the same domain as the web server (for example, on an intranet). In this case, they may use the "short" name for the server (www, for example), instead of the fully qualified domain name (www.domain.com, say). If a user types a URL such as http://www/somedir (without the trailing slash), then, with UseCanonicalName switched on, the user will be directed to http://www.domain.com/somedir/, whereas with UseCanonicalName switched off, he or she will be redirected to http://www/somedir/. An obvious case in which this is useful is when user authentication is switched on: reusing the server name that the user typed means they won't be asked to reauthenticate when the server name appears to the browser to have changed. More obscure cases relate to name/address translation caused by some firewalling techniques.
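The intranet case just described might be configured along these lines. This is a minimal sketch using the example names from the text; www.domain.com stands in for your own server name.

```apache
# Intranet sketch: users may reach this server as plain "www".
ServerName www.domain.com
Port 80

# Build self-referential redirects from the name and port in the
# original request, so that http://www/somedir is redirected to
# http://www/somedir/ rather than http://www.domain.com/somedir/.
UseCanonicalName off
```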
ServerAdmin email_address
Server config, virtual host
Two Sites and Apache
Our business has now expanded, and we have a team of salespeople. They need their own web site with different prices, gossip about competitors, conspiracies, plots, plans, and so on, that is separate from the customers' web site we have been talking about. There are essentially two ways of doing this:
  1. Run a single copy of Apache that maintains two or more web sites as virtual sites. This is the most usual method.
  2. Run two (or more) copies of Apache, each maintaining a single site. This is seldom done, but we include it for the sake of completeness.
Controlling Virtual Hosts on Unix
When started without the -X flag, which is what you would do in real operation, Apache launches a number of child versions of itself so that any incoming request can be instantly dealt with. This is an excellent scheme, but we need some way of controlling this sprawl of software. The necessary directives are there to do it.
MaxClients number
Default number: 150
Server config
This directive limits the number of requests that will be dealt with simultaneously. In the current version of Apache, this effectively limits the number of servers that can run at one time.
MaxRequestsPerChild number
Default number: 30
Server config
Each child version of Apache handles this number of requests and dies (unless the value is 0, in which case it will last forever or until the machine is rebooted). It is a good idea to set a number here so that any accidental memory leaks in Apache are tidied up. Although there are no known leaks in Apache, it is not impossible for them to occur in the system libraries, so it is probably wise not to disable this unless you are absolutely sure the code is byte-tight.
MaxSpareServers number
Default number: 10
Server config
No more than this number of child servers will be left running and unused. Setting this to an unnecessarily large number is a bad idea, since it depletes resources needlessly. How many is too many depends on which modules you have used and your detailed configuration. You can get some clues by studying memory consumption with ps, top, and the like.
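Taken together, the three directives described so far might appear in a Config file as in the following sketch. The numbers are illustrative assumptions rather than recommendations; suitable values depend on your machine and traffic.

```apache
# Illustrative values only, not recommendations.

# Serve at most 100 requests at the same time.
MaxClients 100

# Retire each child after 1000 requests, so that any memory leaked
# by system libraries is returned when the child exits.
MaxRequestsPerChild 1000

# Keep no more than 8 idle children waiting for work.
MaxSpareServers 8
```

Note that comments sit on their own lines: Apache does not accept a comment on the same line as a directive.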
MinSpareServers number
Default number
Controlling Virtual Hosts on Win32
The Win32 version of Apache runs a parent version of the code and a single multi-threaded child that handles all requests.
ThreadsPerChild number
Default number: 50
Server config
Currently this directive is only relevant to Win32. You may need to increase this number from 50, the default, if your site gets a lot of simultaneous hits. The name ThreadsPerChild may suggest that there can be more than one child process in a Win32 installation, but this is not currently the case.
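A busy Win32 site might raise the limit as in this small sketch. The figure of 100 is an arbitrary example, not a recommendation.

```apache
# Win32 only. Illustrative value: let the single child process
# handle up to 100 requests at once instead of the default 50.
ThreadsPerChild 100
```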
Virtual Hosts
On site.twocopy (see Section 3.9, later in this chapter) we run two different versions of Apache, each serving a different URL. It would be rather unusual to do this in real life. It is more common to run a number of virtual Apaches that steer incoming requests on different URLs—usually with the same IP address—to different sets of documents. These might well be home pages for members of your organization or your clients.
In the first edition of this book we showed how to do this for Apache 1.2 and HTTP/1.0. The result was rather clumsy, with a main host and a virtual host, but it coped with HTTP/1.0 clients. However, the setup can now be done much more neatly with the NameVirtualHost directive. The possible combinations of IP-based and name-based hosts can become quite complex. A full explanation with examples and the underlying theology can be found at http://www.apache.org/docs/vhosts but it has to be said that several of the possible permutations are unlikely to be very useful in practice.
Name-based virtual hosting is by far the preferred method of managing virtual hosts, taking advantage of the ability of HTTP/1.1-compliant browsers to send the name of the site they want to access. At .../site.virtual/Name-based we have www.butterthlies.com and sales.butterthlies.com on 192.168.123.2. Of course, these sites must be registered on the Web (or if you are dummying the setup as we did, included in /etc/hosts). The Config file is as follows:
User webuser
Group webgroup

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.virtual/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/site.virtual/name-based/logs/error_log
TransferLog /usr/www/site.virtual/name-based/logs/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.virtual/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/site.virtual/name-based/logs/error_log
TransferLog /usr/www/site.virtual/name-based/logs/access_log
</VirtualHost>
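In the file above, both hosts write to the same error_log and access_log. If you would rather keep the two sites' records apart, one possible variation gives a block its own files, as in this sketch for the sales host. The log filenames are invented for illustration.

```apache
<VirtualHost sales.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.virtual/htdocs/salesmen
ServerName sales.butterthlies.com
# Hypothetical per-host filenames, so that this site's entries stay
# separate from those of www.butterthlies.com.
ErrorLog /usr/www/site.virtual/name-based/logs/sales_error_log
TransferLog /usr/www/site.virtual/name-based/logs/sales_access_log
</VirtualHost>
```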
Two Copies of Apache
To illustrate the possibilities, we will run two copies of Apache with different IP addresses on different consoles, as if they were on two completely separate machines. This is not something you want to do often, but for the sake of completeness, here it is. Normally, you would only bother if the different virtual hosts needed very different configurations, such as different values for ServerType, User, TypesConfig, or ServerRoot (none of these directives can apply to a virtual host, since they are global to all servers, which is why you have to run two copies to get the desired effect). If you are expecting a lot of hits, you should try to avoid running more than one copy, as doing so will generally load the machine more.
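As a rough sketch of what running two copies involves, each copy gets its own server root and Config file and listens on its own IP address. The directory names under site.twocopy are assumptions made for this example, the addresses are the ones we put in the hosts file earlier, and the machine is assumed to have both addresses configured on its network interface.

```apache
# --- First copy: a hypothetical customers/conf/httpd.conf ---
User webuser
Group webgroup
ServerName www.butterthlies.com
# Accept connections on this address only.
Listen 192.168.123.2:80
DocumentRoot /usr/www/site.twocopy/customers/htdocs
TransferLog logs/access_log

# --- Second copy: a hypothetical sales/conf/httpd.conf ---
User webuser
Group webgroup
ServerName sales-IP.butterthlies.com
# Accept connections on the second address only.
Listen 192.168.123.3:80
DocumentRoot /usr/www/site.twocopy/sales/htdocs
TransferLog logs/access_log
```

Each copy would then be started separately, with its own -d argument pointing at its own server root, so that the relative logs/ paths resolve to different directories.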