Apache: The Definitive Guide, Third Edition

By Ben Laurie, Peter Laurie

Table of Contents

Chapter 1: Getting Started
Apache is the dominant web server on the Internet today, filling a key place in the infrastructure of the Internet. This chapter will explore what web servers do and why you might choose the Apache web server, examine how your web server fits into the rest of your network infrastructure, and conclude by showing you how to install Apache on a variety of different systems.
What Does a Web Server Do?
The whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
When you fire up your browser and connect to the URL of someone's home page — say the notional http://www.butterthlies.com/ we shall meet later on — you send a message across the Internet to the machine at that address. That machine, you hope, is up and running; its Internet connection is working; and it is ready to receive and act on your message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:
<scheme>://<host>/<path>
So, in our example, <scheme> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /, traditionally meaning the top page of the host. The <host> may contain either an IP address or a name, which the browser will then convert to an IP address. Using HTTP 1.1, your browser might send the following request to the computer at that IP address:
GET / HTTP/1.1
Host: www.butterthlies.com
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is in four parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) /; the version of the protocol we are using; and a series of headers that modify the request (in this case, a Host header, which is used for name-based virtual hosting: see Chapter 4). It is then up to the web server running on that host to make something of this message.
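The path from URL to request can be sketched in a few lines of Python. This is only an illustration using the standard library's urllib.parse module; the helper names build_request and parse_request are invented for the example and are not part of Apache or of any browser.

```python
from urllib.parse import urlsplit

def build_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is sent as "/"
    # Request line, one Host header, then a blank line ending the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"

def parse_request(text):
    """Split a request into method, URI, protocol version, and headers."""
    head = text.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, uri, version, headers

request = build_request("http://www.butterthlies.com/")
method, uri, version, headers = parse_request(request)
print(method, uri, version, headers)
```

A real server has to cope with far messier input than this (repeated headers, odd whitespace, request bodies), so treat the parser as a picture of the four parts rather than something to deploy.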
The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom or just a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
Additional content appearing in this section has been removed.
How Apache Works
Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/2000/Me/NT/..., which we call Win32. There are many others: flavors of Unix, IBM's OS/2, and Novell Netware. Mac OS X has a FreeBSD foundation and ships with Apache.
The Apache binary is called httpd under Unix and apache.exe under Win32 and normally runs in the background. Each copy of httpd/apache that is started has its attention directed at a web site, which is, for our purposes, a directory. Regardless of operating system, a site directory typically contains four subdirectories:
conf
Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file. It specifies the URLs that will be served.
htdocs
Contains the HTML files to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
logs
Contains the log data, both of accesses and errors.
cgi-bin
Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space — that is, in .../htdocs or below.
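As a rough illustration of this layout, the following Python sketch creates the four subdirectories in a temporary location and checks the rule about cgi-bin. The function names and the site.example directory are hypothetical; Apache itself provides nothing like this.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("conf", "htdocs", "logs", "cgi-bin")

def make_site(root):
    """Create the four conventional subdirectories under root."""
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root

def in_web_space(site, directory):
    """True if directory is htdocs itself or lies anywhere below it."""
    htdocs = (Path(site) / "htdocs").resolve()
    target = Path(directory).resolve()
    return target == htdocs or htdocs in target.parents

with tempfile.TemporaryDirectory() as tmp:
    site = make_site(Path(tmp) / "site.example")  # hypothetical site name
    # cgi-bin beside htdocs is outside the web space: acceptable.
    cgi_ok = not in_web_space(site, site / "cgi-bin")
    # cgi-bin below htdocs would be inside the web space: to be avoided.
    cgi_bad = in_web_space(site, site / "htdocs" / "cgi-bin")
    print(cgi_ok, cgi_bad)
```

A check of this kind is only a convenience for the webmaster; what Apache actually treats as web space is decided by the Config file.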
In its idling state, Apache does nothing but listen to the IP addresses specified in its Config file. When a request appears, Apache receives it and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.
The webmaster's main control over Apache is through the Config file. The webmaster has some 200 directives at her disposal, and most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has a dozen flags she can use when Apache starts up.
Apache and Networking
At its core, Apache is about communication over networks. Apache uses the TCP/IP protocol as its foundation, providing an implementation of HTTP. Developers who want to use Apache should have at least a foundational understanding of TCP/IP and may need more advanced skills if they need to integrate Apache servers with other network infrastructure like firewalls and proxy servers.
To understand the substance of this book, you need a modest knowledge of what TCP/IP is and what it does. You'll find more than enough information in Craig Hunt and Robert Bruce Thompson's books on TCP/IP, but what follows is, we think, what is necessary to know for our book's purposes.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling computers to talk to each other over networks. The two protocols that give the suite its name are among the most important, but there are many others, and we shall meet some of them later. These protocols are embodied in programs on your computer written by someone or other; it doesn't much matter who. TCP/IP seems unusual among computer standards in that the programs that implement it actually work, and their authors have not tried too much to improve on the original conceptions.
TCP/IP is generally only used where there is a network. Each computer on a network that wants to use TCP/IP has an IP address, for example, 192.168.123.1.
There are four parts in the address, separated by periods. Each part corresponds to a byte, so the whole address is four bytes long. You will, in consequence, seldom see any of the parts outside the range 0-255.
Although not required by the protocol, by convention there is a dividing line somewhere inside this number: to the left is the network number and to the right, the host number. Two machines on the same physical network — usually a local area network (LAN) — normally have the same network number and communicate directly using TCP/IP.
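The four-byte structure and the network/host split can be seen with Python's standard ipaddress module. The sketch below assumes, purely for illustration, that the dividing line falls after the third byte (a 24-bit network number); the same_network helper is invented for the example.

```python
import ipaddress

address = ipaddress.ip_address("192.168.123.1")
print(len(address.packed))   # the address occupies four bytes
print(list(address.packed))  # each dotted part is one byte

# Assumption: the network number is the first three bytes (a /24 prefix).
interface = ipaddress.ip_interface("192.168.123.1/24")
network_number = interface.network.network_address
host_number = int(interface.ip) & int(interface.network.hostmask)
print(network_number, host_number)

def same_network(a, b, prefix_length=24):
    """True if addresses a and b share a network number of this length."""
    network = ipaddress.ip_interface(f"{a}/{prefix_length}").network
    return ipaddress.ip_address(b) in network

print(same_network("192.168.123.1", "192.168.123.77"))
print(same_network("192.168.123.1", "192.168.124.1"))
```

With a different prefix length the same address splits differently, which is why the position of the dividing line has to be known before two machines can tell whether they are neighbors.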
How do we know where the dividing line is between network number and host number? The default dividing line used to be determined by the first of the four numbers, but a shortage of addresses required a change to the use of
How HTTP Clients Work
Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular site, and it sends a request with a URL that begins with http to indicate what service it wants (other common services are ftp for File Transfer Protocol or https for HTTP with Secure Sockets Layer, SSL) and continues with these possible parts:
//<user>:<password>@<host>:<port>/<url-path>
RFC 1738 says:
Some or all of the parts "<user>:<password>@", ":<password>",":<port>", and "/<url-path>" may be omitted. The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.
In real life, URLs look more like: http://www.apache.org/ — that is, there is no user and password pair, and there is no port. What happens?
The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt and type:
ping www.apache.org
If that host is connected to the Internet, a response is returned:
Pinging www.apache.org [63.251.56.142] with 32 bytes of data:

Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
Reply from 63.251.56.142: bytes=32 time=290ms TTL=49

Ping statistics for 63.251.56.142:
A URL can be given more precision by attaching a port number: the web address http://www.apache.org doesn't include a port because it is port 80, the default, and the browser takes it for granted. If some other port is wanted, it is included in the URL after a colon — for example, http://www.apache.org:8000/. We will have more to do with ports later.
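The optional parts of a URL, and the way a missing port falls back to the default, can be shown with Python's urllib.parse. This is a sketch only: the describe function, the DEFAULT_PORTS table, and the ftp.example.org URL are made up for the illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports for the schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe(url):
    """Break a URL into its parts, filling in the default port and path."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when omitted
        "password": parts.password,  # None when omitted
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",
    }

print(describe("http://www.apache.org/"))
print(describe("http://www.apache.org:8000/"))
print(describe("ftp://user:secret@ftp.example.org/pub"))  # hypothetical URL
```

The first call reports port 80 even though the URL never mentions it, which is the deduction the browser makes on the user's behalf.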
The URL always includes a path, even if it is only /. If the path is left out by the careless user, most browsers put it back in. If the path were
What Happens at the Server End?
We assume that the server is well set up and running Apache. What does Apache do? In the simplest terms, it gets a URL from the Internet, turns it into a filename, and sends the file (or its output if it is a program) back down the Internet. That's all it does, and that's all this book is about!
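In its most stripped-down form, turning a URL into a filename might look like the Python sketch below. It is not how Apache is implemented: the document root location, the index.html default, and the url_to_filename name are all assumptions made for the example, and the real mapping is governed by the Config file.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Assumed location of the site's htdocs directory.
DOCUMENT_ROOT = "/usr/www/APACHE3/site.for_instance/htdocs"

def url_to_filename(url, root=DOCUMENT_ROOT, index="index.html"):
    """Map a URL onto a filename below root (illustrative only)."""
    path = unquote(urlsplit(url).path) or "/"
    # Collapse "." and ".." segments. The path is anchored at "/",
    # so ".." cannot climb above the document root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    if path.endswith("/"):
        # A directory request is answered with an assumed index file.
        clean = posixpath.join(clean, index)
    return root + clean

print(url_to_filename("http://www.butterthlies.com/"))
print(url_to_filename("http://www.butterthlies.com/catalog/../../etc/passwd"))
```

The second call shows why the normalization step matters: without it, a request containing ".." could name a file outside the web space.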
Two main cases arise:
  • The Unix server has a standalone Apache that listens to one or more ports (port 80 by default) on one or more IP addresses mapped onto the interfaces of its machine. In this mode (known as standalone mode ), Apache actually runs several copies of itself to handle multiple connections simultaneously.
  • On Windows, there is a single process with multiple threads. Each thread services a single connection. This currently limits Apache 1.3 to 64 simultaneous connections, because there's a system limit of 64 objects for which you can wait at once. This is something of a disadvantage because a busy site can have several hundred simultaneous connections. It has been improved in Apache 2.0. The default maximum is now 1920, but even that can be extended at compile time.
Both cases boil down to an Apache server with an incoming connection. Remember our first statement in this section, namely, that the object of the whole exercise is to resolve the incoming request either into a filename or the name of a script, which generates data internally on the fly. Apache thus first determines which IP address and port number were used by asking the operating system where the connection is connecting to. Apache then uses the IP address, the port number, and (in HTTP 1.1) the Host header to decide which virtual host is the target of this request. The virtual host then looks at the path, which was handed to it in the request, and reads that against its configuration to decide on the appropriate response, which it then returns.
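The choice of virtual host can be pictured as a table lookup keyed on local IP address, port, and Host header. The Python sketch below is a deliberate simplification with made-up data: the addresses, the sales.butterthlies.com name, the site labels, and the choose_virtual_host function are all hypothetical, and Apache's real matching rules are more elaborate.

```python
# Hypothetical name-based virtual hosts sharing one address and port.
VIRTUAL_HOSTS = {
    ("192.168.123.2", 80, "www.butterthlies.com"): "customers",
    ("192.168.123.2", 80, "sales.butterthlies.com"): "salesmen",
}
# Fallback used when the Host header is missing or unknown.
DEFAULTS = {("192.168.123.2", 80): "customers"}

def choose_virtual_host(local_ip, local_port, headers):
    """Pick a site from the address the connection arrived on and its Host header."""
    # Drop any ":port" suffix and ignore case in the hostname.
    host = headers.get("Host", "").split(":")[0].lower()
    site = VIRTUAL_HOSTS.get((local_ip, local_port, host))
    if site is None:
        site = DEFAULTS.get((local_ip, local_port))
    return site

print(choose_virtual_host("192.168.123.2", 80, {"Host": "sales.butterthlies.com"}))
print(choose_virtual_host("192.168.123.2", 80, {}))
```

A request without a Host header, as an HTTP 1.0 client might send, can only be distinguished by address and port, which is why the sketch keeps a default for each pair.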
Most of this book is about the possible appropriate responses and how Apache decides which one to use.
Planning the Apache Installation
Unless you're using a prepackaged installation, you'll want to do some planning before setting up the software. You'll need to consider network integration, operating system choices, Apache version choices, and the many modules available for Apache. Even if you're just using Apache at an ISP, you may want to know which choices the ISP made in its installation.
Apache installations come in many flavors. If an installation is intended only for local use on a developer's machine, it probably needs much less integration with network systems than an installation meant as a public host supporting thousands of simultaneous hits. Apache itself provides network and security functionality, but you'll need to set up supporting services separately, like the DNS that identifies your server to the network or the routing that connects it to the rest of the network. Some servers operate behind firewalls, and firewall configuration may also be an issue. If these are concerns for you, involve your network administrator early in the process.
Many webmasters have no choice of operating system — they have to use what's in the box on their desks — but if they have a choice, the first decision to make is between Unix and Windows. As the reader who persists with us will discover, much of the Apache Group and your authors prefer Unix. It is, itself, essentially open source. Over the last 30 years it has been the subject of intense scrutiny and improvement by many thousands of people. On the other hand, Windows is widely available, and Apache support for Windows has improved substantially in Apache 2.0.
The choice is commonly between some sort of Linux and FreeBSD. Both are technically acceptable. If you already know someone who has one of these OSs and is willing to help you get used to yours, then it would make sense to follow them. If you are an Apple user, OS X has a Unix core and includes Apache.
Failing that, the difference between the two paths is mainly a legal one, turning on their different interpretations of open source licensing.
Linux lives at http://www.linux.org, and there are more than 160 different distributions from which Linux can be obtained free or in prepackaged pay-for formats. It is rather ominously described as a "Unix-type" operating system, which sometimes means that long-established Unix standards have been "improved", not always in an upwards direction.
Windows?
The main problem with the Win32 version of Apache lies in its security, which must depend, in turn, on the security of the underlying operating system. Unfortunately, Windows 95, Windows 98, and their successors have no effective security worth mentioning. Windows NT and Windows 2000 have a large number of security features, but they are poorly documented, hard to understand, and have not been subjected to the decades of public inspection, discussion, testing, and hacking that have forged Unix security into a fortress that can pretty well be relied upon.
It is a grave drawback to Windows that the source code is kept hidden in Microsoft's hands so that it does not benefit from the scrutiny of the computing community. It is precisely because the source code of free software is exposed to millions of critical eyes that it works as well as it does.
In the view of the Apache development group, the Win32 version is useful for easy testing of a proposed web site. But if money is involved, you would be wise to transfer the site to Unix before exposure to the public and the Bad Guys.
Which Apache?
At the time this edition was prepared, Apache 1.3.26 was the stable release. It has an improved build system (see the section that follows). Both the Unix and Windows versions were thought to be in good shape. Apache 2.0 had made it through beta test into full release. We suggest that if you are working under Unix and you don't need Apache 2.0's improved features (which are multitudinous but not fundamental for the ordinary webmaster), you go for Version 1.3.26 or later.
Apache 2.0 is a major new version. The main new features are multithreading (on platforms that support it), layered I/O (also known as filters), and a rationalized API. The ordinary user will see very little difference, but the programmer writing new modules (see the section that follows) will find a substantial change, which is reflected in our rewritten Chapter 20 and Chapter 21. However, the improvements in Apache v2.0 look to the future rather than trying to improve the present. The authors are not planning to transfer their own web sites to v2.0 any time soon and do not expect many other sites to do so either. In fact, many sites are still happily running Apache v1.2, which was nominally superseded several years ago. There are good security reasons for them to upgrade to v1.3.
Apache 2.0 is designed to run on Windows NT and 2000. The binary installer will only work with x86 processors. In all cases, TCP/IP networking must be installed. If you are using NT 4.0, install Service Pack 3 or 6, since Pack 4 had TCP/IP problems. It is not recommended that Windows 95 or 98 ever be used for production servers and, when we went to press, Apache 2.0 would not run under either at all. See http://httpd.apache.org/docs/windows.html.
Installing Apache
There are two ways of getting Apache running on your machine: by downloading an appropriate executable or by getting the source code and compiling it. Which is better depends on your operating system.
The fairly painless business of compiling Apache, which is described later, can now be circumvented by downloading a precompiled binary for the Unix of your choice. When we went to press, the following operating systems (mostly versions of Unix) were supported, but check before you decide. (See http://httpd.apache.org/dist/httpd/binaries.)
aix
aux
beos
bs2000-osd
bsdi
darwin
dgux
digitalunix
freebsd
hpux
irix
linux
macosx
macosxserver
netbsd
netware
openbsd
os2
os390
osf1
qnx
reliantunix
rhapsody
sinix
solaris
sunos
Building Apache 1.3.X Under Unix
There are two methods for building Apache: the "Semimanual Method" and "Out of the Box". They each involve the user in about the same amount of keyboard work: if you are happy with the defaults, you need do very little; if you want to do a custom build, you have to do more typing to specify what you want.
Both methods rely on a shell script that, when run, creates a Makefile. When you run make, this, in turn, builds the Apache executable with the side orders you asked for. Then you copy the executable to its home (Semimanual Method) or run make install (Out of the Box) and the various necessary files are moved to the appropriate places around the machine.
Between the two methods, there is not a tremendous amount to choose. We prefer the Semimanual Method because it is older and more reliable. It is also nearer to the reality of what is happening and generates its own record of what you did last time so you can do it again without having to perform feats of memory. Out of the Box is easier if you want a default build. If you want a custom build and you want to be able to repeat it later, you would do the build from a script that can get quite large. On the other hand, you can create several different scripts to trigger different builds if you need to.
Until Apache 1.3, there was no real out-of-the-box batch-capable build and installation procedure for the complete Apache package. This method is provided by a top-level configure script and a corresponding top-level Makefile.tmpl file. The goal is to provide a GNU Autoconf-style frontend that is capable of driving the old src/Configure stuff in batch.
Once you have extracted the sources (see earlier), the build process can be done in a minimum of three command lines — which is how most Unix software is built nowadays. Change yourself to root before you run ./configure; otherwise, if you use the default build configuration (which we suggest you do not), the server will be looking at port 8080 and will, confusingly, refuse requests to the default port, 80.
The result is, as you will be told during the process, probably not what you really want:
New Features in Apache v2
The procedure for configuring and compiling Apache has changed, as we will see later.
High-level decisions about the way Apache works internally can now be made at compile time by including one of a series of Multi Processing Modules (MPMs). This is done by attaching a flag to configure:
./configure <other flags> --with-mpm=<name of MPM>
Although MPMs are rather like ordinary modules, only one can be used at a time. Some of them are designed to adapt Apache to different operating systems; others offer a range of different optimizations for Unix.
The MPM in use will be shown, along with the other compiled-in modules, by executing httpd -l. When we went to press, these were the possible MPMs under Unix:
prefork
Default. Most closely imitates behavior of v1.3. Currently the default for Unix and sites that require stability, though we hope that threading will become the default later on.
threaded
Suitable for sites that require the benefits brought by threading, particularly reduced memory footprint and improved interthread communications. But see "prefork" earlier in this list.
perchild
Allows different hosts to have different user IDs.
mpmt_pthread
Similar to prefork, but each child process has a specified number of threads. It is possible to specify a minimum and maximum number of idle threads.
Dexter
Multiprocess, multithreaded MPM that allows you to specify a static number of processes.
Perchild
Similar to Dexter, but you can define a separate user and group for each child process to increase server security.
Other operating systems have their own MPMs:
spmt_os2
For OS/2.
beos
For the Be OS.
WinNT
Win32-specific version, taking advantage of completion ports and native function calls to give better network performance.
To begin with, accept the default MPM. More advanced users should refer to http://httpd.apache.org/docs-2.0/mpm.html and http://httpd.apache.org/docs-2.0/misc/perf-tuning.html.
See the entry for the AcceptMutex directive in Chapter 3.
Version 2.0 makes the following changes to the Config file:
  • CacheNegotiatedDocs now takes the argument on/off. Existing instances of
Making and Installing Apache v2 Under Unix
Disregard all the previous instructions for Apache compilation. There is no longer a .../src directory. Even the name of the Unix source file has changed. We downloaded httpd-2_0_40.tar.gz and unpacked it in /usr/src/apache as usual. You should read the file INSTALL. The scheme for building Apache v2 is now much more in line with that for most other downloaded packages and utilities.
Set up the configuration file with this:
./configure  --prefix=/usr/local 
or wherever it is you want to keep the Apache bits — which will appear in various subdirectories. The executable, for instance, will be in .../sbin. If you are compiling under FreeBSD, as we were, --with-mpm=prefork is automatically used internally, since threads do not currently work well under this operating system. To see all the configuration possibilities:
./configure --help | more
If you want to preserve your Apache 1.3.X executable, you might rename it to httpd.13, wherever it is, and then:
make
which takes a surprising amount of time to run. Then:
make install
         
The result is a nice new httpd in /usr/local/sbin.
Apache Under Windows
Apache 1.3 will work under Windows NT 4.0 and 2000. Its performance under Windows 95 and 98 is not guaranteed. If running on Windows 95, the "Winsock2" upgrade must be installed before Apache will run. "Winsock2" for Windows 95 is available at http://www.microsoft.com/windows95/downloads/contents/WUAdminTools/S_WUNetworkingTools/W95Sockets2. Be warned that the Dialup Networking 1.2 (MS DUN) updates include a Winsock2 that is entirely insufficient, and the Winsock2 update must be reinstalled after installing Windows 95 dialup networking. Windows 98, NT (Service Pack 3 or later), and 2000 users need to take no special action; those versions provide Winsock2 as distributed.
Apache v2 will run under Windows 2000 and NT, but, when we went to press, it did not work under Win 95, 98, or Me. These different versions are the same as far as Apache is concerned, except that under NT, Apache can also be run as a service. From Apache v1.3.14, emulators are available to provide NT services under the other Windows platforms. Performance under Win32 may not be as good as under Unix, but this will probably improve over coming months.
Since Win32 is considerably more consistent than the sprawling family of Unices, and since it loads extra modules as DLLs at runtime rather than compiling them at make time, it is practical for the Apache Group to offer a precompiled binary executable as the standard distribution. Go to http://www.apache.org/dist, and click on the version you want, which will be in the form of a self-installing .exe file (the .exe extension is how you tell which one is the Win32 Apache). Download it into, say, c:\temp, and then run it from the Win32 Start menu's Run option.
The executable will create an Apache directory, C:\Program Files\Apache, by default. Everything to do with Win32 Apache happens in an MS-DOS window, so get into a window and type:
> cd c:\<apache directory>
> dir
         
and you should see something like this:
Volume in drive C has no label
 Volume Serial Number is 294C-14EE
 Directory of C:\apache
.              <DIR>        21/05/98   7:27 .
..             <DIR>        21/05/98   7:27 ..
DEISL1   ISU        12,818  29/07/98  15:12 DeIsL1.isu
HTDOCS         <DIR>        29/07/98  15:12 htdocs
MODULES        <DIR>        29/07/98  15:12 modules
ICONS          <DIR>        29/07/98  15:12 icons
LOGS           <DIR>        29/07/98  15:12 logs
CONF           <DIR>        29/07/98  15:12 conf
CGI-BIN        <DIR>        29/07/98  15:12 cgi-bin
ABOUT_~1            12,921  15/07/98  13:31 ABOUT_APACHE
ANNOUN~1             3,090  18/07/98  23:50 Announcement
KEYS                22,763  15/07/98  13:31 KEYS
LICENSE              2,907  31/03/98  13:52 LICENSE
APACHE   EXE         3,072  19/07/98  11:47 Apache.exe
APACHE~1 DLL       247,808  19/07/98  12:11 ApacheCore.dll
MAKEFI~1 TMP        21,025  15/07/98  18:03 Makefile.tmpl
README               2,109  01/04/98  13:59 README
README~1 TXT         2,985  30/05/98  13:57 README-NT.TXT
INSTALL  DLL        54,784  19/07/98  11:44 install.dll
_DEISREG ISR           147  29/07/98  15:12 _DEISREG.ISR
_ISREG32 DLL        40,960  23/04/97   1:16 _ISREG32.DLL
        13 file(s)        427,389 bytes
         8 dir(s)     520,835,072 bytes free
Chapter 2: Configuring Apache: The First Steps
After the installation described in Chapter 1, you now have a shiny bright apache/httpd, and you're ready for anything. For our next step, we will be creating a number of demonstration web sites.
What's Behind an Apache Web Site?
It helps to have a firm idea of what, in the Apache business, a web site is: a directory somewhere on the server, say, /usr/www/APACHE3/site.for_instance. It usually contains at least four subdirectories, of which the first three are essential:
conf
Contains the Config file, usually httpd.conf, which tells Apache how to respond to different kinds of requests.
htdocs
Contains the documents, images, data, and so forth that you want to serve up to your clients.
logs
Contains the log files that record what happened. You should consult .../logs/error_log whenever anything fails to work as expected.
cgi-bin
Contains any CGI scripts that are needed. If you don't use scripts, you don't need the directory.
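As a rough sketch, the layout described above could be created by hand along these lines. The path held in SITE is a placeholder chosen for this illustration (the book's own examples live under /usr/www/APACHE3), and cgi-bin is created only because it does no harm if you never use scripts.

```shell
# Placeholder site root for illustration; substitute your own location.
SITE="${SITE:-/tmp/site.for_instance}"

# The three subdirectories described above as essential.
mkdir -p "$SITE/conf" "$SITE/htdocs" "$SITE/logs"

# Optional: only needed if the site uses CGI scripts.
mkdir -p "$SITE/cgi-bin"

# List the result to confirm the layout.
ls "$SITE"
```

The -p flag makes the commands safe to repeat: directories that already exist are left alone.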
In our standard installation, there will also be a file called go in the site directory, which contains a script for starting Apache.
Nothing happens until you start Apache. In this example, you do it from the command line. If your computer experience so far has been entirely with Windows or other Graphical User Interfaces (GUIs), you may find the command line rather stark and intimidating to begin with. However, it offers a great deal of flexibility and something which is often impossible through a GUI: the ability to write scripts (Unix) or batch files (Win32) to automate the executables you want to run and the inputs they need, as we shall see later.
If the conf subdirectory is not in the default location (and it usually isn't), you need a flag that tells Apache where it is.
Unix:
httpd -d /usr/www/APACHE3/site.for_instance -f...
Win32:
apache -d c:/usr/www/APACHE3/site.for_instance
Notice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward-compatibility issues on Unix, and so the new name is implemented only on Win32.
site.toddle
You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/APACHE3/site.toddle, which you will find on the code download. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as ... /. So we will talk about ... /site.toddle. (Windows users, please read this as ...\site.toddle).
In ... /site.toddle, we created the three subdirectories that Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states:
The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist.
As a legacy from the NCSA server, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist.
The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet; we will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs.
Setting Up a Unix Server
We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory, which will probably be different on your machine):
% httpd -d /usr/www/APACHE3/site.toddle 
         
Since you will be typing this a lot, it's sensible to copy it into a script called go. This can go in /usr/local/bin or in each local site. We have done the latter since it is convenient to change it slightly from time to time. Create it by typing:
% cat > /usr/local/bin/go
test -d logs || mkdir logs
httpd -f `pwd`/conf/httpd$1.conf -d `pwd`
^d
         
^d is shorthand for Ctrl-D, which ends the input and gets your prompt back. This go will work on every site. It creates a logs directory if one does not exist, and it explicitly specifies paths for the ServerRoot directory (-d) and the Config file (-f). The command `pwd` finds the current directory with the Unix command pwd. The back-ticks are essential: they substitute pwd's value into the script. In other words, we will run Apache with whatever configuration is in our current directory. To accommodate sites where we have more than one Config file, we have used ...httpd$1... where you might expect to see ...httpd... The symbol $1 copies the first argument (if any) given to the command go. Thus ./go 2 will run the Config file called httpd2.conf, and ./go by itself will run httpd.conf.
Remember that you have to be in the site directory. If you try to run this script from somewhere else, pwd's return will be nonsense, and Apache will complain that it 'could not open document config file ...'.
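To see what these substitutions produce without actually starting the server, here is a small sketch that only prints the command line go would run. The function name show_go is made up for this illustration and is not part of the book's scripts.

```shell
# show_go [SUFFIX]: print, but do not run, the httpd command that go builds.
show_go() {
    # `pwd` is replaced by the current directory; $1 is the optional
    # suffix, so "show_go 2" selects conf/httpd2.conf.
    echo "httpd -f `pwd`/conf/httpd$1.conf -d `pwd`"
}

show_go      # refers to conf/httpd.conf in the current directory
show_go 2    # refers to conf/httpd2.conf in the current directory
```

Running it from different directories shows why go must be started from the site directory: the paths it prints always follow wherever you happen to be.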
Make go runnable, and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):
% chmod +x go
% go
         
If you get the error message:
go: command not found
you need to type:
% ./go
         
This launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):
% ps -aux
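Since a full process listing can be long, one common habit is to filter it for httpd. The sketch below assumes a ps that accepts the BSD-style options "ax", which will not be true on every Unix, and count_httpd is a name invented here.

```shell
# count_httpd: read a process listing on stdin and print how many lines
# mention httpd.
count_httpd() {
    # The pattern [h]ttpd matches "httpd" but not grep's own command line,
    # so grep does not count itself when fed live ps output.
    # "|| true" hides grep's nonzero exit status when the count is 0.
    grep -c '[h]ttpd' || true
}

# Assumption: this ps accepts BSD-style "ax"; adjust the options for your Unix.
ps ax 2>/dev/null | count_httpd
```

A count of 0 means no httpd processes were found, in which case the error log is the place to look next.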
         
Setting Up a Win32 Server
There is no point trying to run Apache unless TCP/IP is set up and running on your machine. A quick test is to ping some IP — and if you can't think of a real one, ping yourself:
>ping 127.0.0.1
         
If TCP/IP is working, you should see some confirming message, like this:
Pinging 127.0.0.1 with 32 bytes of data: 
Reply from 127.0.0.1: bytes=32 time<10ms TTL=32
....
If you don't see something along these lines, defer further operations until TCP/IP is working.
It is important to remember that internally, Windows Apache is essentially the same as the Unix version and that it uses Unix-style forward slashes (/) rather than MS-DOS- and Windows-style backslashes (\) in its file and directory names, as specified in various files.
There are two ways of running Apache under Win32. In addition to the command-line approach, you can run Apache as a "service" (available on Windows NT/2000, or a pseudoservice on Windows 95, 98, or Me). This is the best option if you want Apache to start automatically when your machine boots and to keep Apache running when you log off.
To run Apache from a console window, select the Apache server option from the Start menu.
Alternatively — and under Win95/98, this is all you can do — click on the MS-DOS prompt to get a DOS session window. Go to the /Program Files/Apache directory with this:
>cd "\Program Files\apache"
            
The Apache executable, apache.exe, is sitting here. We can start it running, to see what happens, with this:
>apache -s
            
You might want to automate your Apache startup by putting the necessary line into a file called go.bat. You then only need to type:
               go[RETURN]
Since this is the same as for the Unix version, we will simply say "type go" throughout the book when Apache is to be started, and thus save lengthy explanations.
When we ran Apache, we received the following lines:
Apache/<version number>
Syntax error on line 44 of /apache/conf/httpd.conf
ServerRoot must be a valid directory
To deal with the first complaint, we looked at the file
Directives
Here we go over the directives again, giving formal definitions for reference.
ServerName gives the hostname of the server to use when creating redirection URLs, that is, if you use a <Location> directive or access a directory without a trailing /.
ServerName hostname 
Server config, virtual host
It will also be useful when we consider Virtual Hosting (see Chapter 4).
The DocumentRoot directive sets the directory from which Apache will serve files.
DocumentRoot directory
Default: /usr/local/apache/htdocs
Server config, virtual host
Unless matched by a directive like Alias, the server appends the path from the requested URL to the document root to make the path to the document. For example:
DocumentRoot /usr/web
An access to http://www.my.host.com/index.html now refers to /usr/web/index.html.
There appears to be a bug in the relevant Module, mod_dir, that causes problems when the directory specified in DocumentRoot has a trailing slash (e.g., DocumentRoot /usr/web/), so please avoid that. It is worth bearing in mind that the deeper DocumentRoot goes, the longer it takes Apache to check out the directories. For the sake of performance, adopt the British Army's universal motto: KISS (Keep It Simple, Stupid)!
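The mapping from URL to file described above amounts to string concatenation in the simple case. The following sketch uses a made-up helper called url_to_file and covers only requests that no Alias or similar directive has matched; it illustrates the rule and is not how Apache itself is implemented.

```shell
# url_to_file DOCROOT URLPATH: append the URL path to the document root.
url_to_file() {
    printf '%s%s\n' "$1" "$2"
}

# With "DocumentRoot /usr/web", a request for /index.html maps to:
url_to_file /usr/web /index.html
```

Because the URL path already begins with a slash, a DocumentRoot written without a trailing slash joins to it cleanly.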
ServerRoot specifies where the subdirectories conf and logs can be found.
ServerRoot directory
Default directory: /usr/local/etc/httpd
Server config
If you start Apache with the -f (file) option, you need to include the ServerRoot directive. On the other hand, if you use the -d (directory) option, as we do, this directive is not needed.
The ErrorLog directive sets the name of the file to which the server will log any errors it encounters.
ErrorLog filename|syslog[:facility] 
Default: ErrorLog logs/error_log
Server config, virtual host
If the filename does not begin with a slash (/), it is assumed to be relative to the server root.
If the filename begins with a pipe (|), it is assumed to be a command to spawn to handle the error log.
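The rules for interpreting the ErrorLog argument can be summarized in a short sketch. The function errorlog_target is hypothetical, as are the absolute log path and the filter program used in the calls; only the default logs/error_log and the server root /usr/local/etc/httpd come from the text above.

```shell
# errorlog_target SERVERROOT VALUE: describe where an ErrorLog value points.
errorlog_target() {
    case "$2" in
        "|"*)            echo "pipe to command: ${2#?}" ;;  # strip the leading |
        syslog|syslog:*) echo "syslog" ;;
        /*)              echo "file: $2" ;;                 # absolute path
        *)               echo "file: $1/$2" ;;              # relative to server root
    esac
}

errorlog_target /usr/local/etc/httpd logs/error_log
errorlog_target /usr/local/etc/httpd /var/log/httpd_error
errorlog_target /usr/local/etc/httpd "|/usr/local/bin/errlog-filter"
```

The order of the cases matters: the pipe and syslog forms are checked before the value is treated as a filename.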
Apache 1.3 and above: using
Shared Objects
If you are using the DSO mechanism, you need quite a lot of stuff in your Config file.
In Apache v1.3 the order of these directives is important, so it is probably easiest to generate the list by doing an "out of the box" build using the flag --enable-shared=max. You will find /usr/etc/httpd/httpd.conf.default: copy the list from it into your own Config file, and edit it as you need.
LoadModule env_module         libexec/mod_env.so
LoadModule config_log_module  libexec/mod_log_config.so
LoadModule mime_module        libexec/mod_mime.so
LoadModule negotiation_module libexec/mod_negotiation.so
LoadModule status_module      libexec/mod_status.so
LoadModule includes_module    libexec/mod_include.so
LoadModule autoindex_module   libexec/mod_autoindex.so
LoadModule dir_module         libexec/mod_dir.so
LoadModule cgi_module         libexec/mod_cgi.so
LoadModule asis_module        libexec/mod_asis.so
LoadModule imap_module        libexec/mod_imap.so
LoadModule action_module      libexec/mod_actions.so
LoadModule userdir_module     libexec/mod_userdir.so
LoadModule alias_module       libexec/mod_alias.so
LoadModule access_module      libexec/mod_access.so
LoadModule auth_module        libexec/mod_auth.so
LoadModule setenvif_module    libexec/mod_setenvif.so

#  Reconstruction of the complete module list from all available modules
#  (static and shared ones) to achieve correct module execution order.
#  [WHENEVER YOU CHANGE THE LOADMODULE SECTION ABOVE UPDATE THIS, TOO]
ClearModuleList
AddModule mod_env.c
AddModule mod_log_config.c
AddModule mod_mime.c
AddModule mod_negotiation.c
AddModule mod_status.c
AddModule mod_include.c
AddModule mod_autoindex.c
AddModule mod_dir.c
AddModule mod_cgi.c
AddModule mod_asis.c
AddModule mod_imap.c
AddModule mod_actions.c
AddModule mod_userdir.c
AddModule mod_alias.c
AddModule mod_access.c
AddModule mod_auth.c
AddModule mod_so.c
AddModule mod_setenvif.c
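As the comment in the listing warns, the AddModule list has to be kept in step with the LoadModule list by hand. A small check along the following lines might help catch omissions. It is only a sketch: check_modules is a name invented here, and it assumes that each shared object mod_xxx.so corresponds to a source name mod_xxx.c, as is the case in the listing above. It checks in one direction only, so statically linked modules such as mod_so.c, which have no LoadModule line, are ignored.

```shell
# check_modules FILE: for each LoadModule line in FILE, report a missing
# "AddModule mod_xxx.c" line.
check_modules() {
    for so in $(awk '$1 == "LoadModule" { print $3 }' "$1"); do
        base=$(basename "$so" .so)          # libexec/mod_env.so -> mod_env
        grep -q "^AddModule[[:space:]][[:space:]]*$base\.c" "$1" ||
            echo "missing AddModule $base.c"
    done
}

# Demonstration on a throwaway file with one deliberate omission.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LoadModule env_module libexec/mod_env.so
LoadModule dir_module libexec/mod_dir.so
ClearModuleList
AddModule mod_env.c
EOF
check_modules "$sample"
rm -f "$sample"
```

In the demonstration, mod_dir is loaded but never added, so it is the one module reported.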
Notice that the list comes in three parts: LoadModules, then ClearModuleList, followed by AddModules to activate the ones you want. As we said earlier, it is all rather cumbersome and easy to get wrong. You might want to put the list in a separate file and then
Chapter 3: Toward a Real Web Site
Now that we have the server running with a basic configuration, we can start to explore more sophisticated possibilities in greater detail. Fortunately, the differences between the Windows and Unix versions of Apache fade as we get past the initial setup and configuration, so it's easier to focus on the details of making a web site work.
We are now in a position to start creating real(ish) web sites, which can be found in the sample code at the web site for the book, http://oreilly.com/catalog/apache3/. For the sake of a little extra realism, we will base the site loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID. This way, all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Windows 95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:
127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com
localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.
You probably need to consult your network manager to make similar arrangements.
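To confirm which address a name maps to in a hosts-format file, a lookup like the following can be used. This is a sketch: lookup_host is a hypothetical helper, and it reads a temporary copy of the entries shown above so that it does not depend on the contents of your real /etc/hosts.

```shell
# lookup_host NAME FILE: print the address listed for NAME in a
# hosts-format file (address first, then one or more names).
lookup_host() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$2"
}

# A temporary copy of the entries from the text.
hosts=$(mktemp)
cat > "$hosts" <<'EOF'
127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com
EOF
lookup_host sales.butterthlies.com "$hosts"
lookup_host www.faraway.com "$hosts"
rm -f "$hosts"
```

Pointing the second argument at /etc/hosts on the server machine performs the same check against the live file.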
site.simple is site.toddle with a few small changes. The script go will work anywhere. To get started, do the following, depending on your operating environment:
test -d logs || mkdir logs
httpd -d `pwd` -f `pwd`/conf/httpd.conf
         
Open an MS-DOS window and from the command line, type:
c>cd \program files\apache group\apache
c>apache -k start
c>Apache/1.3.26 (Win32) running ... 
To stop Apache, open a second MS-DOS window:
c>apache -k stop 
c>cd logs 
c>edit error.log 
This will be true of each site in the demonstration setup, so we will not mention it again.
More and Better Web Sites: site.simple
site.simple is site.toddle with a few small changes. The script go