O'Reilly Hacks


APACHE HACK

Smarter 404s and Other Error Responses
As you check over your error_logs, you can see a lot of missing pages being requested. Where are they coming from? What pages are referring to them? Are they yours, or someone else's? Why? Why? Why!? Your plan is to start dealing with the dreaded 404 swiftly and surely - eliminating every last one of them. With the multiple solutions below, you'll be well on your way.

Contributed by:
Morbus Iff
[03/14/03]

Prerequisites

  • A default installation of Apache and the ability to modify httpd.conf.
  • Access to create or modify an .htaccess file (optional).
  • Ability to use CGI scripts (optional).
  • Ability to use Server Side Includes (optional).

The error 404 represents many things in a visitor's mind: frustration, disgust, and a hasty exit to one of your competitors' sites. While it's certainly possible to create a new site and carefully check that all files resolve, you can't control misspelled links on third-party pages. Misspelled links lead to 404s and a disgruntled visitor, whether it's your fault or not. Finally, inevitable redesigns often give no consideration to backwards compatibility with the old file layout.

Note that this hack doesn't tell you how to enable "prettier" 404 pages (via Apache's ErrorDocument), but rather covers preventative ways of controlling future 404s.

There are numerous ways to deal with 404s. The simplest is to never let the user know that a 404 has occurred. You can do this with the Redirect and RedirectMatch directives in Apache (usable in the httpd.conf or an .htaccess file):

{{{
   Redirect /> http://www.disobey.com/index.shtml
   Redirect /amphetadeks/ http://www.disobey.com/amphetadesk/
}}}

The above are two Redirects I currently have in place on my own server. The first covers familiar delimiters, like <http://www.disobey.com/> or <URL=http://www.disobey.com/>, often used in emails or text files. Some programs will mistake the closing > as part of the URL, but this Redirect makes sure the user will get to the right place.

The other Redirect covers an embarrassing spelling mistake I once made in an email about one of my software projects. I noticed it shortly after sending the email and quickly added the Redirect, and the readers never knew the difference.
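
RedirectMatch, mentioned earlier, works the same way but matches the request path against a regular expression, which helps when a redesign renames a whole family of files. Here's a minimal sketch, assuming a hypothetical old layout under /old/ with .htm files that moved to /new/ with .html extensions (both paths are made up for illustration):

{{{
   RedirectMatch permanent ^/old/(.*)\.htm$ http://www.disobey.com/new/$1.html
}}}

The parenthesized part of the pattern is captured and reused as $1 in the target URL, and the optional "permanent" status tells clients the move is a 301 rather than the default temporary redirect.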

The second option is to pass 404 reporting off to the user who triggered the error. To do this, we'll use SSI to pass important environment variables to a standard form-mailing CGI. In the example below, we're sending the data off to "mailform.cgi", which will mail everything to "morbus@disobey.com" with an appropriate subject line (the various input names are meaningful to the "mailform.cgi" script - they'll differ depending on what script is used). The mailing will only happen if the user clicks the "report this error" button. If you save this file as "error.shtml", you'll want to modify Apache's ErrorDocument directive to point to it.

{{{
   <html>
   <head>
     <title>Apache Hack #1237 - Oops! An Error Has Occurred!</title>
   </head>
   <body>

     <h1>Oh No! An Error Has Occurred!</h1>

     <form action="/cgi-bin/mailform.cgi" method="post">
     <input type="hidden" name="recipient" value="morbus@disobey.com" />
     <input type="hidden" name="subject" value="404 Error - Fix Me!" />
     <input type="hidden" name="redirect" value="http://www.disobey.com/" />
     <input type="hidden" name="http_referer" value="<!--#echo var="HTTP_REFERER"-->" />
     <input type="hidden" name="request_uri" value="<!--#echo var="REQUEST_URI"-->" />
     <input type="hidden" name="query_string" value="<!--#echo var="QUERY_STRING"-->" />
     <input type="submit" value="click here to file an error report">
     </form>

   </body>
   </html>
}}}

The downside to the above example is that you're relying on the kindness of a disgruntled user to report the error. If they're being particularly finicky, they'll just move on, and you'll be none the wiser.
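
For reference, pointing Apache at the SSI page might look like the sketch below. It assumes the page was saved as /error.shtml in the document root and that SSI isn't already enabled for .shtml files. The AddOutputFilter line is the Apache 2.x form; Apache 1.3 would use "AddHandler server-parsed .shtml" instead.

{{{
   Options +Includes
   AddType text/html .shtml
   AddOutputFilter INCLUDES .shtml
   ErrorDocument 404 /error.shtml
}}}

If these lines go in an .htaccess file rather than httpd.conf, the server's AllowOverride setting has to permit them.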

SSIs are limited in their scope - there's not much more you can do past simple redirects, requests for more information, or barely suggestive instructions. If you're particularly interested in stopping the 404 problem, you can set your ErrorDocument to a CGI script, like the one below.

In the below example, we'll automatically be emailed for each and every 404 request for any file ending in .html or .mp3. The user will then be redirected to a standard "oops!" page saying that the administrators have been notified, etc.

{{{
   #!/usr/bin/perl -w
   use strict;
   use Mail::Mailer;

   # define all file extensions you
   # want to receive a mail report for.
   my @filetypes = ("mp3", "html");

   # our real 404 page. we redirect to this page
   # whether we send a mail message or not.
   my $error_page = "http://www.disobey.com/oops.html";

   # get the extension from the URI,
   # and then check to see if it's one
   # we should be monitoring (@filetypes).
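   # capture in list context so $ext is undef (not a stale $1)
   # when the URI has no extension at all.
   my ($ext) = ($ENV{REQUEST_URI} || "") =~ /\w+\.([^.]+)$/;
   my $continue; foreach (@filetypes) { $continue++ if defined $ext and $ext eq $_; }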

   # redirect to the error page and quit if we're not reporting.
   unless ($continue) { print "Location: $error_page\n\n"; exit; }

   # create the body of our mail message.
   my $body = "I've just been given a healthy dose of NOT FOUND!\n";
   $body .= "Here's the environment I was called with:\n\n";
   while ( my ($var, $val) = each %ENV ) { $body .= "  $var = $val\n"; }

   # mail our warning off.
   my $mailer = Mail::Mailer->new("sendmail");
   $mailer->open({ From    => "404\@disobey.com",
                   To      => "morbus\@disobey.com",
                   Subject => "A 404 Has Been Detected!",
                 });
   print $mailer $body;
   $mailer->close();

   # and now redirect to our real error.
   print "Location: $error_page\n\n"; exit;
}}}

The downside to the above script is that you'll get a LOT of emails if you have a high-traffic site - it'll certainly make you want to fix the errors rather quickly (by putting in Redirects as per our first example, perhaps). The upside of using a CGI script for errors is that you have total control over what goes on - you can show the user a list of "related URLs", construct a link to the Google cache mechanism, or any number of other helpful responses.
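
Hooking the script up is a one-line change. Here's a sketch, assuming the script was saved as /cgi-bin/404.cgi (a placeholder name; use whatever you called yours):

{{{
   ErrorDocument 404 /cgi-bin/404.cgi
}}}

Use a local path like this rather than a full http:// URL. With a full URL, Apache sends the browser a redirect instead of running the script internally, and the script no longer sees the details of the original request.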


© 2007 O'Reilly Media, Inc.