O'Reilly Hacks


APACHE HACK

Search Engine Friendly SSI Image Gallery
You need to upload and present a large number of images in a very short amount of time. To make matters worse, you don't have access to a database or a programming language, and you'd like the pages to be accessible via a search engine, so the final URLs can't have a "?" in them. Creating thousands of individual HTML files is out of the question.

Contributed by:
Morbus Iff
[03/14/03]

Prerequisites

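  • The Server Side Include module (mod_include) must be enabled, with SSI parsing turned on for .shtml files in the gallery directory (Options +Includes).
  • On Apache 2.x, PATH_INFO must also be accepted for those files (AcceptPathInfo On), because the core handler rejects requests with trailing path information by default when .shtml files are parsed through the INCLUDES filter.
  • .htaccess support, with RedirectMatch (from mod_alias) available (optional).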

The solution combines an SSI template file with an ordinary, hand-written index page that links to it. Our SSI template will display the file size and last modification date of the image, as well as an error message if the file doesn't exist.

This hack revolves around an environment variable called PATH_INFO. PATH_INFO is one of the standard CGI variables, which Apache also makes available to SSI pages; the server fills it in from the requested URL itself, so it needs nothing special from the visitor and works with any browser. PATH_INFO holds whatever extra path follows the name of a real file in the URL: in the example below, "http://www.disobey.com/show_me.shtml" is the URL of the actual file, and "/themoney" ends up in PATH_INFO:

{{{
   http://www.disobey.com/show_me.shtml/themoney
}}}
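To check what Apache actually places in PATH_INFO on your own server, a throwaway test page can help. The sketch below assumes a hypothetical file named pathtest.shtml in an SSI-enabled directory, requested as, for example, "pathtest.shtml/themoney". The #echo directive prints the single variable, while #printenv dumps every variable the SSI page can see; since that exposes server details, delete the page once you're done.

{{{
   <html>
   <head>
      <title>PATH_INFO test</title>
   </head>
   <body>
     <!-- Hypothetical test page: request it as e.g. pathtest.shtml/themoney -->
     PATH_INFO: <!--#echo var="PATH_INFO"-->
     <!-- Dump every variable available to SSI on this request -->
     <pre><!--#printenv--></pre>
   </body>
   </html>
}}}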

The first step is to create the SSI template file, which we'll name show_me.shtml. This file will be used to display each of our images, one at a time. We start off with:

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
     <img src="..<!--#echo var="PATH_INFO"-->.jpg" />
   </body>
   </html>
}}}

With the above template, whatever arrives in PATH_INFO is inserted into the <img> tag before the page is sent to the browser. Using our "show_me.shtml/themoney" example, the tag becomes <img src="../themoney.jpg" />, which the browser resolves to "themoney.jpg" in the same directory as "show_me.shtml".

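You might expect that, with "themoney.jpg" in the same directory as "show_me.shtml", the "../" would be unnecessary. It isn't: browsers resolve relative URLs against the path of the current URL, and the extra slash added by PATH_INFO makes "show_me.shtml" *look* like a directory. A plain relative reference such as "themoney.jpg" would be resolved to "show_me.shtml/themoney.jpg", which is just the template again with a different PATH_INFO, so we have to tell the browser to go up one level.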
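If all the gallery images live in one known directory, a root-relative path avoids the "../" trick entirely. The sketch below assumes a hypothetical /images/ directory under the document root. Because the resulting path no longer depends on how deep the requested URL looks, it also keeps working when PATH_INFO contains more than one segment (such as "/2003/themoney"), a case where "../" would point back into show_me.shtml.

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
     <!-- Assumes images are stored in a hypothetical /images/ directory -->
     <img src="/images<!--#echo var="PATH_INFO"-->.jpg" />
   </body>
   </html>
}}}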

Next, add some information about the requested image file. To do so, we'll use more SSI to output its file size and last-modification date:

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
     Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
     <br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
     <img src="..<!--#echo var="PATH_INFO"-->.jpg" />
   </body>
   </html>
}}}

This example uses SSI to output the image file's last-modified date and its file size. That information isn't especially important to the average visitor, but a side effect of these commands is.

Because the "last modified" and "file size" commands make the server look up the file, Apache reports when something goes wrong, by default printing "[an error occurred while processing this directive]" in place of the output. The cause could be a missing file, a permissions problem, or some other issue; we can't know for sure without looking in our error_log.

We can, however, modify the error message to be a bit friendlier:

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
      <!--#config errmsg="This image does not exist!"-->
     Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
     <br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
     <img src="..<!--#echo var="PATH_INFO"-->.jpg" />
   </body>
   </html>
}}}

With "errmsg" configured, any time an image doesn't exist in the directory, the "This image does not exist!" message is shown in place of each directive that fails (in this template, both flastmod and fsize, so the message appears twice). You can also include HTML in "errmsg", so it's possible to build a link to a CGI script for reporting purposes (e.g., "click here to report this error!").
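A sketch of such a template follows. The reporting script /cgi-bin/report-missing.cgi is hypothetical; note the single quotes inside errmsg, which keep the link's href from ending the directive's own double-quoted value early. The same #config directive also accepts timefmt (a strftime-style format used by flastmod) and sizefmt ("bytes" or "abbrev", used by fsize), shown here as optional extras.

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
     <!-- Friendlier error, linking to a hypothetical reporting script -->
     <!--#config errmsg="This image does not exist! <a href='/cgi-bin/report-missing.cgi'>Click here to report this error!</a>"-->
     <!-- Optional: date format for flastmod, exact byte counts for fsize -->
     <!--#config timefmt="%Y-%m-%d %H:%M"-->
     <!--#config sizefmt="bytes"-->
     Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
     <br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
     <img src="..<!--#echo var="PATH_INFO"-->.jpg" />
   </body>
   </html>
}}}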

Now that the template is out of the way, it's a simple matter of creating a normal index page that simply refers to this file:

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery - Index</title>
   </head> 
   <body>
     <ul><li><a href="show_me.shtml/themoney">Show Me What?!</a></li>
     <li><a href="show_me.shtml/themoolah">I Can't Hear You!</a></li></ul>
   </body>
   </html>
}}}

Visitors clicking on the above example will be shown "themoney.jpg" and "themoolah.jpg". If they don't exist, they'll be given an error message.
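One case the template doesn't cover is a request for show_me.shtml with no extra path at all: PATH_INFO is then empty (or not set), and the page shows a broken image along with the error message. A minimal guard, assuming the legacy SSI expression syntax (Apache 1.3 through 2.2, or 2.4 with SSILegacyExprParser on), in which a bare string tests true when it is non-empty, might look like this; Apache 2.4's default expression parser would need the test written differently.

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery</title>
   </head>
   <body>
   <!--#if expr="$PATH_INFO" -->
     <!-- An image name was supplied: show it as before -->
     <!--#config errmsg="This image does not exist!"-->
     Image Last Modified: <!--#flastmod virtual="..$PATH_INFO.jpg"-->
     <br />Image File Size: <!--#fsize virtual="..$PATH_INFO.jpg"-->
     <img src="..<!--#echo var="PATH_INFO"-->.jpg" />
   <!--#else -->
     <!-- No extra path in the URL, so there is nothing to display -->
     No image was requested.
   <!--#endif -->
   </body>
   </html>
}}}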

There are a few caveats to this system:

  • We've hard-coded the ".jpg" extension in the SSI template, so all the image files have to be in JPEG format. The extension is deliberately kept out of the URL: most search engines assume that a URL ending in ".jpg" is an image and won't index it as a page. It also avoids a browser seeing the ".jpg" extension and trying to display the result as an image, without consulting the server for the correct MIME type.

  • Think "http://www.disobey.com/show_me.shtml/themoney" looks ugly because a filename appears to be used as a directory name? If you have .htaccess control in your web directory, and show_me.shtml sits at the top of the site as in these examples, you can add the following to that file:
    {{{
       RedirectMatch ^/show_me/(.*)$ /show_me.shtml/$1
    }}}
    and then link to "http://www.disobey.com/show_me/themoney" instead. The redirect target must be a full URL or a path beginning with "/"; Apache does not accept a relative target such as "../show_me.shtml/$1". RedirectMatch sends the browser an HTTP redirect, so visitors still end up at the show_me.shtml URL, but the links you publish are cleaner, and you can change how the images are served later on without changing those links or causing 404s.
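With that redirect in place, the index page can link to the shorter URLs. This sketch assumes show_me.shtml is at the document root and the RedirectMatch line above is in that directory's .htaccess file.

{{{
   <html>
   <head>
      <title>Apache Hack #12394 - SSI Image Gallery - Index</title>
   </head>
   <body>
     <!-- Short links; RedirectMatch forwards them to /show_me.shtml/... -->
     <ul><li><a href="/show_me/themoney">Show Me What?!</a></li>
     <li><a href="/show_me/themoolah">I Can't Hear You!</a></li></ul>
   </body>
   </html>
}}}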

© 2007 O'Reilly Media, Inc.