Published on The O'Reilly Network (http://www.oreillynet.com/)
OSCON Day 3: HTTP Caching
by Robert Kaye
Aug. 6, 2005
Michael Radwin's "HTTP Caching & Cache-Busting for Content Publishers" talk was very much like being smacked with an O'Reilly book and absorbing its contents via impact-osmosis. A little intense, but very informative. This talk was very similar to Ask Bjørn Hansen's talk -- many slides at lightning pace -- one blink and you would've missed something important.
This talk was aimed at people who need to understand HTTP caching, both to build web sites with better and more secure functionality and to account for the caching proxies that sit between their servers and their users. Caching proxies avoid fetching the same content over and over, which reduces network load. For instance, AOL relies heavily on caching proxies to cut its overall bandwidth use by reducing the number of times that a given page gets fetched from the server. Proper cache handling matters for two reasons: it keeps dynamic web pages from being tripped up by caches, and it lets caches be effective, which in turn reduces the load on your own site.
Michael broke web content into three categories:
- HTML -- changes frequently and should be considered dynamic content
- Flash, images and PDFs -- change very infrequently, perhaps never
He suggested that each of these types of content should be treated differently when choosing a caching strategy, and offered five strategies for dealing with them:
- "Images never expire" policy: Since a lot of images on a site (e.g. logos) change very infrequently, it is not a stretch to say that images never expire. By setting the expiration time in the response header to something like 10 years in the future, you can be sure that a caching proxy will not fetch an image from your server unnecessarily. But what happens if you do change the image? Use another filename -- this can be a bit of a pain, but if you're trying to optimize the bandwidth usage of your site, you'll have to jump through plenty of hoops.
- Apache defaults for occasionally changing content: If you're not certain how frequently your content changes, Apache does the right thing by default. No extra work required.
- Using URL tags for sensitive content: If you're sending sensitive content over the net and you want to make sure that a caching proxy always does the right thing, consider embedding a user or session id into the URL. If you embed the user ID into the URL, even if the server never bothers to look at it, then a proxy will never mistakenly return a page from the cache to another user. However, if the same user requests the same page over again, the caching proxy can safely return the cached page.
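To make the first and last strategies a bit more concrete, here is a rough Python sketch of my own; it is not code from Michael's slides. The function names, the `uid` query parameter, the `.v2` filename scheme and the toy cache class are all made up for illustration, and the `Cache-Control: max-age` header is something I added alongside `Expires`. The toy cache only models the one property that matters here, namely that a cache looks pages up by URL.

```python
import os.path
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TEN_YEARS = timedelta(days=3650)  # "something like 10 years"


def never_expire_headers(now=None):
    """Response headers asking caches to keep an image for about 10 years."""
    now = now or datetime.now(timezone.utc)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(now + TEN_YEARS, usegmt=True),
        # Assumption: max-age is sent too; the talk summary only mentions expiry.
        "Cache-Control": f"max-age={int(TEN_YEARS.total_seconds())}",
    }


def versioned_name(filename, version):
    """Hypothetical renaming scheme: a changed image gets a new filename."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}.v{version}{ext}"


def tag_url(url, user_id):
    """Append a (hypothetical) uid parameter so each user gets a distinct URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("uid", user_id)]
    return urlunsplit(parts._replace(query=urlencode(query)))


class ToyProxyCache:
    """A deliberately naive stand-in for a caching proxy, keyed by URL only."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that goes to the origin server
        self._store = {}         # url -> cached response body
        self.origin_hits = 0     # how many requests reached the origin

    def get(self, url):
        if url not in self._store:
            self.origin_hits += 1
            self._store[url] = self._fetch(url)
        return self._store[url]
```

With this sketch, `tag_url("http://example.com/account", "alice")` and the same call for `"bob"` produce two different URLs, so the toy cache stores two separate entries and never hands Alice's page to Bob, while a repeat request from Alice is served from the cache without touching the origin. Real proxies have many more rules than this, so treat it as a picture of the idea rather than a model of any particular product.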
This talk covered many more details that I can't really convey here -- if you're interested in finding out all the details on how to deal with caches, Michael suggested picking up a book on the topic.
Robert Kaye is the Mayhem & Chaos Coordinator and creator of MusicBrainz, the music metadata commons.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.