After setting up a caching proxy on your network, you’ll need to figure out how to make your clients (browsers) use it. For many organizations, this is a particularly daunting task. There are a number of techniques you can use. Your choice is likely to depend on how many users you have, the client software they use, and whether you can configure that software. For example, a corporate information systems department usually supports one or two browsers and has full administrative control over all employee workstations. An ISP’s customers, on the other hand, maintain their own computers and run whatever software they like.
The oldest technique is what we call manual configuration. The clients are given one or more proxy addresses and, with few exceptions, forward all requests to the proxy. Manual configuration is relatively straightforward, but it’s not something most users can figure out on their own. Naive users who must configure their browsers manually need detailed instructions, such as those in the following section. Given the choice, some users won’t bother, since it is something of a hassle. Another drawback is that many user agents handle failures poorly when manually configured: if the proxy is unreachable, the request typically fails instead of moving on to another proxy.
The second-generation configuration technique, pioneered by Netscape, is known as proxy auto-configuration. Instead of using a static configuration, the browser executes a JavaScript function before making each request. The function returns an ordered list of proxy addresses to which the request may be forwarded. The auto-configuration technique also offers better failure detection and failover than manual configuration. For example, if the browser believes the first proxy in the list is down, it fails over to the next one, and so on.
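A real auto-configuration (PAC) file is plain JavaScript that defines FindProxyForURL(url, host) and returns a string such as “PROXY host:port; DIRECT”. The following is a minimal TypeScript sketch of both sides of that exchange: a PAC-style function and the browser’s ordered failover through its result. The proxy names proxy1.example.net and proxy2.example.net, the helper names parsePacResult and chooseRoute, and the rule that sends dotless hosts direct are all hypothetical illustrations, not any browser’s actual code.

```typescript
// Modeled on the FindProxyForURL(url, host) entry point a PAC file defines.
// The proxy names below are hypothetical placeholders.
function findProxyForURL(_url: string, host: string): string {
  // Illustrative rule: hosts without a dot (intranet names) go direct.
  if (host.indexOf(".") === -1) {
    return "DIRECT";
  }
  // Otherwise try two proxies in order, then fall back to a direct connection.
  return "PROXY proxy1.example.net:3128; PROXY proxy2.example.net:3128; DIRECT";
}

type Route = { kind: "PROXY"; address: string } | { kind: "DIRECT" };

// Hypothetical helper: turn the PAC result string into an ordered route list.
function parsePacResult(result: string): Route[] {
  const routes: Route[] = [];
  for (const entry of result.split(";")) {
    const [kind, address] = entry.trim().split(/\s+/);
    if (kind === "DIRECT") {
      routes.push({ kind: "DIRECT" });
    } else if (kind === "PROXY" && address) {
      routes.push({ kind: "PROXY", address });
    }
    // Empty or unrecognized entries are ignored in this sketch.
  }
  return routes;
}

// Hypothetical helper: walk the list in order, skipping any proxy the
// browser currently believes is down (the failover described above).
function chooseRoute(routes: Route[], down: string[]): Route | undefined {
  for (const route of routes) {
    if (route.kind === "DIRECT" || down.indexOf(route.address) === -1) {
      return route;
    }
  }
  return undefined;
}
```

With no proxies marked down, a request for www.example.com goes to proxy1.example.net:3128; if that proxy is marked down, it fails over to proxy2, and if both are down it falls back to DIRECT. In an actual PAC file, the dotless-host test would usually be written with the browser-supplied isPlainHostName helper.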
Although proxy auto-configuration is more flexible, it still requires the user to enter the URL of the JavaScript file. To eliminate this step, some companies are proposing a Web Proxy Auto-Discovery (WPAD) protocol. When the browser starts up, it uses DHCP and DNS to locate a proxy auto-configuration script. If one is found, the browser automatically loads the script and begins using the caching proxy.
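The DNS half of auto-discovery amounts to guessing well-known names. Under the WPAD draft, a client that does not receive a script URL via DHCP looks up the host name “wpad” in progressively shorter versions of its own domain and fetches http://wpad.<domain>/wpad.dat from the first one that answers. The sketch below only generates those candidate URLs; it performs no DNS lookups or HTTP fetches. The function name wpadCandidates is hypothetical, and the rule of stopping before a bare top-level domain is an assumption made for illustration.

```typescript
// Hypothetical helper: list DNS-based WPAD candidate URLs for a client,
// most specific domain first. It builds URLs only; it does no lookups.
// Assumption: the search stops before reaching a bare top-level domain.
function wpadCandidates(clientFqdn: string): string[] {
  const labels = clientFqdn
    .toLowerCase()
    .split(".")
    .filter((label) => label.length > 0);
  const candidates: string[] = [];
  // Start at index 1 to drop the client's own host label, then strip one
  // more label per step while at least two labels remain.
  for (let i = 1; labels.length - i >= 2; i++) {
    const domain = labels.slice(i).join(".");
    candidates.push(`http://wpad.${domain}/wpad.dat`);
  }
  return candidates;
}
```

For a client named pc12.eng.example.com, this yields http://wpad.eng.example.com/wpad.dat and then http://wpad.example.com/wpad.dat. The stopping rule matters: querying “wpad” directly under a top-level domain could load a configuration script from an unrelated party.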
A recent development, known as interception proxying, delivers client requests to caching proxies with absolutely no configuration of the client. Instead, you configure your network equipment (e.g., routers, switches) to divert HTTP traffic to the proxies. In this case, the client doesn’t even know it’s talking to a proxy—it thinks it’s talking to the origin server. Interception proxying (a.k.a. transparent proxying) is an interesting and controversial technique; we’ll save it for the next chapter.
Finally, this chapter closes with a few suggestions for configuring browser caches when the browser also uses a shared cache.
As with other Internet services, such as mail and FTP, a proxy server has an address composed of a hostname (or IP address) and a port number. However, unlike most other services, proxies have no standard default port number. While it is generally sufficient to say “connect to the FTP server at ftp.isp.net,” it is not sufficient to say “use the proxy at proxy.isp.net.” Thus, proxy addresses always include an explicit port number. Typically, the host and port are written together, separated by a colon, for example:
172.16.4.1:8080
squid.ircache.net:3128
proxy1.bigisp.net:80
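Because a proxy address is incomplete without its port, a configuration tool should reject a bare hostname rather than guess a default. This is a minimal TypeScript sketch, assuming plain hostnames and IPv4 literals only (bracketed IPv6 literals are not handled); the name parseProxyAddress is hypothetical.

```typescript
// A proxy address: hostname or IPv4 literal plus an explicit port.
interface ProxyAddress {
  host: string;
  port: number;
}

// Hypothetical helper: parse "host:port". Proxies have no standard default
// port, so a missing port is an error rather than silently defaulted.
function parseProxyAddress(text: string): ProxyAddress {
  const sep = text.lastIndexOf(":");
  if (sep <= 0 || sep === text.length - 1) {
    throw new Error(`proxy address "${text}" must be written as host:port`);
  }
  const host = text.slice(0, sep);
  const portText = text.slice(sep + 1);
  if (!/^\d+$/.test(portText)) {
    throw new Error(`invalid port in "${text}"`);
  }
  const port = Number(portText);
  if (port < 1 || port > 65535) {
    throw new Error(`port out of range in "${text}"`);
  }
  return { host, port };
}
```

Each of the three example addresses parses into a host and port, while “proxy.isp.net” on its own raises an error instead of being guessed at.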
Although there is no default port number, most proxies use one of the following ports: 3128, 8080, 80, or 1080. Most of these are variations on “80” because that is the default port for HTTP. Port 3128 was arbitrarily selected as the default for the Harvest software, from which Squid evolved. Although port 80 is normally associated with origin servers, it is also popular for proxy servers, probably because the CERN server could function simultaneously as both an origin server and a proxy server. Today’s web caches can listen for requests on multiple ports at the same time. I don’t recommend using port 80 for proxy servers unless absolutely necessary. Network applications are usually identified by port number, so sharing port 80 makes it difficult to separate origin server traffic from proxy traffic.