Chapter 4. Retrieving Information

To build a successful web application, you often need to know a lot about the environment in which it is running. You may need to find out about the server that is executing your servlets or the specifics of the client that is sending requests. And no matter what kind of environment the application is running in, you most certainly need information about the requests that the application is handling.

A number of methods provide servlets access to this information. For the most part, each method returns one specific result. Compared to the way environment variables are used to pass a CGI program its information, the servlet approach has several advantages:

Stronger type checking. Servlets get more help from the compiler in catching errors. A CGI program uses one function to retrieve its environment variables. Many errors cannot be found until they cause runtime problems. Let’s look at how a CGI program and a servlet find the port on which its server is running.
A CGI script written in Perl calls:
```
$port = $ENV{'SERVER_PORT'};
```
where $port is an untyped variable. A CGI program written in C calls:
```
char *port = getenv("SERVER_PORT");
```
where port is a pointer to a character string. The chance for accidental errors is high. The environment variable name could be misspelled (it happens often enough) or the datatype might not match what the environment variable returns.
A servlet, on the other hand, calls:
```
int port = req.getServerPort();
```
This eliminates a lot of accidental errors because the compiler can guarantee there are no misspellings and each return type is as it should be.
Delayed calculation. When a server launches a CGI program, the value for each and every environment variable must be precalculated and passed, whether the CGI program uses it or not. A server launching a servlet has the option to improve performance by delaying these calculations and performing them on demand as needed.
More interaction with the server. Once a CGI program begins execution, it is untethered from its server. The only communication path available to the program is its standard output. A servlet, however, can work with the server. As discussed in the previous chapter, a servlet operates either within the server (when possible) or as a connected process outside the server (when necessary). Using this connectivity, a servlet can make ad hoc requests for calculated information that only the server can provide. For example, a servlet can have its server do arbitrary path translations, taking into consideration the server’s aliases and virtual paths.

If you’re coming to servlets from CGI, Table 4-1 is a “cheat sheet” you can use for your migration. It lists each CGI environment variable and the corresponding HTTP servlet method.

Table 4-1. CGI Environment Variables and the Corresponding Servlet Methods

CGI Environment Variable	HTTP Servlet Method
`SERVER_NAME`	`req.getServerName()`
`SERVER_SOFTWARE`	`getServletContext().getServerInfo()`
`SERVER_PROTOCOL`	`req.getProtocol()`
`SERVER_PORT`	`req.getServerPort()`
`REQUEST_METHOD`	`req.getMethod()`
`PATH_INFO`	`req.getPathInfo()`
`PATH_TRANSLATED`	`req.getPathTranslated()`
`SCRIPT_NAME`	`req.getServletPath()`
`DOCUMENT_ROOT`	`getServletContext().getRealPath("/")`
`QUERY_STRING`	`req.getQueryString()`
`REMOTE_HOST`	`req.getRemoteHost()`
`REMOTE_ADDR`	`req.getRemoteAddr()`
`AUTH_TYPE`	`req.getAuthType()`
`REMOTE_USER`	`req.getRemoteUser()`
`CONTENT_TYPE`	`req.getContentType()`
`CONTENT_LENGTH`	`req.getContentLength()`
`HTTP_ACCEPT`	`req.getHeader("Accept")`
`HTTP_USER_AGENT`	`req.getHeader("User-Agent")`
`HTTP_REFERER`	`req.getHeader("Referer")`

In the rest of this chapter, we’ll see how and when to use these methods—and many other methods that have no CGI counterparts. Along the way, we’ll put the methods to use in some real servlets.

The Servlet

Each registered servlet name can have specific initialization (init) parameters associated with it. Init parameters are available to the servlet at any time; they are set in the web.xml deployment descriptor and generally used in init( ) to set initial or default values for a servlet or to customize the servlet’s behavior in some way. Init parameters are more fully explained in Chapter 3.

Getting a Servlet Init Parameter

A servlet uses the getInitParameter( ) method for access to its init parameters:

public String ServletConfig.getInitParameter(String name)

This method returns the value of the named init parameter or null if it does not exist. The return value is always a single String. It is up to the servlet to interpret the value.

The GenericServlet class implements the ServletConfig interface and thus provides direct access to the getInitParameter( ) method. This means the method can be called like this:

public void init() throws ServletException {
  String greeting = getInitParameter("greeting");
}

A servlet that needs to establish a connection to a database can use its init parameters to define the details of the connection. We can assume a custom establishConnection( ) method to abstract away the details of JDBC, as shown in Example 4-1.

Example 4-1. Using init Parameters to Establish a Database Connection

java.sql.Connection con = null;

public void init() throws ServletException {
  String host = getInitParameter("host");
  int port = Integer.parseInt(getInitParameter("port"));
  String db = getInitParameter("db");
  String user = getInitParameter("user");
  String password = getInitParameter("password");
  String proxy = getInitParameter("proxy");

  con = establishConnection(host, port, db, user, password, proxy);
}
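
Because getInitParameter( ) returns null for a missing parameter, the call to Integer.parseInt( ) in Example 4-1 throws a NumberFormatException if the port parameter is absent or malformed. As a minimal sketch of one way to guard against that, the InitParams class below is a hypothetical helper (it is not part of the Servlet API) that falls back to a caller-supplied default. It trims the raw value first, on the assumption that a value read from a deployment descriptor may carry surrounding whitespace.

```java
// Hypothetical helper, not part of the Servlet API: interprets the raw
// String (or null) that getInitParameter() returns.
final class InitParams {

  private InitParams() {}

  // Returns the parsed integer, or fallback when the parameter is
  // missing, blank, or not a valid decimal integer.
  static int intOrDefault(String raw, int fallback) {
    if (raw == null) {
      return fallback;
    }
    try {
      return Integer.parseInt(raw.trim());
    }
    catch (NumberFormatException e) {
      return fallback;
    }
  }

  // Returns the trimmed value, or fallback when the parameter is
  // missing or blank.
  static String stringOrDefault(String raw, String fallback) {
    if (raw == null || raw.trim().isEmpty()) {
      return fallback;
    }
    return raw.trim();
  }
}
```

Inside init( ), a call such as InitParams.intOrDefault(getInitParameter("port"), defaultPort) would then replace the bare Integer.parseInt( ) call, where defaultPort is whatever value makes sense for your database. A servlet that cannot run without the parameter may prefer to throw a ServletException instead of silently using a default.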

There’s also a more advanced, standard abstraction model available to servlets designed for Java 2, Enterprise Edition (J2EE). See Chapter 12.

Getting Servlet Init Parameter Names

A servlet can examine all its init parameters using getInitParameterNames( ):

public Enumeration ServletConfig.getInitParameterNames()

This method returns the names of all the servlet’s init parameters as an Enumeration of String objects or an empty Enumeration if no parameters exist. It’s most often used for debugging.

The GenericServlet class additionally makes this method directly available to servlets. Example 4-2 shows a servlet that prints the name and value for all of its init parameters.

Example 4-2. Getting init Parameter Names

import java.io.*;
import java.util.*;
import javax.servlet.*;

public class InitSnoop extends GenericServlet {

  // No init() method needed

  public void service(ServletRequest req, ServletResponse res)
                             throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    out.println("Init Parameters:");
    // "enum" is a reserved word as of Java 5, so use another name
    Enumeration names = getInitParameterNames();
    while (names.hasMoreElements()) {
      String name = (String) names.nextElement();
      out.println(name + ": " + getInitParameter(name));
    }
  }
}

Notice that this servlet directly subclasses GenericServlet, showing that init parameters are available to servlets that aren’t HTTP servlets. A generic servlet can be used in a web server even though it lacks any support for HTTP-specific functionality.

Getting a Servlet’s Name

Also in the ServletConfig interface there’s a method that returns the servlet’s registered name:

public String ServletConfig.getServletName()

If the servlet is unregistered, the method returns the servlet’s class name. This method proves useful when writing to logs and when storing a servlet instance’s state information into a shared resource such as a database or the servlet’s ServletContext, which we’ll learn about shortly.

As an example, the following code demonstrates how to use the servlet’s name to retrieve a value from the servlet context, using the name as part of the lookup key:

public void doGet(HttpServletRequest req, HttpServletResponse res) 
                      throws ServletException, IOException { 
  String name = getServletName();
  ServletContext context = getServletContext(); 
  Object value = context.getAttribute(name + ".state"); 
}

Using the servlet name in the key, each servlet instance can easily keep a separate attribute value within the shared context.

The Server

A servlet can find out much about the server in which it is executing. It can learn the hostname, listening port, and server software, among other things. A servlet can display this information to a client, use it to customize its behavior based on a particular server package, or even use it to explicitly restrict the machines on which the servlet will run.

Getting Information About the Server

A servlet gains most of its access to server information through the ServletContext object in which it executes. Before API 2.2, the ServletContext was generally thought of as a reference to the server itself. Since API 2.2 the rules have changed and there now must be a different ServletContext for each web application on the server. The ServletContext has become a reference to the web application, not a reference to the server. For simple server queries, there’s not much difference.

There are five methods that a servlet can use to learn about its server: two that are called using the ServletRequest object passed to the servlet and three that are called from the ServletContext object in which the servlet is executing.

A servlet can get the name of the server and the port number for a particular request with getServerName( ) and getServerPort( ), respectively:

public String ServletRequest.getServerName()
public int ServletRequest.getServerPort()

These methods belong to ServletRequest because the values can change for different requests if the server has more than one name (a technique called virtual hosting). The returned name might be something like www.servlets.com, while the returned port might be something like 8080.

The getServerInfo( ) and getAttribute( ) methods of ServletContext provide information about the server software and its attributes:

public String ServletContext.getServerInfo()
public Object ServletContext.getAttribute(String name)

getServerInfo( ) returns the name and version of the server software, separated by a slash. The string returned might be something like Tomcat Web Server/3.2. Some servers add extra information at the end describing the server operating environment.

getAttribute( ) returns the value of the named server attribute as an Object or null if the attribute does not exist. Servers have the option to place hardcoded attributes in the context for use by servlets. You can think of this method as a back door through which a servlet can get extra information about its server. For example, a server could make available statistics on server load, a reference to a shared resource pool, or any other potentially useful information. The only mandatory attribute a server must make available is an attribute named javax.servlet.context.tempdir, which provides a java.io.File reference to a directory private to this context.

Servlets can also add their own attributes to the context using the setAttribute( ) method as discussed in Chapter 11. Attribute names should follow the same convention as package names. The package names java.* and javax.* are reserved for use by the Java Software division of Sun Microsystems, and com.sun.* is reserved for use by Sun Microsystems. See your server’s documentation for a list of its attributes. A listing of all current attributes stored by the server and other servlets can be obtained using getAttributeNames( ):

public Enumeration ServletContext.getAttributeNames()

Because these are methods of the ServletContext in which the servlet is executing, you have to call them through that object:

String serverInfo = getServletContext().getServerInfo();

The most straightforward use of information about the server is an “About This Server” servlet, as shown in Example 4-3.

Example 4-3. Snooping the Server

import java.io.*;
import java.util.*;
import javax.servlet.*;

public class ServerSnoop extends GenericServlet {

  public void service(ServletRequest req, ServletResponse res)
                             throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    ServletContext context = getServletContext();
    out.println("req.getServerName(): " + req.getServerName());
    out.println("req.getServerPort(): " + req.getServerPort());
    out.println("context.getServerInfo(): " + context.getServerInfo());
    out.println("getServerInfo() name: " +
                 getServerInfoName(context.getServerInfo()));
    out.println("getServerInfo() version: " +
                 getServerInfoVersion(context.getServerInfo()));
    out.println("context.getAttributeNames():");
    // "enum" is a reserved word as of Java 5, so use another name
    Enumeration names = context.getAttributeNames();
    while (names.hasMoreElements()) {
      String name = (String) names.nextElement();
      out.println("  context.getAttribute(\"" + name + "\"): " +
                     context.getAttribute(name));
    }
  }

  private String getServerInfoName(String serverInfo) {
    int slash = serverInfo.indexOf('/');
    if (slash == -1) return serverInfo;
    else return serverInfo.substring(0, slash);
  }

  private String getServerInfoVersion(String serverInfo) {
    // Version info is everything between the slash and the space
    int slash = serverInfo.indexOf('/');
    if (slash == -1) return null;
    int space = serverInfo.indexOf(' ', slash);
    if (space == -1) space = serverInfo.length();
    return serverInfo.substring(slash + 1, space);
  }
}

This servlet also directly subclasses GenericServlet, demonstrating that all the information about a server is available to servlets of any type. The servlet outputs simple raw text. When accessed, this servlet prints something like:

req.getServerName(): localhost
req.getServerPort(): 8080
context.getServerInfo(): Tomcat Web Server/3.2 (JSP 1.1; Servlet 2.2; ...)
getServerInfo() name: Tomcat Web Server
getServerInfo() version: 3.2
context.getAttributeNames():
  context.getAttribute("javax.servlet.context.tempdir"): work/localhost_8080

Writing to a Temporary File

The javax.servlet.context.tempdir attribute maps to a temporary directory where short-lived working files can be stored. Each context receives a different temporary directory. For the previous example, the directory is server_root/work/localhost_8080. Example 4-4 shows how to write to a temporary file in the temporary directory.

Example 4-4. Creating a Temporary File in a Temporary Directory

public void doGet(HttpServletRequest req, HttpServletResponse res)
                      throws ServletException, IOException {
  // The directory is given as a File object
  File dir = (File) getServletContext()
                    .getAttribute("javax.servlet.context.tempdir");

  // Construct a temp file in the temp dir (JDK 1.2 method)
  File f = File.createTempFile("xxx", ".tmp", dir);

  // Prepare to write to the file
  FileOutputStream fout = new FileOutputStream(f);

  // ...
}

First, this servlet locates its temporary directory. Then, it uses the createTempFile( ) method to create a temporary file in that directory with an xxx prefix and .tmp suffix. Finally, it constructs a FileOutputStream to write to the temporary file. Files that must persist between server restarts should not be placed in the temporary directory.
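
Example 4-4 stops once the stream is open. As a minimal sketch of the remaining steps, assuming Java 7 or later for try-with-resources, the hypothetical TempFileSketch class below writes some text and closes the stream even if the write fails. So that it can run outside a servlet container, its main( ) method uses the java.io.tmpdir system property as a stand-in for the javax.servlet.context.tempdir attribute.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical class for illustration; not part of the Servlet API.
final class TempFileSketch {

  private TempFileSketch() {}

  // Creates a temp file in dir, writes text to it, and returns the file.
  // In a servlet, dir would be the javax.servlet.context.tempdir attribute.
  static File writeTemp(File dir, String text) throws IOException {
    File f = File.createTempFile("xxx", ".tmp", dir);
    // try-with-resources closes the stream even if write() throws
    try (OutputStream out = new FileOutputStream(f)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return f;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in directory for running outside a servlet container
    File dir = new File(System.getProperty("java.io.tmpdir"));
    File f = writeTemp(dir, "scratch data");
    System.out.println(f.getName() + ": " + f.length() + " bytes");
    Files.delete(f.toPath());  // remove the file once it is no longer needed
  }
}
```

Deleting the file as soon as it is no longer needed keeps the temporary directory from filling up on a long-running server.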

Locking a Servlet to a Server

There are many ways to put this server information to productive use. Let’s assume you’ve written a servlet and you don’t want it running just anywhere. Perhaps you want to sell it and, to limit the chance of unauthorized copying, you want to lock the servlet to your customer’s machine with a software license. Or, alternatively, you’ve written a license generator as a servlet and want to make sure it works only behind your firewall. This can be done relatively easily because a servlet has instant access to the information about its server.

Example 4-5 shows a servlet that locks itself to a particular server IP address and port number. It requires an init parameter key that is appropriate for its server IP address and port before it unlocks itself and handles a request. If it does not receive the appropriate key, it refuses to continue. The algorithm used to map the key to the IP address and port (and vice versa) must be secure.

Example 4-5. A Servlet Locked to a Server

import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;

public class KeyedServerLock extends GenericServlet {

  // This servlet has no class or instance variables
  // associated with the locking, so as to simplify
  // synchronization issues.

  public void service(ServletRequest req, ServletResponse res)
                             throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    // The piracy check shouldn't be done in init
    // because name/port are part of request.
    String key = getInitParameter("key");
    String host = req.getServerName();
    int port = req.getServerPort();

    // Check if the init parameter "key" unlocks this server.
    if (! keyFitsServer(key, host, port)) {
      // Explain, condemn, threaten, etc.
      out.println("Pirated!");
    }
    else {
      // Give 'em the goods
      out.println("Valid");
      // etc...
    }
  }

  // This method contains the algorithm used to match a key with
  // a server host and port. This example implementation is extremely
  // weak and should not be used by commercial sites.
  //
  private boolean keyFitsServer(String key, String host, int port) {

    if (key == null) return false;

    long numericKey = 0;
    try {
      numericKey = Long.parseLong(key);
    }
    catch (NumberFormatException e) {
      return false;
    }

    // The key must be a 64-bit number equal to the bitwise NOT (~)
    // of the 32-bit IP address concatenated with the 32-bit port number.

    byte hostIP[];
    try {
      hostIP = InetAddress.getByName(host).getAddress();
    }
    catch (UnknownHostException e) {
      return false;
    }

    // Get the 32-bit IP address
    long servercode = 0;
    for (int i = 0; i < 4; i++) {
      servercode <<= 8;
      servercode |= hostIP[i];
    }

    // Concatenate the 32-bit port number
    servercode <<= 32;
    servercode |= port;

    // Bitwise NOT
    long accesscode = ~numericKey;

    // The moment of truth: Does the key match?
    return (servercode == accesscode);
  }
}

This servlet refuses to perform unless given the correct key. To make it truly secure, however, the simple keyFitsServer( ) logic should be replaced with a strong algorithm and the whole servlet should be run through an obfuscator to prevent decompiling. Example 4-13 later in this chapter provides the code used to generate keys. If you try this servlet yourself, it’s best if you access the server with its actual name, rather than localhost, so the servlet can determine the web server’s true name and IP address.

Getting a Context Init Parameter

Servlet init parameters, as discussed earlier, are passed to individual servlets. When multiple servlets should receive the same init parameter values, those values may be assigned as context init parameters. The ServletContext interface has two methods, getInitParameter( ) and getInitParameterNames( ), for retrieving contextwide initialization information:

public String ServletContext.getInitParameter(String name)
public Enumeration ServletContext.getInitParameterNames()

These methods are modeled after their counterparts in ServletConfig. getInitParameter(String name) returns the string value of the specified parameter, or null if the parameter does not exist. getInitParameterNames( ) returns an Enumeration containing the names of all the init parameters available to the web application, or an empty Enumeration if there are none.

The init parameters for a context are specified in the web.xml deployment descriptor for the context using the <context-param> tag as shown in Example 4-6.

Example 4-6. Setting Context Init Parameters in the Deployment Descriptor

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">

<web-app>
    <context-param>
        <param-name>
            rmihost
        </param-name>
        <param-value>
            localhost
        </param-value>
    </context-param>
    <context-param>
        <param-name>
            rmiport
        </param-name>
        <param-value>
            1099
        </param-value>
    </context-param>
</web-app>

Any servlet within this web application can read the context init parameters to locate the shared RMI registry, as shown in Example 4-7.

Example 4-7. Finding the Registry Using Context Init Parameters

import java.io.*;
import java.rmi.registry.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class RmiDemo extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                    throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    try {
      ServletContext context = getServletContext();
      String rmihost = context.getInitParameter("rmihost");
      int rmiport = Integer.parseInt(context.getInitParameter("rmiport"));

      Registry registry = LocateRegistry.getRegistry(rmihost, rmiport);
      // ...
    }
    catch (Exception e) {
      // ...
    }
  }
}

There’s no standard mechanism to create global init parameters visible across all contexts.

Determining the Servlet Version

A servlet can also ask the server what Servlet API version the server supports. Besides being useful for debugging, this information lets a servlet decide whether to use a new approach to solve a task or an older, perhaps less efficient, approach. Servlet API 2.1 introduced two methods to return the version information:

public int ServletContext.getMajorVersion()
public int ServletContext.getMinorVersion()

For API 2.1, getMajorVersion( ) returns 2 and getMinorVersion( ) returns 1. Of course, these methods work only for servlets executing in servers supporting Servlet API 2.1 and later.

To determine the current Servlet API version across all servers, you can use com.oreilly.servlet.VersionDetector. This class doesn’t ask the server for the version; instead, it looks at the classes and variables available in the runtime and, based on knowledge of the Servlet API history, calculates the current version, from 1.0 to 2.3. Because the class doesn’t call getMajorVersion( ) or getMinorVersion( ), it not only works across all versions of the API, but also compiles across all versions.

The VersionDetector class can also determine the current JDK version, from 1.0 to 1.3, using the same technique. This turns out to be more reliable across JVM vendor implementations than querying the System class. Example 4-8 shows the VersionDetector class. Updates to the class to support future Servlet API and JDK versions will be posted to http://www.servlets.com.

Example 4-8. The VersionDetector Class

package com.oreilly.servlet;

public class VersionDetector {

  static String servletVersion;
  static String javaVersion;

  public static String getServletVersion() {
    if (servletVersion != null) {
      return servletVersion;
    }

    // javax.servlet.http.HttpSession was introduced in Servlet API 2.0
    // javax.servlet.RequestDispatcher was introduced in Servlet API 2.1
    // javax.servlet.http.HttpServletResponse.SC_EXPECTATION_FAILED was
    //   introduced in Servlet API 2.2
    // javax.servlet.Filter is slated to be introduced in Servlet API 2.3
    String ver = null;
    try {
      ver = "1.0";
      Class.forName("javax.servlet.http.HttpSession");
      ver = "2.0";
      Class.forName("javax.servlet.RequestDispatcher");
      ver = "2.1"; 
      Class.forName("javax.servlet.http.HttpServletResponse")
                   .getDeclaredField("SC_EXPECTATION_FAILED");
      ver = "2.2";
      Class.forName("javax.servlet.Filter");
      ver = "2.3"; 
    }
    catch (Throwable t) {
    }
    
    servletVersion = ver;
    return servletVersion;
  }

  public static String getJavaVersion() {
    if (javaVersion != null) {
      return javaVersion;
    }

    // java.lang.Void was introduced in JDK 1.1
    // java.lang.ThreadLocal was introduced in JDK 1.2
    // java.lang.StrictMath was introduced in JDK 1.3
    String ver = null;
    try {
      ver = "1.0";
      Class.forName("java.lang.Void");
      ver = "1.1";
      Class.forName("java.lang.ThreadLocal");
      ver = "1.2"; 
      Class.forName("java.lang.StrictMath");
      ver = "1.3"; 
    }
    catch (Throwable t) {
    }

    javaVersion = ver;
    return javaVersion;
  }
}

The class works by attempting to load classes and access variables until a ClassNotFoundException or NoSuchFieldException halts the search. At that point, the current version is known. Example 4-9 demonstrates a servlet that snoops the servlet and Java version.

Example 4-9. Snooping Versions

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.VersionDetector;

public class VersionSnoop extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                        throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    out.println("Servlet Version: " + VersionDetector.getServletVersion());
    out.println("Java Version: " + VersionDetector.getJavaVersion());
  }
}
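
The probing idea behind VersionDetector is not specific to servlets. As a minimal sketch, assuming Java 7 or later for the multi-catch syntax, the hypothetical FeatureProbe class below (it is not part of com.oreilly.servlet) walks an ordered list of label and class-name pairs and reports the last label whose class could be located. It passes false to Class.forName( ) so that a probed class is located but not initialized.

```java
// Hypothetical class for illustration; not part of com.oreilly.servlet.
final class FeatureProbe {

  private FeatureProbe() {}

  // Returns true if the named class can be found, without initializing it.
  static boolean hasClass(String className) {
    try {
      Class.forName(className, false, FeatureProbe.class.getClassLoader());
      return true;
    }
    catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Returns the label of the last probe that succeeded, or baseline if
  // the first probe fails. probes holds {label, className} pairs in
  // ascending version order; the search stops at the first missing class.
  static String highestAvailable(String baseline, String[][] probes) {
    String found = baseline;
    for (String[] probe : probes) {
      if (!hasClass(probe[1])) {
        break;
      }
      found = probe[0];
    }
    return found;
  }
}
```

Given the pairs {"1.1", "java.lang.Void"}, {"1.2", "java.lang.ThreadLocal"}, and {"1.3", "java.lang.StrictMath"} with a baseline of "1.0", highestAvailable( ) performs the same walk as getJavaVersion( ) in Example 4-8. Keeping the probes in a table means that supporting a new version is a matter of adding one more pair.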

The Client

For each request, a servlet has the ability to find out about the client machine and, for pages requiring authentication, about the actual user. This information can be used for logging access data, associating information with individual users, or restricting access to certain clients.

Getting Information About the Client Machine

A servlet can use getRemoteAddr( ) and getRemoteHost( ) to retrieve the IP address and hostname of the client machine, respectively:

public String ServletRequest.getRemoteAddr() 
public String ServletRequest.getRemoteHost()

Both values are returned as String objects. The information comes from the socket that connects the server to the client, so the remote address and hostname may be those of a proxy server. An example remote address might be 192.26.80.118, while an example remote host might be dist.engr.sgi.com.

The IP address or remote hostname can be converted to a java.net.InetAddress object using InetAddress.getByName( ):

InetAddress remoteInetAddress = InetAddress.getByName(req.getRemoteAddr());
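
Once the address is an InetAddress, a servlet can ask questions about it that would be awkward to answer with string comparisons. As a minimal sketch, assuming JDK 1.4 or later (where isLoopbackAddress( ) and isSiteLocalAddress( ) were added), the hypothetical ClientAddressSketch class below reports whether a client address is a loopback or private-range address. The class and method names are ours, not part of any API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical class for illustration; not part of the Servlet API.
final class ClientAddressSketch {

  private ClientAddressSketch() {}

  // Returns true if remoteAddr is a loopback or private-range address.
  // remoteAddr is expected to be a literal IP such as the value of
  // getRemoteAddr(), which getByName() parses without a name lookup.
  static boolean isInternal(String remoteAddr) {
    try {
      InetAddress addr = InetAddress.getByName(remoteAddr);
      return addr.isLoopbackAddress() || addr.isSiteLocalAddress();
    }
    catch (UnknownHostException e) {
      return false;  // unparseable input is treated as external
    }
  }
}
```

A servlet could call ClientAddressSketch.isInternal(req.getRemoteAddr()) before showing an administrative page. Keep in mind that the address may belong to a proxy rather than the real client, so a check like this is a convenience and not a substitute for authentication.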

Restricting Access

Due to the United States government’s policy restricting the export of strong encryption, some web sites must be careful about who they let download certain software. Servlets, with their ability to find out about the client machine, are well suited to enforce this restriction. These servlets can check the client machine and provide links for download only if the client appears to be coming from a permitted country.

In the first edition of this book, permitted countries were only the United States and Canada, and this servlet was written to allow downloads only for users from these two countries. In the time since that edition, the United States government has loosened its policy on exporting strong encryption, and now most encryption software can be downloaded by anyone except those from the “Terrorist 7” countries of Cuba, Iran, Iraq, North Korea, Libya, Syria, and Sudan. Example 4-10 shows a servlet that permits downloads from anyone outside these seven countries.

Example 4-10. Can They Be Trusted?

import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class ExportRestriction extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();

    // ...Some introductory HTML...

    // Get the client's hostname
    String remoteHost = req.getRemoteHost();

    // See if the client is allowed
    if (! isHostAllowed(remoteHost)) {
      out.println("Access <BLINK>denied</BLINK>");
    }
    else {
      out.println("Access granted");
      // Display download links, etc...
    }
  }

  // Disallow hosts ending with .cu, .ir, .iq, .kp, .ly, .sy, and .sd.
  private boolean isHostAllowed(String host) {
    return (!host.endsWith(".cu") &&
            !host.endsWith(".ir") &&
            !host.endsWith(".iq") &&
            !host.endsWith(".kp") &&
            !host.endsWith(".ly") &&
            !host.endsWith(".sy") &&
            !host.endsWith(".sd"));
  }
}

This servlet gets the client hostname with a call to req.getRemoteHost( ) and, based on its suffix, decides if the client came from any of the denied countries. Of course, be sure to get high-priced legal counsel before making any cryptographic code available for download.
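
The chain of endsWith( ) calls in Example 4-10 grows by one line for every country added. As a sketch of one alternative, the hypothetical HostSuffixCheck class below keeps the same seven suffixes in a Set and compares the last label of the hostname against it. It also lowercases the hostname first and rejects a null host, two details that are our additions and not part of Example 4-10.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class for illustration; a table-driven variant of
// isHostAllowed() from Example 4-10.
final class HostSuffixCheck {

  private HostSuffixCheck() {}

  // Country-code suffixes taken from Example 4-10
  private static final Set<String> DENIED = new HashSet<String>(
      Arrays.asList("cu", "ir", "iq", "kp", "ly", "sy", "sd"));

  // Returns false when host is null or ends in a denied country code.
  static boolean isHostAllowed(String host) {
    if (host == null) {
      return false;
    }
    String h = host.toLowerCase(Locale.ROOT);
    int dot = h.lastIndexOf('.');
    if (dot == -1) {
      return true;  // no domain suffix to inspect
    }
    return !DENIED.contains(h.substring(dot + 1));
  }
}
```

Note that when the server cannot or chooses not to resolve the client’s name, getRemoteHost( ) returns the IP address in dotted form. Both this sketch and Example 4-10 let such a client through, because a numeric suffix never matches a country code. A stricter policy might deny any host whose name could not be resolved.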

Getting Information About the User

What do you do when you need to restrict access to some of your web pages but want to have a bit more control over the restriction than this country-by-country approach? Say, for example, you publish an online magazine and want only paid subscribers to read the articles. Well (prepare yourself), you don’t need servlets to do this.

Nearly every HTTP server has a built-in capability to restrict access to some or all of its pages to a given set of registered users. How you set up restricted access is covered in Chapter 8, but here’s how it works mechanically. The first time a browser attempts to access one of these pages, the HTTP server replies that it needs special user authentication. When the browser receives this response, it usually pops open a window asking the user for a name and password appropriate for the page, as shown in Figure 4-1.

Figure 4-1. Login window for restricted page

Once the user enters his information, the browser again attempts to access the page, this time attaching the user’s name and password along with the request. If the server accepts the name/password pair, it happily handles the request. If, on the other hand, the server doesn’t accept the name/password pair, the browser is again denied and the user swears under his breath about forgetting yet another password.

How does this involve servlets? When access to a servlet has been restricted by the server, the servlet can get the name of the user that was accepted by the server, using the getRemoteUser( ) method:

public String HttpServletRequest.getRemoteUser()

Note that this information is retrieved from the servlet’s HttpServletRequest object, the HTTP-specific subinterface of ServletRequest. This method returns the name of the user making the request as a String or null if access to the servlet was not restricted. There is no comparable method to get the remote user’s password (although it can be manually determined, as shown in Example 8-2 in Chapter 8). An example remote user might be jhunter.

A servlet can also use the getAuthType( ) method to find out what type of authentication was used:

public String HttpServletRequest.getAuthType()

This method returns the type of authentication used or null if access to the servlet was not restricted. The types may be BASIC, DIGEST, FORM, or CLIENT-CERT. See Chapter 8 for more information on the various types of authentication.
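
To make the BASIC case concrete: the browser joins the name and password with a colon, Base64-encodes the result, and sends it in an Authorization header after the word Basic. As a minimal sketch of that encoding, assuming Java 8 or later for java.util.Base64, the hypothetical BasicAuthSketch class below builds such a header value and extracts the user name from one. It is independent of the Servlet API and is not the code from Chapter 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical class for illustration; not part of the Servlet API.
final class BasicAuthSketch {

  private BasicAuthSketch() {}

  // Builds the Authorization header value a browser sends for BASIC.
  static String encode(String user, String password) {
    String pair = user + ":" + password;
    return "Basic " + Base64.getEncoder()
        .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
  }

  // Extracts the user name from a BASIC Authorization header value,
  // or returns null if the value is missing or malformed.
  static String userOf(String headerValue) {
    if (headerValue == null || !headerValue.startsWith("Basic ")) {
      return null;
    }
    try {
      byte[] raw = Base64.getDecoder().decode(headerValue.substring(6).trim());
      String pair = new String(raw, StandardCharsets.ISO_8859_1);
      int colon = pair.indexOf(':');
      return (colon == -1) ? null : pair.substring(0, colon);
    }
    catch (IllegalArgumentException e) {
      return null;  // not valid Base64
    }
  }
}
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. That is one reason a servlet should normally rely on getRemoteUser( ) and leave credential handling to the server.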

By the time the servlet calls getRemoteUser( ), the server has already determined that the user is authorized to invoke the servlet, but that doesn’t mean the remote user’s name is worthless. The servlet could perform a second authorization check, more restrictive and dynamic than the server’s. For example, it could return sensitive information about someone only if that person made the request, or it could enforce a rule that each user can make only 10 accesses per day.^[16]
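
The 10-accesses-per-day rule can be sketched as a small counter keyed by user name. The DailyAccessLimiter class below is a hypothetical illustration, assuming Java 8 or later for java.time.LocalDate. The date is passed in by the caller so that the logic can be exercised without waiting for midnight, and the method is synchronized because one servlet instance may serve many requests at the same time.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not part of the Servlet API.
final class DailyAccessLimiter {

  private final int maxPerDay;
  private final Map<String, Integer> counts = new HashMap<String, Integer>();
  private LocalDate currentDay;

  DailyAccessLimiter(int maxPerDay) {
    this.maxPerDay = maxPerDay;
  }

  // Records one access by user on the given day and returns true if the
  // user is still within the daily limit. Synchronized because a servlet
  // instance may handle many requests concurrently.
  synchronized boolean tryAccess(String user, LocalDate today) {
    if (!today.equals(currentDay)) {
      counts.clear();      // a new day starts everyone at zero
      currentDay = today;
    }
    Integer used = counts.get(user);
    int n = (used == null) ? 0 : used.intValue();
    if (n >= maxPerDay) {
      return false;
    }
    counts.put(user, Integer.valueOf(n + 1));
    return true;
  }
}
```

A servlet would hold one limiter as an instance variable and call limiter.tryAccess(req.getRemoteUser(), LocalDate.now()) at the top of doGet( ), refusing the request when the result is false. The counts live in memory only, so they are lost when the server restarts.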

Then again, the client’s name can simply tell the servlet who is accessing it. After all, the remote host is not necessarily unique to one user. Unix servers often host hundreds of users, and gateway proxies can act on behalf of thousands. But bear in mind that access to the client’s name comes with a price. Every user must be registered with your server and, before accessing your site, must enter her name and password. Generally speaking, authentication should not be used just so a servlet can know to whom it is talking. Chapter 7 describes some better, lower-maintenance techniques for knowing about users. However, if a servlet is already protected and has the name easily available, the servlet might as well use it.

With the remote user’s name, a servlet can save information about each client. Over the long term, it can remember each individual’s preferences. For the short term, it can remember the series of pages viewed by the client and use them to add a sense of state to a stateless HTTP protocol. The session-tracking tricks from Chapter 7 may be unnecessary if the servlet already knows the name of the client user.

A Personalized Welcome

A simple servlet that uses getRemoteUser( ) can greet its clients by name and remember when each last logged in, as shown in Example 4-11.

Example 4-11. Hey, I Remember You!

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class PersonalizedWelcome extends HttpServlet {

  Hashtable accesses = new Hashtable();

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();

    // ...Some introductory HTML...

    String remoteUser = req.getRemoteUser();

    if (remoteUser == null) {
      out.println("Welcome!");
    }
    else {
      out.println("Welcome, " + remoteUser + "!");
      Date lastAccess = (Date) accesses.get(remoteUser);
      if (lastAccess == null) {
        out.println("This is your first visit!");
      }
      else {
        out.println("Your last visit was " + lastAccess);
      }

      if (remoteUser.equals("PROFESSOR FALKEN")) {
        out.println("Shall we play a game?");
      }

      accesses.put(remoteUser, new Date());
    }

    // ...Continue handling the request...
  }
}

This servlet uses a Hashtable to save the last access time for each remote user. The first thing it does for each request is greet the person by name and tell him the time of his last visit. Then it records the time of this visit, for use next time. After that, it continues handling the request.
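Because a servlet can handle several requests at once, the separate lookup and store steps can interleave when the same user makes two simultaneous requests. One possible variation, sketched here with a hypothetical LastVisitTracker class, relies on the fact that Map.put( ) returns the previous value, so a single call both records this visit and retrieves the last one.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers the most recent visit of each user.
final class LastVisitTracker {
  private final Map<String, Instant> lastVisits = new ConcurrentHashMap<>();

  // Records this visit and returns the previous one,
  // or null if this is the user's first visit.
  Instant recordVisit(String user, Instant now) {
    return lastVisits.put(user, now);
  }
}
```

As with the Hashtable in the example, this memory lasts only as long as the servlet instance does.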

The Request

We’ve seen how the servlet finds out about the server and about the client. Now it’s time to move on to the really important stuff: how a servlet finds out what the client wants.

Request Parameters

Each access to a servlet can have any number of request parameters associated with it. These parameters are typically name/value pairs that tell the servlet any extra information it needs to handle the request. Please don’t confuse these request parameters with servlet init parameters, which are associated with the servlet itself.

An HTTP servlet gets its request parameters as part of its query string (for GET requests) or as encoded POST data (for POST requests), or sometimes both. Fortunately, every servlet retrieves its parameters the same way, using getParameter( ) and getParameterValues( ):

public String ServletRequest.getParameter(String name)
public String[] ServletRequest.getParameterValues(String name)

getParameter( ) returns the value of the named parameter as a String or null if the parameter was not specified.^[17] The value is guaranteed to be in its normal, decoded form. If there’s any chance a parameter could have more than one value, you should use the getParameterValues( ) method instead. This method returns all the values of the named parameter as an array of String objects or null if the parameter was not specified. A single value is returned in an array of length 1. If you call getParameter( ) on a parameter with multiple values, the value returned is the same as the first value returned by getParameterValues( ).

One word of warning: if the parameter information came in as encoded POST data, it will not be available if the POST data has already been read manually using the getReader( ) or getInputStream( ) method of ServletRequest (because POST data can be read only once).

The possible uses for request parameters are unlimited. They are a general-purpose way to tell a servlet what to do, how to do it, or both. For a simple example, let’s look at how a dictionary servlet might use getParameter( ) to find out the word it needs to look up.

An HTML file could contain this form asking the user for a word to look up:

<FORM METHOD=GET ACTION="/servlet/Dictionary">
Word to look up: <INPUT TYPE=TEXT NAME="word"><P>
<INPUT TYPE=SUBMIT><P>
</FORM>

The following code retrieves the word parameter:

String word = req.getParameter("word");
String definition = getDefinition(word);
out.println(word + ": " + definition);

This code handles only one value per parameter. Some parameters have multiple values, such as when using <SELECT>:

<FORM METHOD=POST ACTION="/servlet/CarOrder">
Please select the Honda S2000 features you would like:<BR>
<SELECT NAME="features" MULTIPLE>
<OPTION VALUE="aero"> Aero Screen </OPTION>
<OPTION VALUE="cd"> CD Changer </OPTION>
<OPTION VALUE="spoiler"> Trunk Spoiler </OPTION>
</SELECT><BR>
<INPUT TYPE=SUBMIT VALUE="Add to shopping cart">
</FORM>

A servlet can use the getParameterValues( ) method to handle this form:

String[] features = req.getParameterValues("features");
if (features != null) {
  for (int i = 0; i < features.length; i++) {
    cart.add(features[i]);
  }
}

In addition to getting parameter values, a servlet can access parameter names using getParameterNames( ):

public Enumeration ServletRequest.getParameterNames()

This method returns all the parameter names as an Enumeration of String objects or an empty Enumeration if the servlet has no parameters. The method is most often used for debugging. The order of names will not necessarily match the order in the form.

Finally, a servlet can retrieve the raw query string of the request with getQueryString( ):

public String HttpServletRequest.getQueryString()

This method returns the raw query string (encoded GET parameter information) of the request or null if there was no query string. This low-level information is rarely useful for handling form data. It’s best for handling a single unnamed value, as in /servlet/Sqrt?576, where the returned query string is 576.
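As a rough sketch of the single-unnamed-value case, the hypothetical SqrtQuery class below shows what a Sqrt servlet might do with the string returned by getQueryString( ). The class and method names are assumptions made for this illustration.

```java
// Hypothetical helper for a servlet invoked as /servlet/Sqrt?576
final class SqrtQuery {

  // Interprets the raw query string as one unnamed number and returns
  // its square root. Throws IllegalArgumentException if the query
  // string is missing or is not a non-negative number.
  static double sqrtOf(String queryString) {
    if (queryString == null) {
      throw new IllegalArgumentException("No query string given");
    }
    double value;
    try {
      value = Double.parseDouble(queryString);
    }
    catch (NumberFormatException e) {
      throw new IllegalArgumentException("Not a number: " + queryString);
    }
    if (!(value >= 0)) {  // also rejects NaN
      throw new IllegalArgumentException("Not a non-negative number: " + queryString);
    }
    return Math.sqrt(value);
  }
}
```

A servlet would pass req.getQueryString() straight to sqrtOf( ) and report any IllegalArgumentException to the client.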

Example 4-12 shows the use of these methods with a servlet that prints its query string, then prints the name and value for all its parameters.

Example 4-12. Snooping Parameters

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class ParameterSnoop extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    out.println("Query String:");
    out.println(req.getQueryString());
    out.println();

    out.println("Request Parameters:");
    Enumeration names = req.getParameterNames();  // "enum" is a reserved word as of Java 5
    while (names.hasMoreElements()) {
      String name = (String) names.nextElement();
      String values[] = req.getParameterValues(name);
      if (values != null) {
        for (int i = 0; i < values.length; i++) {
          out.println(name + " (" + i + "): " + values[i]);
        }
      }
    }
  }
}

This servlet’s output is shown in Figure 4-2.

Figure 4-2. The snooped parameters

Beginning with Servlet API 2.2, you can create a POST form with an action URL that contains a query string. When you do this, the aggregated parameter information will be made available via the getParameter( ) methods, with query string parameter values coming before POST values in the case of name collisions. For example, if a request has a query string of a=hokey and POST data of a=pokey, the req.getParameterValues("a") method would return the array { "hokey", "pokey" }.
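The ordering rule can be illustrated outside a server. The following sketch is not how any particular container implements it. ParameterMerge is a hypothetical class, and the sketch assumes both strings are URL-encoded using UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of query string and POST data aggregation.
final class ParameterMerge {

  // Query string values are added first, so they come before POST values.
  static Map<String, List<String>> merge(String queryString, String postData) {
    Map<String, List<String>> params = new LinkedHashMap<>();
    addPairs(queryString, params);
    addPairs(postData, params);
    return params;
  }

  // Adds the decoded name/value pairs of one encoded string, such as
  // "a=hokey&b=1", to the map in the order they appear.
  private static void addPairs(String encoded, Map<String, List<String>> params) {
    if (encoded == null || encoded.isEmpty()) {
      return;
    }
    for (String pair : encoded.split("&")) {
      if (pair.isEmpty()) {
        continue;
      }
      int eq = pair.indexOf('=');
      String name = decode(eq < 0 ? pair : pair.substring(0, eq));
      String value = (eq < 0) ? "" : decode(pair.substring(eq + 1));
      params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
  }

  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    }
    catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 should always be available", e);
    }
  }
}
```

With this sketch, merge("a=hokey", "a=pokey").get("a") holds hokey followed by pokey, matching the order described above.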

Generating a License Key

Now we’re ready to write a servlet that generates a KeyedServerLock license key for any given host and port number. A key from this servlet can be used to unlock the KeyedServerLock servlet. So, how will this servlet know the host and port number of the servlet it needs to unlock? Why, with request parameters, of course. Example 4-13 shows the code.

Example 4-13. Unlocking KeyedServerLock

import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class KeyedServerUnlock extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    // Get the host and port
    String host = req.getParameter("host");
    String port = req.getParameter("port");

    // If no host, use the current host
    if (host == null) {
      host = req.getServerName();
    }

    // Convert the port to an integer, if none use current port
    int numericPort;
    try {
      numericPort = Integer.parseInt(port);
    }
    catch (NumberFormatException e) {
      numericPort = req.getServerPort();
    }

    // Generate and print the key
    // Any KeyGenerationException is caught and displayed
    try {
      long key = generateKey(host, numericPort);
      out.println(host + ":" + numericPort + " has the key " + key);
    }
    catch (KeyGenerationException e) {
      out.println("Could not generate key: " + e.getMessage());
    }
  }

  // This method contains the algorithm used to match a key with
  // a server host and port. This example implementation is extremely
  // weak and should not be used by commercial sites.
  //
  // Throws a KeyGenerationException because anything more specific
  // would be tied to the chosen algorithm.
  //
  private long generateKey(String host, int port) throws KeyGenerationException {

    // The key must be a 64-bit number equal to the logical not (~)
    // of the 32-bit IP address concatenated by the 32-bit port number.

    byte hostIP[];
    try {
      hostIP = InetAddress.getByName(host).getAddress();

    }
    catch (UnknownHostException e) {
      throw new KeyGenerationException(e.getMessage());
    }

    // Get the 32-bit IP address
    long servercode = 0;
    for (int i = 0; i < 4; i++) {
      servercode <<= 8;
      servercode |= (hostIP[i] & 255);  // mask: Java bytes are signed
    }

    // Concatenate the 32-bit port number
    servercode <<= 32;
    servercode |= port;

    // The key is the logical not
    return ~servercode;
  }
}

class KeyGenerationException extends Exception {

  public KeyGenerationException() {
    super();
  }

  public KeyGenerationException(String msg) {
    super(msg);
  }
}

You can use the output from this servlet to assign KeyedServerLock a special “key” init parameter:

<servlet>
    <servlet-name>
        ksl
    </servlet-name>
    <servlet-class>
        KeyedServerLock
    </servlet-class>
    <init-param>
        <param-name>
            key
        </param-name>
        <param-value>
            -9151314447111823249
        </param-value>
    </init-param>
</servlet>

Remember to change the generateKey( ) logic before any real use.
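The key arithmetic can be checked without a DNS lookup. The sketch below uses a hypothetical KeyArithmetic class that takes the four address bytes directly. Because Java bytes are signed, each one is masked before being merged in. Otherwise an address byte of 128 or more would sign-extend and overwrite the bits already collected.

```java
// Hypothetical stand-alone version of the key arithmetic.
final class KeyArithmetic {

  // Packs four IPv4 address bytes and a port number into one long,
  // then inverts every bit.
  static long keyFor(byte[] ip, int port) {
    long servercode = 0;
    for (int i = 0; i < 4; i++) {
      servercode <<= 8;
      servercode |= (ip[i] & 0xFF);  // mask off the sign extension
    }
    servercode <<= 32;
    servercode |= (port & 0xFFFFFFFFL);
    return ~servercode;
  }
}
```

Under this arithmetic, the sample key -9151314447111823249 shown in the deployment descriptor works out to the address 127.0.0.1 with port 8080.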

Path Information

In addition to parameters, an HTTP request can include something called extra path information or a virtual path. Commonly, this extra path information is used to indicate a file on the server that the servlet should use for something. This path information is encoded in the URL of an HTTP request. An example URL looks like this:

http://server:port/servlet/ViewFile/index.html

This invokes the ViewFile servlet, passing /index.html as extra path information. A servlet can access this path information, and also translate the /index.html string into the real path of the index.html file. What is the real path of /index.html? It’s the full file-system path to the file—what the server would return if the client asked for /index.html directly. This probably turns out to be document_root/index.html, but, of course, the server could have special aliasing that changes this.

Besides being specified explicitly in a URL, this extra path information can also be encoded in the ACTION parameter of an HTML form:

<FORM METHOD=GET ACTION="/servlet/Dictionary/dict/definitions.txt">
Word to look up: <INPUT TYPE=TEXT NAME="word"><P>
<INPUT TYPE=SUBMIT><P>
</FORM>

This form invokes the Dictionary servlet to handle its submissions and passes the Dictionary the extra path information /dict/definitions.txt. The Dictionary servlet can then know to look up word definitions using the definitions.txt file, the same file the client would see if it requested /dict/definitions.txt directly. On Tomcat, that file is probably server_root/webapps/ROOT/dict/definitions.txt.

Why Extra Path Information?

Why does HTTP have special support for extra path information? Isn’t it enough to pass the servlet a path parameter? The answer is yes. Servlets don’t need the special support, but CGI programs do.

A CGI program cannot interact with its server during execution, so it has no way to receive a path parameter, let alone ask the server to map it to a real filesystem location. The server has to somehow translate the path before invoking the CGI program. This is why there needs to be support for special “extra path information.” Servers know to pretranslate this extra path and send the translation to the CGI program as an environment variable. It’s a fairly elegant workaround to a shortcoming in CGI.

Of course, just because servlets don’t need the special handling of extra path information, it doesn’t mean they shouldn’t use it. It provides a simple, convenient way to attach a path along with a request.

Getting path information

A servlet can use the getPathInfo( ) method to get extra path information:

public String HttpServletRequest.getPathInfo()

This method returns the extra path information associated with the request (URL decoded if necessary) or null if none was given. An example path is /dict/definitions.txt. The path information by itself, however, is only marginally useful. A servlet usually needs to know the actual filesystem location of the file given in the path information, which is where getPathTranslated( ) comes in:

public String HttpServletRequest.getPathTranslated()

This method returns the extra path information translated to a real filesystem path (URL decoded if necessary) or null if there is no extra path information. The method also returns null if the path could not be translated to a reasonable file path, such as when the web application is executing from a WAR archive, a remote filesystem not available locally, or a database. The returned path does not necessarily point to an existing file or directory. An example translated path is C:\tomcat\webapps\ROOT\dict\definitions.txt.

Example 4-14 shows a servlet that uses these two methods to print the extra path information it receives and the resulting translation to a real path.

Example 4-14. Showing Where the Path Leads

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class FileLocation extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    if (req.getPathInfo() != null) {
      out.println("The file \"" + req.getPathInfo() + "\"");
      out.println("Is stored at \"" + req.getPathTranslated() + "\"");
    }
    else {
      out.println("Path info is null, no file to look up");
    }
  }
}

Some example output of this servlet might be:

The file "/index.html"
Is stored at "/usr/local/tomcat/webapps/ROOT/index.html"

Ad hoc path translations

Sometimes a servlet needs to translate a path that wasn’t passed in as extra path information. You can use the getRealPath( ) method for this task:

public String ServletContext.getRealPath(String path)

This method returns the real path of any given virtual path or null if the translation cannot be performed. If the given path is /, the method returns the document root (the place where documents are stored) for the server. If the given path is getPathInfo( ), the method returns the same real path as would be returned by getPathTranslated( ). This method can be used by generic servlets as well as HTTP servlets. There is no CGI counterpart.

Getting the context path

As we learned in Chapter 2, web applications are mapped to URI prefixes on the server. A servlet can determine the URI prefix of the context in which it’s running using the getContextPath( ) method in HttpServletRequest:

public String HttpServletRequest.getContextPath()

This method returns a String representing the URI prefix of the context handling the request. The value starts with /, has no ending /, and, for the default context, is empty. For a request to /catalog/books/servlet/BuyNow, for example, getContextPath( ) would return /catalog/books.

You can use this method to help ensure your servlets will work regardless of what context path they’re mapped to. For example, when generating a link to the home page for a context, you should refrain from hardcoding the context path and use generic code instead:

out.println("<a href=\"" + req.getContextPath() + "/index.html\">Home</a>");
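If many links are generated this way, the concatenation could be collected in a small helper. ContextLinks below is a hypothetical class, shown only to make the two cases (named context and default context) concrete.

```java
// Hypothetical helper for building context-relative URLs.
final class ContextLinks {

  // contextPath is the value a servlet obtains from getContextPath( ):
  // "" for the default context, otherwise a prefix such as
  // "/catalog/books" with no trailing slash.
  static String url(String contextPath, String pathInContext) {
    if (!pathInContext.startsWith("/")) {
      throw new IllegalArgumentException("Path must begin with a slash");
    }
    return contextPath + pathInContext;
  }
}
```

For the context /catalog/books this yields /catalog/books/index.html, and for the default context it yields /index.html.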

Getting MIME types

Once a servlet has the path to a file, it often needs to discover the type of the file. Use getMimeType( ) to do this:

public String ServletContext.getMimeType(String file)

This method returns the MIME type of the given file, based on its extension, or null if it isn’t known. Common MIME types are text/html, text/plain, image/gif, and image/jpeg. The following code fragment finds the MIME type of the extra path information:

String type = getServletContext().getMimeType(req.getPathTranslated());

Servers generally have knowledge about a core set of file-extension-to-mime-type mappings. These can be enhanced or overridden by entries in the web.xml deployment descriptor, giving each context a configurable behavior, as in Example 4-15.

Example 4-15. Everyone Loves a Mime

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">

<web-app>
    <!-- ..... -->
    <mime-mapping>
        <extension>
            java
        </extension>
        <mime-type>
            text/plain
        </mime-type>
    </mime-mapping>
    <mime-mapping>
        <extension>
            cpp
        </extension>
        <mime-type>
            text/plain
        </mime-type>
    </mime-mapping>
</web-app>
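To make the lookup concrete, here is a minimal sketch of an extension-based table with overridable entries. MimeLookup is a hypothetical class, and its four built-in entries merely stand in for whatever core set a real server provides.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of an extension-to-MIME-type table.
final class MimeLookup {
  private final Map<String, String> types = new HashMap<>();

  // Start with a few built-in mappings, standing in for server defaults.
  MimeLookup() {
    types.put("html", "text/html");
    types.put("txt", "text/plain");
    types.put("gif", "image/gif");
    types.put("jpeg", "image/jpeg");
  }

  // Adds or overrides a mapping, much as a <mime-mapping> entry would.
  void addMapping(String extension, String mimeType) {
    types.put(extension.toLowerCase(Locale.ROOT), mimeType);
  }

  // Returns the MIME type for the file's extension, or null if unknown.
  String getMimeType(String file) {
    int dot = file.lastIndexOf('.');
    int sep = Math.max(file.lastIndexOf('/'), file.lastIndexOf('\\'));
    if (dot < 0 || dot < sep) {
      return null;  // no extension in the last path segment
    }
    return types.get(file.substring(dot + 1).toLowerCase(Locale.ROOT));
  }
}
```

Calling addMapping("java", "text/plain") has the same effect in this sketch that the first mime-mapping entry has in the deployment descriptor: a .java file that previously had no known type is now reported as text/plain.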

Serving Files

The Tomcat Server itself uses servlets to handle every request. Besides being a showcase for the ability of servlets, this gives the server a modular design that allows the wholesale replacement of certain aspects of its functionality. For example, all files are served by the org.apache.tomcat.core.DefaultServlet servlet, which is charged with handling the / path (meaning it’s the default handler for requests). But there’s nothing to say that the DefaultServlet cannot be replaced. In fact, it can be replaced by mapping the / URL pattern to another servlet. Furthermore, it’s not all that hard to write a primitive replacement for the DefaultServlet, using the methods we’ve just seen.

Example 4-16 shows a ViewFile servlet that uses the getPathTranslated( ) and getMimeType( ) methods to return the file given by the extra path information.

Example 4-16. Dynamically Returning Static Files

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.ServletUtils;

public class ViewFile extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    // Use a ServletOutputStream because we may pass binary information
    ServletOutputStream out = res.getOutputStream();

    // Get the file to view
    String file = req.getPathTranslated();

    // No file, nothing to view
    if (file == null) {
      out.println("No file to view");
      return;
    }

    // Get and set the type of the file
    String contentType = getServletContext().getMimeType(file);
    res.setContentType(contentType);

    // Return the file
    try {
      ServletUtils.returnFile(file, out);
    }
    catch (FileNotFoundException e) {
      out.println("File not found");
    }
    catch (IOException e) {
      out.println("Problem sending file: " + e.getMessage());
    }
  }
}

This servlet first uses getPathTranslated( ) to get the name of the file it needs to display. Then it uses getMimeType( ) to find the content type of this file and sets the response content type to match. Last, it returns the file using the returnFile( ) method found in the com.oreilly.servlet.ServletUtils utility class:

// Send the contents of the file to the output stream
public static void returnFile(String filename, OutputStream out)
                           throws FileNotFoundException, IOException {
  // A FileInputStream is for bytes
  FileInputStream fis = null;
  try {
    fis = new FileInputStream(filename);
    byte[] buf = new byte[4 * 1024];  // 4K buffer
    int bytesRead;
    while ((bytesRead = fis.read(buf)) != -1) {
      out.write(buf, 0, bytesRead);
    }
  }
  finally {
    if (fis != null) fis.close();
  }
}

The servlet’s error handling is basic—it returns a page that describes the error. This is acceptable for our simple example (and really more than many programs seem capable of), but we’ll learn a better way using status codes in the next chapter. This servlet can be used directly with a URL like this:

http://server:port/servlet/ViewFile/index.html

Or, if you assign this servlet to handle the default URL pattern:

    <servlet>
        <servlet-name>
            vf
        </servlet-name>
        <servlet-class>
            ViewFile
        </servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>
            vf
        </servlet-name>
        <url-pattern>
            /
        </url-pattern>
    </servlet-mapping>

Then ViewFile is automatically invoked even for a URL like this:

http://server:port/index.html

Just beware that this servlet is a “proof-of-concept” example and does not have the full functionality of DefaultServlet.

Reading from an Abstract Resource

The getPathTranslated( ) method has some unfortunate limitations. First, it doesn’t work for content served from WAR files because there’s no direct file to access. Second, it doesn’t work in a distributed load-balanced environment where there might exist a direct file but not on the server currently executing the servlet. To get around these limitations, Servlet API 2.1 introduced a technique for resource abstraction, which allows servlets to access a resource without knowing where the resource resides. A servlet gains access to an abstract resource using getResource( ):

public URL ServletContext.getResource(String uripath)

This method returns a URL that can be used to investigate the specified resource and read its content. How the URI path parameter maps to an actual resource (file, WAR entry, database entry, or other) is determined by the web server. The two restrictions are that the path must be absolute (beginning with a slash) and that the URI should not be an active resource such as another servlet or CGI program. The getResource( ) method is intended to support only reading of static content (no dynamic content and no writing of content).

When using the context object to request resources, it’s important to remember not to include the context path in the request. After all, the context knows its own path, and by not specifying the path in code, you ensure that the application can be moved to a different path prefix without recompiling. The following code fetches and prints the /includes/header.html file for the current context:

URL url = getServletContext().getResource("/includes/header.html");
if (url != null) {
  ServletUtils.returnURL(url, out);
}

The header.html file may exist in an archive file on a server machine other than the one hosting the servlet, but conveniently that doesn’t matter. The code uses the returnURL( ) convenience method from the com.oreilly.servlet.ServletUtils class:

// Sends URL contents to an OutputStream
public static void returnURL(URL url, OutputStream out) throws IOException {
  InputStream in = url.openStream();
  byte[] buf = new byte[4 * 1024];  // 4K buffer
  int bytesRead;
  while ((bytesRead = in.read(buf)) != -1) {
    out.write(buf, 0, bytesRead);
  }
}

// Sends URL contents to a PrintWriter
public static void returnURL(URL url, PrintWriter out) throws IOException {
  // Determine the URL's content encoding
  URLConnection con = url.openConnection();
  con.connect();
  String encoding = con.getContentEncoding();

  // Construct a Reader appropriate for that encoding
  BufferedReader in = null;
  if (encoding == null) {
    in = new BufferedReader(
         new InputStreamReader(url.openStream())); 
  }
  else {
    in = new BufferedReader(
         new InputStreamReader(url.openStream(), encoding));
  }
  char[] buf = new char[4 * 1024];  // 4K char buffer
  int charsRead;
  while ((charsRead = in.read(buf)) != -1) {
    out.write(buf, 0, charsRead);
  }
}

As shown in the second returnURL( ) method in the preceding code, you can use the URL object to investigate the attributes of the abstract resource. Not all servers and Java runtimes support this functionality. Here’s code that examines the home page for the current context:

URL url = getServletContext().getResource("/index.html"); // context home page
URLConnection con = url.openConnection();
con.connect();
int contentLength = con.getContentLength();     // not all support
String contentType = con.getContentType();      // not all support
long expiration = con.getExpiration();          // not all support
long lastModified = con.getLastModified();      // not all support
// etc...

Remember, the content served for the / URI path is entirely determined by the server. To access resources from another context, you can use getContext( ):

public ServletContext ServletContext.getContext(String uripath)

This method returns a reference to the ServletContext for the given URI, subject to possible server-imposed security constraints. Here’s how to get the default context:

getServletContext().getContext("/");

And here’s how to get a reference to the web server’s home page:

getServletContext().getContext("/").getResource("/index.html");

Be aware that getResource( ) does not necessarily follow the welcome file list, so getResource("/") may not return usable content.

There’s a convenient method—getResourceAsStream( )—for reading resources as a stream:

public InputStream ServletContext.getResourceAsStream(String uripath)

It behaves essentially the same as getResource().openStream( ). For backward compatibility and to ease the transition to servlets for CGI programmers, the Servlet API will continue to include methods for file access, such as getPathTranslated( ). Just remember that anytime you access a resource using a File object you’re tying yourself to a particular machine.
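A stream obtained this way should be closed once its content has been sent. The following sketch shows one way to copy such a stream to the response. StreamCopy and copyAndClose( ) are hypothetical names, and the sketch assumes the caller remains responsible for the output stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for sending a resource stream to a client.
final class StreamCopy {

  // Copies everything from in to out and closes in, even if the copy
  // fails partway. Returns the number of bytes copied.
  static long copyAndClose(InputStream in, OutputStream out) throws IOException {
    try (InputStream source = in) {
      byte[] buf = new byte[4 * 1024];  // 4K buffer
      long total = 0;
      int bytesRead;
      while ((bytesRead = source.read(buf)) != -1) {
        out.write(buf, 0, bytesRead);
        total += bytesRead;
      }
      return total;
    }
  }
}
```

A servlet might pass it the result of getServletContext().getResourceAsStream("/includes/header.html") along with res.getOutputStream(), after first checking that the returned stream is not null.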

Serving Resources

Using abstract resources, we can write an improved version of ViewFile that works even when the content is served from a WAR file or when the content resides on a machine different than the machine executing the servlet. The UnsafeViewResource servlet is shown in Example 4-17. It’s labeled “unsafe” because it provides no protection of resources and it blindly serves files under WEB-INF and .jsp source.

Example 4-17. Serving an Abstract Resource, Unsafely

import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.ServletUtils;

public class UnsafeViewResource extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    // Use a ServletOutputStream because we may pass binary information
    ServletOutputStream out = res.getOutputStream();
    res.setContentType("text/plain");  // sanity default

    // Get the resource to view
    String file = req.getPathInfo();
    if (file == null) {
      out.println("Extra path info was null; should be a resource to view");
      return;
    }

    // Convert the resource to a URL
    // WARNING: This allows access to files under WEB-INF and .jsp source
    URL url = getServletContext().getResource(file);
    if (url == null) {  // some servers return null if not found
      out.println("Resource " + file + " not found");
      return;
    }

    // Connect to the resource
    URLConnection con = null;
    try {
      con = url.openConnection();
      con.connect();
    }
    catch (IOException e) {
      out.println("Resource " + file + " could not be read: " + e.getMessage());
      return;
    }

    // Get and set the type of the resource
    String contentType = con.getContentType();
    res.setContentType(contentType);

    // Return the resource
    // WARNING: This returns files under WEB-INF and .jsp source files
    try {
      ServletUtils.returnURL(url, out);
    }
    catch (IOException e) {
      res.sendError(res.SC_INTERNAL_SERVER_ERROR,
              "Problem sending resource: " + e.getMessage());
    }
  }
}

This servlet views files only within its own context. Any files outside its context are inaccessible from the getServletContext().getResource( ) method. This servlet also provides no protection of resources so files under WEB-INF and .jsp source may be served directly. Example 4-18 demonstrates a safer version of the class using the ServletUtils.getResource( ) method from com.oreilly.servlet.

Example 4-18. Serving an Abstract Resource, Safely

import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.ServletUtils;

public class ViewResource extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    // Use a ServletOutputStream because we may pass binary information
    ServletOutputStream out = res.getOutputStream();
    res.setContentType("text/plain");  // sanity default

    // Get the resource to view
    URL url = null;
    try {
      url = ServletUtils.getResource(getServletContext(), req.getPathInfo());
    }
    catch (IOException e) {
      res.sendError(
        res.SC_NOT_FOUND,
        "Extra path info must point to a valid resource to view: " +
        e.getMessage());
      return;
    }

    // Connect to the resource
    URLConnection con = url.openConnection();
    con.connect();

    // Get and set the type of the resource
    String contentType = con.getContentType();
    res.setContentType(contentType);

    // Return the resource
    try {
      ServletUtils.returnURL(url, out);
    }
    catch (IOException e) {
      res.sendError(res.SC_INTERNAL_SERVER_ERROR,
              "Problem sending resource: " + e.getMessage());
    }
  }
}

The ServletUtils.getResource( ) method wraps the context.getResource( ) method and adds four convenient security checks: resources are not served if they contain double dots, end with a slash or dot, end with .jsp, or begin with WEB-INF or META-INF. The code follows:

public static URL getResource(ServletContext context, String resource)
                                     throws IOException {
  // Short-circuit if resource is null
  if (resource == null) {
    throw new FileNotFoundException(
      "Requested resource was null (passed in null)");
  }

  if (resource.endsWith("/") ||
      resource.endsWith("\\") ||
      resource.endsWith(".")) {
    throw new MalformedURLException("Path may not end with a slash or dot");
  }

  if (resource.indexOf("..") != -1) {
    throw new MalformedURLException("Path may not contain double dots");
  }

  String upperResource = resource.toUpperCase();
  if (upperResource.startsWith("/WEB-INF") ||
      upperResource.startsWith("/META-INF")) {
    throw new MalformedURLException(
      "Path may not begin with /WEB-INF or /META-INF");
  }

  if (upperResource.endsWith(".JSP")) {
    throw new MalformedURLException(
      "Path may not end with .jsp");
  }

  // Convert the resource to a URL
  URL url = context.getResource(resource);
  if (url == null) {
    throw new FileNotFoundException(
      "Requested resource was null (" + resource + ")");
  }

  return url;
}
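The path checks themselves do not depend on the Servlet API, so they can be tried out in isolation. The following is a minimal sketch that mirrors the checks as a standalone predicate. The class name ResourcePathCheck and the method isServable( ) are hypothetical and are not part of com.oreilly.servlet.

```java
// Hypothetical helper: mirrors the path checks of getResource() above
// without needing a ServletContext. Not part of com.oreilly.servlet.
final class ResourcePathCheck {

  // Returns true if the path passes every check, false otherwise.
  static boolean isServable(String resource) {
    if (resource == null) {
      return false;
    }
    // Reject paths ending with a slash, backslash, or dot
    if (resource.endsWith("/") ||
        resource.endsWith("\\") ||
        resource.endsWith(".")) {
      return false;
    }
    // Reject any attempt to climb out of the context
    if (resource.indexOf("..") != -1) {
      return false;
    }
    // Reject private directories and JSP source, ignoring case
    String upper = resource.toUpperCase();
    if (upper.startsWith("/WEB-INF") || upper.startsWith("/META-INF")) {
      return false;
    }
    return !upper.endsWith(".JSP");
  }
}
```

A call such as ResourcePathCheck.isServable("/WEB-INF/web.xml") returns false, while ResourcePathCheck.isServable("/index.html") returns true.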

Serving Resources for Download

The ViewResource servlet has more than just academic use. If the ViewResource servlet hardcodes the Content-Type of the response to application/octet-stream, the servlet can act as a generic file downloader. Most browsers, when they receive content of type application/octet-stream, offer the user a pop-up window asking where to save the content. The exception is Microsoft Internet Explorer, which, if it recognizes the content type empirically, ignores the server-assigned content type and displays the content normally. This “feature” precludes downloading GIF, JPEG, HTML, and other file types to an Internet Explorer browser.

Determining What Was Requested

A servlet can use several methods to find out exactly what file or servlet the client requested. After all, only the most conceited servlet would always assume itself to be the direct target of a request. A servlet may be nothing more than the handler for some other content.

No method directly returns the original Uniform Resource Locator (URL) used by the client to make a request. The javax.servlet.http.HttpUtils class, however, provides a getRequestURL( ) method^[18] that does about the same thing:

public static StringBuffer HttpUtils.getRequestURL(HttpServletRequest req)

This method reconstructs the request URL based on information available in the HttpServletRequest object. It returns a StringBuffer that includes the scheme (such as HTTP), server name, server port, and extra path information. The reconstructed URL should look almost identical to the URL used by the client. Differences between the original and reconstructed URLs should be minor (for example, a space encoded by the client as %20 might be encoded by the server as a +). Because this method returns a StringBuffer, the request URL can be modified efficiently (for example, by appending query parameters). This method is often used for creating redirect messages and reporting errors.
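To make the reconstruction concrete, here is a minimal sketch of how such a URL might be assembled from the pieces a request exposes. The class RequestUrlSketch and its rebuild( ) method are hypothetical. The arguments stand in for the values returned by getScheme( ), getServerName( ), getServerPort( ), and getRequestURI( ), and leaving out the default ports (80 for http, 443 for https) is an assumption of this sketch rather than documented behavior.

```java
// Hypothetical sketch of rebuilding a request URL from its parts.
// Not the actual HttpUtils implementation.
final class RequestUrlSketch {

  static StringBuffer rebuild(String scheme, String serverName,
                              int port, String requestURI) {
    StringBuffer url = new StringBuffer();
    url.append(scheme).append("://").append(serverName);

    // Assumption: the port is shown only when it is not the scheme default
    boolean defaultPort = ("http".equals(scheme) && port == 80) ||
                          ("https".equals(scheme) && port == 443);
    if (!defaultPort) {
      url.append(':').append(port);
    }

    url.append(requestURI);
    return url;
  }
}
```

Because the result is a StringBuffer, a caller can extend it in place, for example with rebuild("http", "server", 8080, "/servlet/Classname").append("?var=val").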

Most of the time, however, a servlet doesn’t really need the request URL. It just needs the request URI, which is returned by getRequestURI( ) :

public String HttpServletRequest.getRequestURI()

This method returns the Uniform Resource Identifier (URI) of the request, before any URL decoding. For normal HTTP servlets, a request URI can be thought of as a URL minus the scheme, host, port, and query string, but including any extra path information. In other words, it’s the context path plus the servlet path plus the path info.^[19] Table 4-2 shows the request URIs for several request URLs.

Table 4-2. URLs and Their URIs

Request URL	Its URI Component
http://server:port/servlet/Classname	/servlet/Classname
http://server:port/servlet/registeredName	/servlet/registeredName
http://server:port/servlet/Classname?var=val	/servlet/Classname
http://server:port/servlet/Classname/pathinfo	/servlet/Classname/pathinfo
http://server:port/servlet/Classname/pathinfo?var=val	/servlet/Classname/pathinfo
http://server:port/servlet/Classname/path%20info ^[a]	/servlet/Classname/path%20info
http://server:port/alias.html (alias to a servlet)	/alias.html
http://server:port/context/path/servlet/Classname	/context/path/servlet/Classname
^[a] %20 is an encoded space

In some situations it is enough for a servlet to know the servlet name under which it was invoked. You can retrieve this information with getServletPath( ) :

public String HttpServletRequest.getServletPath()

This method returns the part of the URI that refers to the servlet being invoked (URL decoded if necessary) or null if the URI does not directly point to a servlet. The servlet path does not include extra path information or the context path. Table 4-3 shows the servlet paths for several request URLs.

Table 4-3. URLs and Their Servlet Paths

Request URL	Its Servlet Path
http://server:port/servlet/Classname	/servlet/Classname
http://server:port/servlet/registeredName	/servlet/registeredName
http://server:port/servlet/Classname?var=val	/servlet/Classname
http://server:port/servlet/Classname/pathinfo	/servlet/Classname
http://server:port/servlet/Classname/pathinfo?var=val	/servlet/Classname
http://server:port/servlet/Classname/path%20info	/servlet/Classname
http://server:port/alias.html (alias to a servlet)	/alias.html
http://server:port/context/path/servlet/Classname	/servlet/Classname

Here’s a helpful trick in remembering path information:

decoded(getRequestURI) ==
              decoded(getContextPath) + getServletPath + getPathInfo
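The identity can be turned around to recover the path info from the other three values. The following sketch does this with plain strings. The class PathInfoSketch is hypothetical, and using java.net.URI to decode the %xx escapes is a choice made for this illustration.

```java
import java.net.URI;

// Hypothetical sketch: derive the path info from the raw request URI,
// the raw context path, and the (already decoded) servlet path.
final class PathInfoSketch {

  // Decodes %xx escapes in a raw path (no query string expected)
  static String decode(String rawPath) {
    if (rawPath.length() == 0) {
      return "";
    }
    return URI.create(rawPath).getPath();
  }

  // Returns what remains of the decoded URI after the context path and
  // servlet path, or null if nothing remains
  static String pathInfo(String rawRequestURI, String rawContextPath,
                         String servletPath) {
    String uri = decode(rawRequestURI);
    String prefix = decode(rawContextPath) + servletPath;
    if (!uri.startsWith(prefix)) {
      throw new IllegalArgumentException(
        "URI does not begin with context path + servlet path: " + uri);
    }
    String rest = uri.substring(prefix.length());
    return rest.length() == 0 ? null : rest;
  }
}
```

For the raw URI /context/path/servlet/Classname/path%20info, with context path /context/path and servlet path /servlet/Classname, the sketch yields the decoded path info "/path info".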

How It Was Requested

Besides knowing what was requested, a servlet has several ways of finding out details about how it was requested. The getScheme( ) method returns the scheme used to make this request:

public String ServletRequest.getScheme()

Examples include http, https, and ftp, as well as the newer Java-specific schemes jdbc and rmi. There is no direct CGI counterpart (though some CGI implementations have a SERVER_URL variable that includes the scheme). For HTTP servlets, this method indicates whether the request was made over a secure connection using the Secure Sockets Layer (SSL), as indicated by the scheme https, or if it was an insecure request, as indicated by the scheme http .

The getProtocol( ) method returns the protocol and version number used to make the request:

public String ServletRequest.getProtocol()

The protocol and version number are separated by a slash. The method returns null if no protocol could be determined. For HTTP servlets, the protocol is usually HTTP/1.0 or HTTP/1.1. HTTP servlets can use the protocol version to determine if it’s okay with the client to use the new features in HTTP Version 1.1.
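As an illustration of the version check, the sketch below parses a protocol string of the form returned by getProtocol( ) and reports whether it names HTTP 1.1 or later. The class ProtocolSketch and the method supportsHttp11( ) are hypothetical names used only for this example.

```java
// Hypothetical sketch: decide whether a protocol string such as
// "HTTP/1.1" indicates HTTP Version 1.1 or later.
final class ProtocolSketch {

  static boolean supportsHttp11(String protocol) {
    if (protocol == null) {
      return false;  // protocol could not be determined
    }
    int slash = protocol.indexOf('/');
    if (slash == -1 ||
        !protocol.substring(0, slash).equalsIgnoreCase("HTTP")) {
      return false;  // not an HTTP protocol string
    }
    String[] parts = protocol.substring(slash + 1).split("\\.");
    if (parts.length != 2) {
      return false;  // expected major.minor
    }
    try {
      int major = Integer.parseInt(parts[0]);
      int minor = Integer.parseInt(parts[1]);
      return major > 1 || (major == 1 && minor >= 1);
    }
    catch (NumberFormatException e) {
      return false;  // version numbers were not numeric
    }
  }
}
```

A servlet might call ProtocolSketch.supportsHttp11(req.getProtocol()) before relying on a feature that HTTP 1.0 clients do not understand.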

To find out what method was used for a request, a servlet uses getMethod( ) :

public String HttpServletRequest.getMethod()

This method returns the HTTP method used to make the request. Examples include GET, POST, and HEAD. The service( ) method of the HttpServlet implementation uses this method in its dispatching of requests.
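The dispatching can be pictured as a simple lookup from the method string to a handler. The sketch below is only a simplified picture and not the actual HttpServlet source. The class MethodDispatchSketch is hypothetical, and it returns the name of the doXXX( ) method rather than invoking anything.

```java
// Hypothetical sketch of method-based dispatching. It maps the string
// returned by getMethod() to the name of the handler that would be called.
final class MethodDispatchSketch {

  // Returns the handler name, or null for an unrecognized method
  static String handlerFor(String method) {
    if (method == null) {
      return null;
    }
    switch (method) {
      case "GET":     return "doGet";
      case "POST":    return "doPost";
      case "HEAD":    return "doHead";
      case "PUT":     return "doPut";
      case "DELETE":  return "doDelete";
      case "OPTIONS": return "doOptions";
      case "TRACE":   return "doTrace";
      default:        return null;  // a real server would report an error
    }
  }
}
```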

Request Headers

HTTP requests and responses can have a number of associated HTTP headers. These headers provide some extra information about the request or the response. The HTTP Version 1.0 protocol defines literally dozens of possible headers; the HTTP Version 1.1 protocol includes even more. A description of all the headers extends beyond the scope of this book; we discuss only the headers most often accessed by servlets. For a full list of HTTP headers and their uses, we recommend HTTP Pocket Reference by Clinton Wong (O’Reilly) or Webmaster in a Nutshell by Stephen Spainhour and Robert Eckstein (O’Reilly).

A servlet rarely needs to read the HTTP headers accompanying a request. Many of the headers associated with a request are handled by the server itself. Take, for example, how a server restricts access to its documents. The server uses HTTP headers, and servlets need not know the details. When a server receives a request for a restricted page, it checks that the request includes an appropriate Authorization header that contains a valid username and a password. If it doesn’t, the server itself issues a response containing a WWW-Authenticate header, to tell the browser its access to a resource was denied. When the client sends a request that includes the proper Authorization header, the server grants the access and gives any servlet invoked access to the user’s name via the getRemoteUser( ) call.

Other headers are used by servlets, but indirectly. A good example is the Last-Modified and If-Modified-Since pair discussed in Chapter 3. The server itself sees the If-Modified-Since header and calls the servlet’s getLastModified( ) method to determine how to proceed.

There are a few HTTP headers that a servlet may want to read on occasion. These are listed in Table 4-4.

Table 4-4. Useful HTTP Request Headers

Header	Usage
`Accept`	Specifies the media (MIME) types the client prefers to accept, separated by commas. Some older browsers send a separate header for each media type. Each media type is divided into a type and subtype given as `type/subtype`. An asterisk (`*`) wildcard is allowed for the subtype (`type/*`) or for both the type and subtype (`*/*`). For example: Accept: image/gif, image/jpeg, text/*, */* A servlet can use this header to help determine what type of content to return. If this header is not passed as part of the request, the servlet can assume the client accepts all media types.
`Accept-Language`	Specifies the language or languages that the client prefers to receive, using the ISO-639 standard language abbreviations with an optional ISO-3166 country code. For example: Accept-Language: en, es, de, ja, zh-TW This indicates the client user reads English, Spanish, German, Japanese, and Chinese appropriate for Taiwan. By convention, languages are listed in order of preference. See Chapter 13 for more information on language negotiation.
`User-Agent`	Gives information about the client software. The format of the returned string is relatively free-form but often includes the browser name and version as well as information about the machine on which it is running. Netscape 4.7 on an SGI Indy running IRIX 6.2 reports: User-Agent: Mozilla/4.7 [en] (X11; U; IRIX 6.2 IP22) Microsoft Internet Explorer 4.0 running on a Windows 95 machine reports: User-Agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95) A servlet can use this header to keep statistics or to customize its response based on browser type.
`Referer`	Gives the URL of the document that refers to the requested URL (that is, the document that contains the link the client followed to access this document).^[a] For example: Referer: http://developer.java.sun.com/index.html A servlet can use this header to keep statistics or, if there’s some error in the request, to keep track of the documents with errors.
`Authorization`	Provides the client’s authorization to access the requested URI, including a username and password encoded in Base64. Servlets can use this for custom authorization, as discussed in Chapter 8.
^[a]A dictionary would spell the word `Referrer`. However, we have to conform to the HTTP specification spelling `Referer`.
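As a rough illustration of how a servlet might act on the Accept header, the sketch below checks whether a given media type is covered by a header value, honoring the wildcard forms. The class AcceptSketch is hypothetical. It ignores any parameters after a semicolon (such as quality values), which a fuller implementation would need to weigh.

```java
// Hypothetical sketch: test whether an Accept header value covers a
// media type. Parameters after ';' are ignored.
final class AcceptSketch {

  static boolean accepts(String acceptHeader, String mediaType) {
    if (acceptHeader == null) {
      return true;  // no header: assume the client accepts everything
    }
    int slash = mediaType.indexOf('/');
    String type = (slash == -1) ? mediaType : mediaType.substring(0, slash);

    String[] ranges = acceptHeader.split(",");
    for (String r : ranges) {
      String range = r.trim();
      int semi = range.indexOf(';');
      if (semi != -1) {
        range = range.substring(0, semi).trim();  // drop parameters
      }
      if (range.equals("*/*") ||
          range.equalsIgnoreCase(type + "/*") ||
          range.equalsIgnoreCase(mediaType)) {
        return true;
      }
    }
    return false;
  }
}
```

A servlet could pass req.getHeader("Accept") as the first argument, for example to choose between returning a PNG and a GIF image.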

Accessing header values

HTTP header values are accessed through the HttpServletRequest object. A header value can be retrieved as a String, a long (representing a Date), or an int, using getHeader( ) , getDateHeader( ), and getIntHeader( ), respectively:

public String HttpServletRequest.getHeader(String name)
public long HttpServletRequest.getDateHeader(String name)
public int HttpServletRequest.getIntHeader(String name)

getHeader( ) returns the value of the named header as a String or null if the header was not sent as part of the request. The name is case insensitive, as it is for all these methods. Headers of all types can be retrieved with this method.

getDateHeader( ) returns the value of the named header as a long (representing a Date as the number of milliseconds since the epoch) or -1 if the header was not sent as part of the request. This method throws an IllegalArgumentException when called on a header whose value cannot be converted to a Date. The method is useful for handling headers like Last-Modified and If-Modified-Since.
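The conversion can be sketched in plain Java to show the contract at work. The class HttpDateSketch below is hypothetical and is not how any particular container implements getDateHeader( ). It understands only the RFC 1123 date format and relies on the java.time classes, so it assumes Java 8 or later, whereas real servers typically accept older date formats as well.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical sketch of the getDateHeader() contract, limited to
// RFC 1123 dates such as "Thu, 01 Jan 1970 00:00:00 GMT".
final class HttpDateSketch {

  static long toMillis(String headerValue) {
    if (headerValue == null) {
      return -1;  // header was not sent
    }
    try {
      return ZonedDateTime
          .parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
          .toInstant()
          .toEpochMilli();
    }
    catch (DateTimeParseException e) {
      // Mirror the documented behavior for unparseable values
      throw new IllegalArgumentException(
        "Not an HTTP date: " + headerValue);
    }
  }
}
```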

getIntHeader( ) returns the value of the named header as an int or -1 if the header was not sent as part of the request. This method throws a NumberFormatException when called on a header whose value cannot be converted to an int.

A servlet can also get the names of all the headers it can access using getHeaderNames( ) :

public Enumeration HttpServletRequest.getHeaderNames()

This method returns the names of all the headers as an Enumeration of String objects. It returns an empty Enumeration if there were no headers. The Servlet API gives servlet container implementations the right to not allow headers to be accessed in this way, in which case this method returns null.

Some headers, like Accept and Accept-Language, support multiple values. Normally these values are passed in a single header separated by commas, but some browsers prefer to send multiple values via multiple headers:

Accept-Language: en
Accept-Language: fr
Accept-Language: ja

To read a header with multiple values, servlets can use the getHeaders( ) method:

public Enumeration HttpServletRequest.getHeaders(String name)

This method returns all the values for the given header as an Enumeration of String objects or an empty Enumeration if the header was not sent as part of the request. If the servlet container does not allow access to header information, the call returns null. There is no getDateHeaders( ) or getIntHeaders( ) method.
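Because a client may send either one comma-separated header line or several separate lines, a servlet that wants a single list of values has to handle both cases. The sketch below flattens an Enumeration of header lines into one list. The class HeaderValuesSketch is hypothetical, and it uses a typed Enumeration for brevity, so the raw Enumeration returned by getHeaders( ) would need a cast.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch: merge header lines such as "en, es" and "ja"
// into the single list [en, es, ja].
final class HeaderValuesSketch {

  static List<String> flatten(Enumeration<String> headerLines) {
    List<String> values = new ArrayList<String>();
    if (headerLines == null) {
      return values;  // header access not allowed by the container
    }
    while (headerLines.hasMoreElements()) {
      // Each line may itself hold several comma-separated values
      String[] parts = headerLines.nextElement().split(",");
      for (String part : parts) {
        String value = part.trim();
        if (value.length() > 0) {
          values.add(value);
        }
      }
    }
    return values;
  }
}
```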

Example 4-19 demonstrates the use of these methods in a servlet that prints information about its HTTP request headers.

Example 4-19. Snooping Headers

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HeaderSnoop extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    out.println("Request Headers:");
    out.println();
    Enumeration names = req.getHeaderNames();
    if (names == null) return;  // container does not allow header access
    while (names.hasMoreElements()) {
      String name = (String) names.nextElement();
      Enumeration values = req.getHeaders(name);  // support multiple values
      if (values != null) {
        while (values.hasMoreElements()) {
          String value = (String) values.nextElement();
          out.println(name + ": " + value);
        }
      }
    }
  }
}

Some example output from this servlet might look like this:

Request Headers:

Connection: Keep-Alive
If-Modified-Since: Thursday, 17-Feb-00 23:23:58 GMT; length=297
User-Agent: Mozilla/4.7 [en] (WinNT; U)
Host: localhost:8080
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Language: en
Accept-Language: es
Accept-Charset: iso-8859-1,*,utf-8
Cookie: JSESSIONID=q1886xlc31

Wading the Input Stream

Each request handled by a servlet has an input stream associated with it. Just as a servlet can write to a PrintWriter or OutputStream associated with its response object, it can read from a Reader or InputStream associated with its request object. The data read from the input stream can be of any content type and of any length. The input stream has two purposes:

To pass an HTTP servlet the content associated with a POST request
To pass a non-HTTP servlet the raw data sent by the client

To read character data from the input stream, you should use getReader( ) to retrieve the input stream as a BufferedReader object:

public BufferedReader ServletRequest.getReader() throws IOException

The advantage of using a BufferedReader for reading character-based data is that it should translate charsets as appropriate. This method throws an IllegalStateException if getInputStream( ) has been called before on this same request. It throws an UnsupportedEncodingException if the character encoding of the input is unsupported or unknown.

To read binary data from the input stream, use getInputStream( ) to retrieve the input stream as a ServletInputStream object:

public ServletInputStream ServletRequest.getInputStream() throws IOException

A ServletInputStream is a direct subclass of InputStream and can be treated as a normal InputStream, with the added ability to efficiently read input a line at a time into an array of bytes. The method throws an IllegalStateException if getReader( ) has been called before on this same request. Once you have the ServletInputStream, you can read a line from it using readLine( ):

public int ServletInputStream.readLine(byte b[], int off, int len)
  throws IOException

This method reads bytes from the input stream into the byte array b, starting at an offset in the array given by off. It stops reading when it encounters a newline (\n) or when it has read len bytes.^[20] The ending \n character is read into the buffer as well. The method returns the number of bytes read or -1 if the end of the stream is reached.
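The following sketch imitates the described readLine( ) behavior on an ordinary InputStream, so the semantics can be tried without a server. The class ReadLineSketch is hypothetical and merely stands in for ServletInputStream. Its readFullLine( ) helper shows one way to cope with a line longer than the buffer, by calling readLine( ) repeatedly until a chunk ends with \n.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for ServletInputStream.readLine(), written
// against a plain InputStream.
final class ReadLineSketch {

  // Reads up to len bytes into b starting at off, stopping after '\n'.
  // Returns the number of bytes read, or -1 at the end of the stream.
  static int readLine(InputStream in, byte[] b, int off, int len)
                                     throws IOException {
    if (len <= 0) {
      return 0;
    }
    int count = 0;
    int c;
    while ((c = in.read()) != -1) {
      b[off++] = (byte) c;
      count++;
      if (c == '\n' || count == len) {
        break;
      }
    }
    return count > 0 ? count : -1;
  }

  // Reads one complete line, however long, by joining buffer-sized
  // chunks. Returns null at the end of the stream.
  static String readFullLine(InputStream in) throws IOException {
    byte[] buf = new byte[64];
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    int n;
    while ((n = readLine(in, buf, 0, buf.length)) != -1) {
      line.write(buf, 0, n);
      if (buf[n - 1] == '\n') {
        break;  // reached the end of the line
      }
    }
    return line.size() == 0 ? null : line.toString("ISO-8859-1");
  }
}
```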

A servlet can additionally check the content type and the length of the data being sent via the input stream, using getContentType( ) and getContentLength( ), respectively:

public String ServletRequest.getContentType()
public int ServletRequest.getContentLength()

getContentType( ) returns the media type of the content being sent via the input stream or null if the type is not known (such as when there is no data). getContentLength( ) returns the length, in bytes, of the content being sent via the input stream or -1 if this is not known.

Handling POST requests using the input stream

It is a rare occurrence when a servlet handling a POST request is forced to use its input stream to access the POST data. Typically, the POST data is nothing more than encoded parameter information, which a servlet can conveniently retrieve with its getParameter( ) method.

A servlet can identify this type of POST request by checking the content type of the input stream. If it is of type application/x-www-form-urlencoded, the data can be retrieved with getParameter( ) and similar methods.

A servlet may wish to call the getContentLength( ) method before calling getParameter( ) to prevent denial-of-service attacks. A rogue client may send an absurdly large amount of data as part of a POST request, hoping to slow the server to a crawl as the servlet’s getParameter( ) method churns over the data. A servlet can use getContentLength( ) to verify that the length is reasonable, perhaps less than 4K, as a preventive measure.
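A minimal sketch of such a preventive check follows. The class PostGuardSketch and the 4K limit constant are illustrative assumptions, and the two arguments stand in for the values returned by getContentType( ) and getContentLength( ). Treating an unknown length (-1) as unacceptable is a design choice of this sketch.

```java
// Hypothetical sketch: decide whether a form POST is small enough to
// hand to getParameter().
final class PostGuardSketch {

  static final int MAX_FORM_BYTES = 4 * 1024;  // illustrative 4K limit

  static boolean isReasonableFormPost(String contentType,
                                      int contentLength) {
    if (contentType == null) {
      return false;
    }
    // The content type may carry parameters such as a charset
    String type = contentType.toLowerCase();
    if (!type.startsWith("application/x-www-form-urlencoded")) {
      return false;
    }
    // Assumption: an unknown length (-1) is rejected rather than trusted
    return contentLength >= 0 && contentLength <= MAX_FORM_BYTES;
  }
}
```

A servlet could call this at the top of doPost( ) and send an error response when it returns false, before touching any parameters.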

Receiving files using the input stream

A servlet can also receive a file upload using its input stream. Before we see how, it’s important to note that file uploading is experimental and not supported in all browsers. Netscape first supported file uploads with Netscape Navigator 3; Microsoft first supported it with Internet Explorer 4.

The full file upload specification is contained in experimental RFC 1867, available at http://www.ietf.org/rfc/rfc1867.txt, with additions in RFC 2388 at http://www.ietf.org/rfc/rfc2388.txt. The short summary is that any number of files and parameters can be sent as form data in a single POST request. The POST request is formatted differently than standard application/x-www-form-urlencoded form data and indicates this fact by setting its content type to multipart/form-data .
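In a multipart/form-data request, the content type also carries a boundary parameter naming the string that separates the parts of the body. As a small illustration of recognizing such a request, the sketch below pulls the boundary out of a content type value. The class BoundarySketch is hypothetical and is not taken from any existing library.

```java
// Hypothetical sketch: extract the boundary parameter from a
// multipart/form-data content type.
final class BoundarySketch {

  // Returns the boundary, or null if the content type is not
  // multipart/form-data or names no boundary
  static String extractBoundary(String contentType) {
    if (contentType == null) {
      return null;
    }
    String lower = contentType.toLowerCase();
    if (!lower.startsWith("multipart/form-data")) {
      return null;
    }
    int index = lower.indexOf("boundary=");
    if (index == -1) {
      return null;
    }
    String boundary = contentType.substring(index + "boundary=".length());
    int semi = boundary.indexOf(';');
    if (semi != -1) {
      boundary = boundary.substring(0, semi);  // drop later parameters
    }
    boundary = boundary.trim();
    // The value may be wrapped in double quotes
    if (boundary.length() >= 2 &&
        boundary.startsWith("\"") && boundary.endsWith("\"")) {
      boundary = boundary.substring(1, boundary.length() - 1);
    }
    return boundary.length() == 0 ? null : boundary;
  }
}
```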

It’s fairly simple to write the client half of a file upload. The HTML in Example 4-20 generates a form that asks for a user’s name and a file to upload. Note the addition of the ENCTYPE attribute and the use of a FILE input type.

Example 4-20. A Form for Choosing a File to Upload

<FORM ACTION="/servlet/UploadTest" ENCTYPE="multipart/form-data" METHOD=POST>
What is your name? <INPUT TYPE=TEXT NAME=submitter> <BR>
Which file do you want to upload? <INPUT TYPE=FILE NAME=file> <BR>
<INPUT TYPE=SUBMIT>
</FORM>

A user receiving this form sees a page that looks something like Figure 4-3. A filename can be entered in the text area, or it can be selected by browsing. Multiple <INPUT TYPE=FILE> fields can be used, but current browsers support uploading only one file per field. After selection, the user submits the form as usual.

Figure 4-3. Choosing a file to upload

The server’s responsibilities during a file upload are slightly more complicated. From the receiving servlet’s perspective, the submission is nothing more than a raw data stream in its input stream—a data stream formatted according to the multipart/form-data content type given in RFC 1867. The Servlet API provides no methods to aid in the parsing of the data. To simplify your life (and ours since we don’t want to explain RFC 1867), Jason has written a utility class that does the work for you. It’s named MultipartRequest and is shown in Example 4-22 later in this section.

MultipartRequest wraps around a ServletRequest and presents a simple API to the servlet programmer. The class has two constructors:

public MultipartRequest(HttpServletRequest request, String saveDirectory,
                        int maxPostSize) throws IOException
public MultipartRequest(HttpServletRequest request,
                        String saveDirectory) throws IOException

Each of these constructors creates a new MultipartRequest object to handle the specified request, saving any uploaded files to saveDirectory. Both constructors actually parse the multipart/form-data content, so servlets using this class must refrain from reading the input stream themselves. Both throw an IOException if there’s any problem. The constructor that takes a maxPostSize parameter also throws an IOException if the uploaded content is larger than maxPostSize. The second constructor assumes a default maxPostSize of 1 MB.

Note that a server has two choices when receiving an upload whose content length is too large: first, try to send an error page, wait for the client to disconnect, and while waiting silently consume all uploaded content. This follows the HTTP/1.1 specification in RFC 2616, Section 8.2.2, which dictates a client should listen for an “error status” during its upload (see http://www.ietf.org/rfc/rfc2616.txt) and halt the upload should it receive an error. This ensures every client sees a proper error message, but for the many browsers which don’t listen for an error status, this approach wastes server bandwidth as the client continues the full upload. For this reason many servers implement a second option: try to send an error page, and forcefully disconnect if necessary. This leaves some clients without a polite error message, but it assuredly stops the upload.

The MultipartRequest class has seven public methods that let you get at information about the request. You’ll notice that many of these methods are modeled after ServletRequest methods. Use getParameterNames( ) to retrieve the names of all the request parameters:

public Enumeration MultipartRequest.getParameterNames()

This method returns the names of all the parameters as an Enumeration of String objects or an empty Enumeration if there are no parameters.

To get the value of a named parameter, use getParameter( ) or getParameterValues( ):

public String MultipartRequest.getParameter(String name)

This method returns the value of the named parameter as a String or null if the parameter was not given. The value is guaranteed to be in its normal, decoded form. If the parameter has multiple values, only the last one is returned.

public String[] MultipartRequest.getParameterValues(String name)

This method returns all the values of the named parameter as an array of String objects or null if the parameter was not given. A single value is returned in an array of length 1.

Use getFileNames( ) to get a list of all the uploaded files:

public Enumeration MultipartRequest.getFileNames()

This method returns the names of all the uploaded files as an Enumeration of String objects, or an empty Enumeration if there are no uploaded files. Note that each filename is the name specified by the HTML form’s name attribute, not by the user. Once you have the name of a file, you can get its filesystem name using getFilesystemName( ) :

public String MultipartRequest.getFilesystemName(String name)

This method returns the filesystem name of the specified file or null if the file was not included in the upload. A filesystem name is the name specified by the user. It is also the name under which the file is actually saved. You can get the content type of the file with getContentType( ) :

public String MultipartRequest.getContentType(String name)

This method returns the content type of the specified file (as supplied by the client browser) or null if the file was not included in the upload. Finally, you can get a java.io.File object for the file with getFile( ) :

public File MultipartRequest.getFile(String name)

This method returns a File object for the specified file saved on the server’s filesystem or null if the file was not included in the upload.

Example 4-21 shows how a servlet uses MultipartRequest. The servlet does nothing but display the statistics for what was uploaded. Notice that it does not delete the files it saves.

Example 4-21. Handling a File Upload

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.MultipartRequest;

public class UploadTest extends HttpServlet {

  public void doPost(HttpServletRequest req, HttpServletResponse res)
                                throws ServletException, IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();

    try {
      // Blindly take it on faith this is a multipart/form-data request

      // Construct a MultipartRequest to help read the information.
      // Pass in the request, a directory to save files to, and the
      // maximum POST size we should attempt to handle.
      // Here we (rudely) write to the current dir and impose 5 Meg limit.
      MultipartRequest multi =
        new MultipartRequest(req, ".", 5 * 1024 * 1024);

      out.println("<HTML>");
      out.println("<HEAD><TITLE>UploadTest</TITLE></HEAD>");
      out.println("<BODY>");
      out.println("<H1>UploadTest</H1>");

      // Print the parameters we received
      out.println("<H3>Params:</H3>");
      out.println("<PRE>");
      Enumeration params = multi.getParameterNames();
      while (params.hasMoreElements()) {
        String name = (String)params.nextElement();
        String value = multi.getParameter(name);
        out.println(name + " = " + value);
      }
      out.println("</PRE>");

      // Show which files we received
      out.println("<H3>Files:</H3>");
      out.println("<PRE>");
      Enumeration files = multi.getFileNames();
      while (files.hasMoreElements()) {
        String name = (String)files.nextElement();
        String filename = multi.getFilesystemName(name);
        String type = multi.getContentType(name);
        File f = multi.getFile(name);
        out.println("name: " + name);
        out.println("filename: " + filename);
        out.println("type: " + type);
        if (f != null) {
          out.println("length: " + f.length());
        }
        out.println();
      }
      out.println("</PRE>");
    }
    catch (Exception e) {
      out.println("<PRE>");
      e.printStackTrace(out);
      out.println("</PRE>");
    }
    out.println("</BODY></HTML>");
  }
}

The servlet passes its request object to the MultipartRequest constructor, along with a directory relative to the server root where the uploaded files are to be saved (because large files may not fit in memory) and a maximum POST size of 5 MB. The servlet then uses MultipartRequest to iterate over the parameters that were sent. Notice that the MultipartRequest API for handling parameters matches that of ServletRequest. Finally, the servlet uses its MultipartRequest to iterate over the files that were sent. For each file, it gets the file’s name (as specified on the form), filesystem name (as specified by the user), and content type. It also gets a File reference and uses it to display the length of the saved file. If there are any problems, the servlet reports the exception to the user.

Example 4-22 shows the code for MultipartRequest. You’ll notice the MultipartRequest class actually uses com.oreilly.servlet.multipart.MultipartParser behind the scenes to handle the task of parsing the request. The MultipartParser class provides low-level access to the upload by walking the request piece by piece. This allows, for example, direct uploading of files into a database or checking that a file passes certain criteria before saving. The code and documentation for MultipartParser are available at http://www.servlets.com. (My thanks to Geoff Soutter for the factoring-out work necessary to create the parser.)

Be aware that many server vendors don’t adequately test their servers for file upload use, and it’s not uncommon for this class to uncover server bugs. If you have problems using the class, try another server (such as Tomcat) to determine whether the issue is server-specific and, if so, contact your server vendor.

Example 4-22. The MultipartRequest Class

package com.oreilly.servlet;

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.multipart.MultipartParser;
import com.oreilly.servlet.multipart.Part;
import com.oreilly.servlet.multipart.FilePart;
import com.oreilly.servlet.multipart.ParamPart;

// A utility class to handle multipart/form-data requests.
public class MultipartRequest {

  private static final int DEFAULT_MAX_POST_SIZE = 1024 * 1024;  // 1 Meg

  private Hashtable parameters = new Hashtable();  // name - Vector of values
  private Hashtable files = new Hashtable();       // name - UploadedFile

  public MultipartRequest(HttpServletRequest request,
                          String saveDirectory) throws IOException {
    this(request, saveDirectory, DEFAULT_MAX_POST_SIZE);
  }

  public MultipartRequest(HttpServletRequest request,
                          String saveDirectory,
                          int maxPostSize) throws IOException {
    // Sanity check values
    if (request == null)
      throw new IllegalArgumentException("request cannot be null");
    if (saveDirectory == null)
      throw new IllegalArgumentException("saveDirectory cannot be null");
    if (maxPostSize <= 0) {
      throw new IllegalArgumentException("maxPostSize must be positive");
    }

    // Save the dir
    File dir = new File(saveDirectory);

    // Check saveDirectory is truly a directory
    if (!dir.isDirectory())
      throw new IllegalArgumentException("Not a directory: " + saveDirectory);

    // Check saveDirectory is writable
    if (!dir.canWrite())
      throw new IllegalArgumentException("Not writable: " + saveDirectory);

    // Parse the incoming multipart, storing files in the dir provided, 
    // and populate the meta objects which describe what we found
    MultipartParser parser = new MultipartParser(request, maxPostSize);

    Part part;
    while ((part = parser.readNextPart()) != null) {
      String name = part.getName();
      if (part.isParam()) {
        // It's a parameter part, add it to the vector of values
        ParamPart paramPart = (ParamPart) part;
        String value = paramPart.getStringValue();
        Vector existingValues = (Vector)parameters.get(name);
        if (existingValues == null) {
          existingValues = new Vector();
          parameters.put(name, existingValues);
        }
        existingValues.addElement(value);
      }
      else if (part.isFile()) {
        // It's a file part
        FilePart filePart = (FilePart) part;
        String fileName = filePart.getFileName();
        if (fileName != null) {
          // The part actually contained a file
          filePart.writeTo(dir);
          files.put(name, new UploadedFile(
                      dir.toString(), fileName, filePart.getContentType()));
        }
        else { 
          // The field did not contain a file
          files.put(name, new UploadedFile(null, null, null));
        }
      }
    }
  }

  // Constructor with an old signature, kept for backward compatibility.
  public MultipartRequest(ServletRequest request,
                          String saveDirectory) throws IOException {
    this((HttpServletRequest)request, saveDirectory);
  }

  // Constructor with an old signature, kept for backward compatibility.
  public MultipartRequest(ServletRequest request,
                          String saveDirectory,
                          int maxPostSize) throws IOException {
    this((HttpServletRequest)request, saveDirectory, maxPostSize);
  }

  public Enumeration getParameterNames() {
    return parameters.keys();
  }

  public Enumeration getFileNames() {
    return files.keys();
  }

  public String getParameter(String name) {
    try {
      Vector values = (Vector)parameters.get(name);
      if (values == null || values.size() == 0) {
        return null;
      }
      String value = (String)values.elementAt(values.size() - 1);
      return value;
    }
    catch (Exception e) {
      return null;
    }
  }

  public String[] getParameterValues(String name) {
    try {
      Vector values = (Vector)parameters.get(name);
      if (values == null || values.size() == 0) {
        return null;
      }
      String[] valuesArray = new String[values.size()];
      values.copyInto(valuesArray);
      return valuesArray;
    }
    catch (Exception e) {
      return null;
    }
  }

  public String getFilesystemName(String name) {
    try {
      UploadedFile file = (UploadedFile)files.get(name);
      return file.getFilesystemName();  // may be null
    }
    catch (Exception e) {
      return null;
    }
  }

  public String getContentType(String name) {
    try {
      UploadedFile file = (UploadedFile)files.get(name);
      return file.getContentType();  // may be null
    }
    catch (Exception e) {
      return null;
    }
  }

  public File getFile(String name) {
    try {
      UploadedFile file = (UploadedFile)files.get(name);
      return file.getFile();  // may be null
    }
    catch (Exception e) {
      return null;
    }
  }
}


// A class to hold information about an uploaded file.
class UploadedFile {

  private String dir;
  private String filename;
  private String type;

  UploadedFile(String dir, String filename, String type) {
    this.dir = dir;
    this.filename = filename;
    this.type = type;
  }

  public String getContentType() {
    return type;
  }

  public String getFilesystemName() {
    return filename;
  }

  public File getFile() {
    if (dir == null || filename == null) {
      return null;
    }
    else {
      return new File(dir + File.separator + filename);
    }
  }
}

MultipartRequest is production quality and, unlike many other file upload libraries, supports arbitrarily large uploads. Can you figure out why the class doesn’t implement the HttpServletRequest interface? Doing so would limit its forward compatibility. If the class implemented HttpServletRequest and Servlet API 2.3 were to add a method to the interface, the class would no longer fully implement the interface, and servlets using the class would fail to compile.
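The way the listing stores parameters can be tried out without a servlet container. The following is a minimal sketch, not part of the library: ParamStoreSketch and its add( ) method are hypothetical names that mirror the constructor's handling of a parameter part and the two parameter accessors. It shows that each name maps to a Vector of values, and that getParameter( ) as written in the listing returns the most recently added value.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical stand-in for the parameter bookkeeping in MultipartRequest.
public class ParamStoreSketch {

  // name -> Vector of String values, in arrival order
  private final Hashtable<String, Vector<String>> parameters =
      new Hashtable<>();

  // Mirrors what the constructor does for each parameter part
  public void add(String name, String value) {
    Vector<String> existingValues = parameters.get(name);
    if (existingValues == null) {
      existingValues = new Vector<>();
      parameters.put(name, existingValues);
    }
    existingValues.addElement(value);
  }

  // Like the listing's getParameter(): the last value added, or null
  public String getParameter(String name) {
    Vector<String> values = parameters.get(name);
    if (values == null || values.isEmpty()) {
      return null;
    }
    return values.lastElement();
  }

  // Like the listing's getParameterValues(): every value, or null
  public String[] getParameterValues(String name) {
    Vector<String> values = parameters.get(name);
    if (values == null || values.isEmpty()) {
      return null;
    }
    String[] valuesArray = new String[values.size()];
    values.copyInto(valuesArray);
    return valuesArray;
  }

  public static void main(String[] args) {
    ParamStoreSketch store = new ParamStoreSketch();
    store.add("color", "red");
    store.add("color", "blue");
    System.out.println(store.getParameter("color"));              // blue
    System.out.println(store.getParameterValues("color").length); // 2
    System.out.println(store.getParameter("missing"));            // null
  }
}
```

A form that submits the same field name more than once therefore loses nothing: getParameterValues( ) still returns every value in the order received.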

Extra Attributes

Sometimes a servlet needs to know something about a request that’s not available via any of the previously mentioned methods. In these cases, there is one last alternative, the getAttribute( ) method. Remember how ServletContext has a getAttribute( ) method that returns server-specific attributes about the server itself? ServletRequest also has a getAttribute( ) method:

public Object ServletRequest.getAttribute(String name)

This method returns the value of a server-specific attribute for the request or null if the server does not support the named request attribute. This method allows a server to provide a servlet with custom information about a request. Servers are free to provide whatever attributes they choose, or even no attributes at all. The only rule is that attribute names should follow the same convention as package names, with the package names java.* and javax.* reserved for use by the Java Software division of Sun Microsystems and com.sun.* reserved for use by Sun Microsystems. See your server’s documentation for a description of its attributes, and remember that using any server-specific attributes restricts your application’s portability.

Servlets can also add their own attributes to the request using the setAttribute( ) method as discussed in Chapter 11. A listing of all current attributes, hard-coded by the server or placed there by servlets, can be obtained with getAttributeNames( ):

public Enumeration ServletRequest.getAttributeNames()

The following code displays all current attributes:

// Print the name and value of every attribute on the request
Enumeration attrNames = req.getAttributeNames();
while (attrNames.hasMoreElements()) {
  String name = (String) attrNames.nextElement();
  out.println("  req.getAttribute(\"" + name + "\"): " +
                 req.getAttribute(name));
}

Several standard attributes exist involving included requests (see Chapter 11) and client-side digital certificates (see Chapter 8).
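To experiment with the attribute methods outside a server, a small stand-in is enough. The sketch below is illustrative only: AttributeSketch is a hypothetical class that imitates getAttribute( ), setAttribute( ), and getAttributeNames( ) with a map, and the com.example.* attribute names are made up for the example. Real servers define their own attribute names and may behave differently in details not shown here.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the attribute methods of ServletRequest,
// so the example runs without a servlet container.
public class AttributeSketch {

  // Insertion-ordered so the report below is predictable
  private final Map<String, Object> attributes = new LinkedHashMap<>();

  // Returns the attribute value, or null if no such attribute exists
  public Object getAttribute(String name) {
    return attributes.get(name);
  }

  public void setAttribute(String name, Object value) {
    attributes.put(name, value);
  }

  public Enumeration<String> getAttributeNames() {
    return Collections.enumeration(attributes.keySet());
  }

  // Builds one report line per attribute, in the same shape as the
  // attribute-listing loop in the text
  public static String describe(AttributeSketch req) {
    StringBuilder out = new StringBuilder();
    Enumeration<String> names = req.getAttributeNames();
    while (names.hasMoreElements()) {
      String name = names.nextElement();
      out.append("  req.getAttribute(\"").append(name).append("\"): ")
         .append(req.getAttribute(name)).append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    AttributeSketch req = new AttributeSketch();
    // Made-up names that follow the package-name convention
    req.setAttribute("com.example.requestId", Integer.valueOf(42));
    req.setAttribute("com.example.trusted", Boolean.TRUE);
    System.out.print(describe(req));
  }
}
```

Because getAttribute( ) returns Object, a caller has to cast the result to the type it expects, and should check for null first when the attribute is server-specific and may not be present.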

^[16]Want to know how to say “access denied” for the 11th access attempt? It’s in the next chapter.

^[17]The getParameter( ) method was deprecated in the Java Web Server 1.1 in favor of getParameterValues( ). However, after quite a lot of public protest, Sun took getParameter( ) off the deprecation list in the final release of Servlet API 2.0. It was the first Java method to be undeprecated!

^[18]Why isn’t there a method that directly returns the original URL shown in the browser? Because the browser never sends the full URL. The port number, for example, is used by the client to make its HTTP connection, but it isn’t included in the request made to the web server answering on that port.

^[19]Technically, what is referred to here as a request URI could more formally be called a request URL path. This is because a URI is, in the most precise sense, a general-purpose identifier for a resource. A URL is one type of URI; a URN (Uniform Resource Name) is another. For more information on URIs, URLs, and URNs, see RFC 1630 at http://www.ietf.org/rfc/rfc1630.txt.

^[20]Servlet API 2.0 implementations of readLine( ) suffered from a bug where the passed-in len parameter was ignored, causing problems including an ArrayIndexOutOfBoundsException if the line length exceeded the buffer size. This bug is fixed in Servlet API 2.1 and later.


Java Servlet Programming, 2nd Edition by Jason Hunter, William Crawford