The URL Class

A URL is represented by an instance of the java.net.URL class. A URL object manages all the component information within a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL specification string or from its component parts:

try {
    URL aDoc =
      new URL( "http://foo.bar.com/documents/homepage.html" );
    // The file component needs its leading slash to name the same document
    URL sameDoc =
      new URL("http","foo.bar.com","/documents/homepage.html");
}
catch ( MalformedURLException e ) { }

These two URL objects point to the same network resource, the homepage.html document on the server foo.bar.com. Whether the resource actually exists and is available isn’t known until we try to access it. At this point, the URL object just contains data about the resource’s location and how to access it; no connection to the server has been made. We can examine the URL’s components with the getProtocol( ), getHost( ), and getFile( ) methods. We can also compare it to another URL with the sameFile( ) method (an unfortunate name for something that may not point to a file). sameFile( ) determines whether two URLs point to the same resource. It can be fooled, but sameFile( ) does more than compare the URL strings for equality; it takes into account the possibility that one server may have several names, and other factors.
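As a rough illustration of these accessors, the sketch below splits a URL into its parts and compares two URLs that differ only in their #fragment. The class name UrlParts and the 127.0.0.1 address are placeholders of ours, chosen so that no name lookup is needed; nothing here contacts a server.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    // Returns "protocol|host|file" for a URL string,
    // or "malformed" if the string can't be parsed.
    static String parts(String spec) {
        try {
            URL url = new URL(spec);
            return url.getProtocol() + "|" + url.getHost() + "|" + url.getFile();
        } catch (MalformedURLException e) {
            return "malformed";
        }
    }

    // True if both strings parse and sameFile() considers them
    // the same resource (the #fragment is not compared).
    static boolean sameResource(String a, String b) {
        try {
            return new URL(a).sameFile(new URL(b));
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parts("http://foo.bar.com/documents/homepage.html"));
        System.out.println(sameResource("http://127.0.0.1/index.html#top",
                                        "http://127.0.0.1/index.html#end"));
    }
}
```

Note that getFile( ) returns the path with its leading slash, which is why the component-wise constructor needs one too.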

When a URL is created, its specification is parsed to identify just the protocol component. If the protocol doesn’t make sense, or if Java can’t find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an http URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified server.
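The protocol check can be observed directly. In this minimal sketch, "bogus" is a made-up protocol name and ProtocolCheck is our own helper class; we assume only that the standard runtime ships an http handler and has nothing registered for "bogus".

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {
    // Returns true if a URL object can be constructed from spec,
    // which requires that a handler exists for its protocol.
    static boolean isConstructible(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            System.out.println("Rejected " + spec + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isConstructible("http://foo.bar.com/index.html"));
        System.out.println(isConstructible("bogus://foo.bar.com/index.html"));
    }
}
```

Neither call touches the network; the constructor fails or succeeds purely on the basis of the protocol name.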

Stream Data

The lowest-level way to get data back from a URL is to ask for an InputStream by calling openStream( ). Currently, if you’re writing an applet or working in an otherwise untrusted environment, this is about your only choice. Getting the data as a stream may be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the object yourself. Not all types of URLs support the openStream( ) method because not all types of URLs refer to concrete data; you’ll get an UnknownServiceException if the URL doesn’t.

The following code prints the contents of an HTML file:

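// Uses java.net.URL, java.io.BufferedReader, and java.io.InputStreamReader
try {
    URL url = new URL("http://server/index.html");

    BufferedReader bin = new BufferedReader (
        new InputStreamReader( url.openStream( ) ));

    String line;
    while ( (line = bin.readLine( )) != null )
        System.out.println( line );
    bin.close( );
} catch (Exception e) {
    System.err.println( "Couldn't read URL: " + e );
}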

We ask for an InputStream with openStream( ) and wrap it in a BufferedReader to read the lines of text. Because we specify the http protocol in the URL, we require the services of an HTTP protocol handler. As we’ll discuss later, that raises some questions about what kinds of handlers we have available. This example partially works around those issues because no content handler is involved; we read the data and interpret the content ourselves (by simply printing it).
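If no web server is handy, the same stream-reading pattern can be tried against a file: URL. The following is only a sketch under that assumption: the StreamDemo class and its temporary sample file are our own inventions, and the try-with-resources syntax requires a newer Java release than the snippet above.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

class StreamDemo {
    // Reads the resource line by line and returns the number of lines,
    // or -1 if an I/O error occurs.
    static int countLines(URL url) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            int count = 0;
            while (in.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            return -1;
        }
    }

    // Writes a three-line temporary file and returns a file: URL for it,
    // or null if the file can't be created.
    static URL sampleFileUrl() {
        try {
            File tmp = File.createTempFile("sample", ".html");
            tmp.deleteOnExit();
            try (PrintWriter out = new PrintWriter(tmp)) {
                out.println("<html>");
                out.println("<body>Hello</body>");
                out.println("</html>");
            }
            return tmp.toURI().toURL();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(sampleFileUrl()));
    }
}
```

The reading code never mentions the protocol; swapping the file: URL for an http: one changes only which protocol handler supplies the stream.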

Applets have additional restrictions. To be sure that you can access the specified URL and the correct protocol handler, construct URLs relative to the base URL that identifies the applet’s codebase—the location of the applet code. For example:

new URL( getCodeBase( ), "foo/bar.gif" );

This should guarantee that the needed protocol is available and accessible to the applet. However, if you are just trying to get data files or media associated with an applet, there is a more general way; see the discussion of getResource( ) in Chapter 10.
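To see how the two-argument constructor resolves a relative name, here is a small sketch that runs outside an applet. The base URL stands in for whatever getCodeBase( ) would return; it and the RelativeUrl class are assumptions made for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrl {
    // Resolves a relative spec against a base URL and returns the result
    // as a string, or null if either piece is malformed.
    static String resolve(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical codebase; a real applet would call getCodeBase()
        System.out.println(resolve("http://foo.bar.com/applets/", "foo/bar.gif"));
    }
}
```

The new URL inherits its protocol and host from the base, which is what keeps the applet within the bounds of what it is allowed to reach.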

Getting the Content as an Object

openStream( ) operates at a lower level than the more general content-handling mechanism implemented by the URL class. We showed it first because, until some things are settled, you’ll be limited as to when you can use URLs in their more powerful role. When a proper content handler is installed, you can retrieve the item the URL addresses as a Java object by calling the URL’s getContent( ) method. Currently, this works only if you supply a content handler with your application or install one in the local classpath. (The HotJava web browser also provides a mechanism for adding new handlers.) In this mode of operation, getContent( ) initiates a connection to the host, fetches the data for you, determines the MIME (Multipurpose Internet Mail Extensions) type of the contents, and invokes a content handler to turn the bytes into a Java object. MIME is a standard that was developed to facilitate multimedia email, but it has become widely used as a general way to specify how to treat data; Java uses MIME to help it pick the right content handler.

For example, given the URL http://foo.bar.com/index.html, a call to getContent( ) would use the HTTP protocol handler to retrieve data and an HTML content handler to turn the data into an appropriate document object. A URL that points to a plain-text file might use a text-content handler that returns a String object. Similarly, a GIF file might be turned into an ImageProducer object using a GIF content handler. If we accessed the GIF file using an “ftp” URL, Java would use the same content handler but would use the FTP protocol handler to receive the data.

getContent( ) returns the output of the content handler, but leaves us wondering what kind of object we got. Since the content handler has to be able to return anything, the return type of getContent( ) is Object. In a moment, we’ll describe how we could ask the protocol handler about the object’s MIME type, which it discovered. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the Object to its appropriate, more specific type. For example, if we expect a String, we’ll cast the result of getContent( ) to a String:

try  { 
    String content = (String)myURL.getContent( );  
} catch ( ClassCastException e ) { ... }

If we’re wrong about the type, we’ll get a ClassCastException. As an alternative, we could check the type of the returned object using the instanceof operator, like this:

Object content = myURL.getContent( );
if ( content instanceof String ) {
    String s = (String)content;
    ...
}

Various kinds of errors can occur when trying to retrieve the data. For example, getContent( ) can throw an IOException if there is a communications error. Other kinds of errors can occur at the application level: some knowledge of how the application-specific content and protocol handlers deal with errors is necessary.

One problem that could arise is that a content handler for the data’s MIME type wouldn’t be available. In this case, getContent( ) just invokes an “unknown type” handler and returns the data as a raw InputStream. A sophisticated application might specialize this behavior to try to decide what to do with the data on its own.
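One way an application might cope with not knowing what getContent( ) will return is to branch on the runtime type. This is a sketch only: ContentDispatch is a hypothetical helper, and the values passed in main( ) are stand-ins for whatever a real call to getContent( ) would hand back.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class ContentDispatch {
    // Decides what to do with whatever getContent() handed back.
    static String describe(Object content) {
        if (content instanceof String) {
            return "text of length " + ((String) content).length();
        } else if (content instanceof InputStream) {
            return "raw stream; we must parse it ourselves";
        } else if (content == null) {
            return "nothing returned";
        } else {
            return "object of type " + content.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for values that myURL.getContent() might return
        System.out.println(describe("hello"));
        System.out.println(describe(new ByteArrayInputStream(new byte[0])));
    }
}
```

Testing with instanceof before casting avoids the ClassCastException entirely, at the cost of listing every type we are prepared to handle.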

In some situations, we may also need knowledge of the protocol handler. For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server probably returns a valid HTML document that contains the familiar “404 Not Found” message. In a naive implementation, an HTML content handler might be invoked to interpret this message and return it as it would any other HTML document. To check the validity of protocol-specific operations like this, we may need to talk to the protocol handler.

The openStream( ) and getContent( ) methods both implicitly create the connection to the remote URL object. When the connection is set up, the protocol handler is consulted to create a URLConnection object. The URLConnection manages the protocol-specific communications. We can get a URLConnection for our URL with the openConnection( ) method. One of the things we can do with the URLConnection is ask for the object’s content type. For example:

URLConnection connection = myURL.openConnection( ); 
String mimeType = connection.getContentType( ); 
... 
Object contents = myURL.getContent( );
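The content-type query can be tried without a server by pointing a URLConnection at a local file. In this sketch, ContentTypeDemo and its temporary .html file are our own assumptions; for a file: URL we expect the type to be guessed from the file name rather than reported by a server.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;

class ContentTypeDemo {
    // Asks the URLConnection for the MIME type of the resource,
    // returning null if the connection can't be opened.
    static String contentTypeOf(URL url) {
        try {
            URLConnection connection = url.openConnection();
            return connection.getContentType();
        } catch (IOException e) {
            return null;
        }
    }

    // Creates a small temporary .html file and returns a file: URL for it,
    // or null if the file can't be created.
    static URL sampleHtmlUrl() {
        try {
            File tmp = File.createTempFile("page", ".html");
            tmp.deleteOnExit();
            try (PrintWriter out = new PrintWriter(tmp)) {
                out.println("<html><body>Hello</body></html>");
            }
            return tmp.toURI().toURL();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(contentTypeOf(sampleHtmlUrl()));
    }
}
```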

We can also get protocol-specific information. Different protocols provide different types of URLConnection objects. The HttpURLConnection object, for instance, can interpret the “404 Not Found” message and tell us about the problem. (We’ll examine URLConnections in detail in Appendix A.)
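As a hedged sketch of that idea, the code below casts the connection to HttpURLConnection and asks for the response code. To keep it self-contained, it starts a throwaway local server (using the JDK’s com.sun.net.httpserver package) that answers every request with 404; that server, the NotFoundCheck class, and the /missing.html path are all assumptions made for the demonstration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;

class NotFoundCheck {
    // Asks the HTTP protocol handler for the status code of a request.
    // Returns -1 if the connection fails or the URL isn't an HTTP URL.
    static int statusOf(URL url) {
        try {
            URLConnection connection = url.openConnection();
            if (!(connection instanceof HttpURLConnection)) {
                return -1;
            }
            HttpURLConnection http = (HttpURLConnection) connection;
            try {
                return http.getResponseCode();
            } finally {
                http.disconnect();
            }
        } catch (IOException e) {
            return -1;
        }
    }

    // Starts a local server that answers every request with 404,
    // checks one URL against it, and shuts the server down again.
    static int demo() {
        try {
            HttpServer server = HttpServer.create(
                new InetSocketAddress("127.0.0.1", 0), 0);  // port 0: any free port
            server.createContext("/", exchange -> {
                exchange.sendResponseHeaders(404, -1);  // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return statusOf(new URL("http://127.0.0.1:" + port + "/missing.html"));
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Checking the status code before calling getContent( ) lets an application distinguish a genuine document from an error page that merely happens to be valid HTML.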
