A URL is represented by an instance of the java.net.URL class. A URL object manages all the component information within a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL specification string or from its component parts:
    try {
        URL aDoc = new URL( "http://foo.bar.com/documents/homepage.html" );
        // the file component needs its leading slash to name the same document
        URL sameDoc = new URL( "http", "foo.bar.com", "/documents/homepage.html" );
    } catch ( MalformedURLException e ) { }
These two URL objects point to the same network resource, the homepage.html document on the server foo.bar.com. Whether the resource actually exists and is available isn’t known until we try to access it. At this point, the URL object just contains data about the object’s location and how to access it. No connection to the server has been made. We can examine the URL’s components with the getProtocol( ), getHost( ), and getFile( ) methods. We can also compare it to another URL with the sameFile( ) method (which has an unfortunate name for something that may not point to a file). sameFile( ) determines whether two URLs point to the same resource. It can be fooled, but sameFile( ) does more than compare the URL strings character by character; it takes into account the possibility that one server may have several names, and other factors.
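As a rough illustration of the accessors and of sameFile( ), the following sketch takes a URL apart and compares two URLs that differ only in their fragment. The class name UrlParts, its helper methods, and the numeric host 192.0.2.1 are assumptions made for this example; the numeric host is used only so that comparing hosts does not require a name lookup.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    // Parses a specification string; the checked exception is converted
    // so that callers in this sketch stay short.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }

    // Joins the three components discussed above into one line.
    static String describe(URL url) {
        return url.getProtocol() + " | " + url.getHost() + " | " + url.getFile();
    }

    public static void main(String[] args) {
        URL doc = parse("http://foo.bar.com/documents/homepage.html");
        System.out.println(describe(doc));
        // prints: http | foo.bar.com | /documents/homepage.html

        // Hypothetical numeric host, so no host name has to be resolved.
        URL page = parse("http://192.0.2.1/documents/homepage.html");
        URL anchor = parse("http://192.0.2.1/documents/homepage.html#top");
        System.out.println(page.sameFile(anchor)); // true: the fragment is ignored
        System.out.println(page.equals(anchor));   // false: equals( ) also compares the fragment
    }
}
```

Note that getFile( ) returns the path with its leading slash, which is why a file component passed to the three-argument constructor should start with one as well.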
When a URL is created, its specification is parsed to identify just the protocol component. If the protocol doesn’t make sense, or if Java can’t find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an http URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified server.
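The constructor check can be observed without touching the network, since the exception is thrown before any connection is attempted. In this minimal sketch, the class ProtocolCheck, the method isConstructible( ), and the protocol name "bogus" are all made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {
    // Returns true if a URL object can be built from the specification,
    // that is, it parses and a protocol handler is available for it.
    static boolean isConstructible(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A standard protocol with a built-in handler.
        System.out.println(isConstructible("http://foo.bar.com/index.html"));  // true
        // "bogus" is a hypothetical protocol with no handler installed.
        System.out.println(isConstructible("bogus://foo.bar.com/index.html")); // false
        // No protocol component at all.
        System.out.println(isConstructible("foo.bar.com/index.html"));         // false
    }
}
```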
The lowest-level way to get data back from a URL is to ask the URL for an InputStream by calling openStream( ). Currently, if you’re writing an applet or working in an otherwise untrusted environment, this is about your only choice. Getting the data as a stream may be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the object yourself. Not all types of URLs support the openStream( ) method because not all types of URLs refer to concrete data; you’ll get an UnknownServiceException if the URL doesn’t.
The following code prints the contents of an HTML file:
    try {
        URL url = new URL( "http://server/index.html" );
        BufferedReader bin = new BufferedReader(
            new InputStreamReader( url.openStream( ) ) );
        String line;
        while ( (line = bin.readLine( )) != null )
            System.out.println( line );
        bin.close( );
    } catch ( Exception e ) {
        // a malformed URL or an I/O failure ends up here
        System.err.println( e );
    }
We ask for an InputStream with openStream( ) and wrap it in a BufferedReader to read the lines of text. Because we specify the http protocol in the URL, we require the services of an HTTP protocol handler. As we’ll discuss later, that raises some questions about what kinds of handlers we have available. This example partially works around those issues because no content handler is involved; we read the data and interpret the content ourselves (by simply printing it).
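For experimenting without a web server, the same read loop can be pointed at a file: URL. The sketch below is one way to do that, not the only one; the class StreamDump, its methods, and the temporary file it creates are assumptions for illustration. It also uses try-with-resources so the stream is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamDump {
    // Reads everything behind the URL as lines of text.
    static String readAll(URL url) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Writes a small stand-in document to a temporary file and returns
    // a file: URL for it, so no network access is needed.
    static URL sampleUrl() {
        try {
            Path page = Files.createTempFile("index", ".html");
            page.toFile().deleteOnExit();
            Files.write(page,
                "<html>\n<body>hello</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
            return page.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(readAll(sampleUrl()));
    }
}
```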
Applets have additional restrictions. To be sure that you can access the specified URL and the correct protocol handler, construct URLs relative to the base URL that identifies the applet’s codebase (the location of the applet code). For example:
new URL( getCodeBase( ), "foo/bar.gif" );
This should guarantee that the needed protocol is available and accessible to the applet. However, if you are just trying to get data files or media associated with an applet, there is a more general way; see the discussion of getResource( ) in Chapter 10.
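The two-argument constructor used with getCodeBase( ) resolves a relative specification against any base URL, not only an applet’s codebase. A small sketch, in which the class RelativeUrl, the method resolve( ), and the base URLs are invented for illustration:

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrl {
    // Resolves a relative specification against a base URL and returns
    // the resulting absolute URL as a string.
    static String resolve(String base, String relative) {
        try {
            URL context = new URL(base);
            return new URL(context, relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The base stands in for what getCodeBase( ) would return.
        System.out.println(resolve("http://foo.bar.com/applets/", "foo/bar.gif"));
        // prints: http://foo.bar.com/applets/foo/bar.gif

        // A base that names a file resolves against that file's directory.
        System.out.println(resolve("http://foo.bar.com/applets/Demo.class", "foo/bar.gif"));
        // prints: http://foo.bar.com/applets/foo/bar.gif
    }
}
```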
openStream( ) operates at a lower level than the more general content-handling mechanism implemented by the URL class. We showed it first because, until some things are settled, you’ll be limited as to when you can use URLs in their more powerful role. When a proper content handler is installed, you can retrieve the item the URL addresses as a Java object by calling the URL’s getContent( ) method. Currently, this works only if you supply a content handler with your application or install one in the local classpath. (The HotJava web browser also provides a mechanism for adding new handlers.) In this mode of operation, getContent( ) initiates a connection to the host, fetches the data for you, determines the MIME (Multipurpose Internet Mail Extensions) type of the contents, and invokes a content handler to turn the bytes into a Java object. MIME is a standard that was developed to facilitate multimedia email, but it has become widely used as a general way to specify how to treat data; Java uses MIME to help it pick the right content handler.
For example, given the URL http://foo.bar.com/index.html, a call to getContent( ) would use the HTTP protocol handler to retrieve data and an HTML content handler to turn the data into an appropriate document object. A URL that points to a plain-text file might use a text-content handler that returns a String object. Similarly, a GIF file might be turned into an ImageProducer object using a GIF content handler. If we accessed the GIF file using an “ftp” URL, Java would use the same content handler but would use the FTP protocol handler to receive the data.
getContent( ) returns the output of the content handler, but leaves us wondering what kind of object we got. Since the content handler has to be able to return anything, the return type of getContent( ) is Object. In a moment, we’ll describe how we could ask the protocol handler about the object’s MIME type, which it discovered. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the Object to its appropriate, more specific type. For example, if we expect a String, we’ll cast the result of getContent( ) to a String:
    try {
        String content = (String)myURL.getContent( );
    } catch ( ClassCastException e ) { ... }
If we’re wrong about the type, we’ll get a ClassCastException. As an alternative, we could check the type of the returned object using the instanceof operator, like this:
if ( content instanceof String ) { String s = (String)content; ...
Various kinds of errors can occur when trying to retrieve the data. For example, getContent( ) can throw an IOException if there is a communications error. Other kinds of errors can occur at the application level: some knowledge of how the application-specific content and protocol handlers deal with errors is necessary. One problem that could arise is that a content handler for the data’s MIME type wouldn’t be available. In this case, getContent( ) just invokes an “unknown type” handler and returns the data as a raw InputStream. A sophisticated application might specialize this behavior to try to decide what to do with the data on its own.
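The instanceof approach and the stream fallback can be combined into one check. The following is a sketch under stated assumptions: the class ContentKind and its methods are hypothetical, and the temporary file uses a ".data" extension on the assumption that no specific content handler is registered for it. Which handlers are actually installed depends on the Java runtime.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentKind {
    // Asks the URL for its content and reports which kind of object
    // came back, without assuming a particular content handler.
    static String kindOf(URL url) {
        try {
            Object content = url.getContent();
            if (content instanceof String) {
                return "String";
            } else if (content instanceof InputStream) {
                // Fallback case: we were handed a stream of the data.
                ((InputStream) content).close();
                return "InputStream";
            }
            return content.getClass().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary file (a stand-in for a remote resource) and
    // returns a file: URL for it.
    static URL sampleUrl() {
        try {
            Path file = Files.createTempFile("sample", ".data");
            file.toFile().deleteOnExit();
            Files.write(file, "just some bytes".getBytes(StandardCharsets.UTF_8));
            return file.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf(sampleUrl()));
    }
}
```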
In some situations, we may also need knowledge of the protocol handler. For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server probably returns a valid HTML document that contains the familiar “404 Not Found” message. In a naive implementation, an HTML content handler might be invoked to interpret this message and return it as it would any other HTML document. To check the validity of protocol-specific operations like this, we may need to talk to the protocol handler.
The openStream( ) and getContent( ) methods both implicitly create the connection to the remote URL object. When the connection is set up, the protocol handler is consulted to create a URLConnection object. The URLConnection manages the protocol-specific communications. We can get a URLConnection for our URL with the openConnection( ) method. One of the things we can do with the URLConnection is ask for the object’s content type. For example:
    URLConnection connection = myURL.openConnection( );
    String mimeType = connection.getContentType( );
    ...
    Object contents = myURL.getContent( );
We can also get protocol-specific information. Different protocols provide different types of URLConnection objects. The HttpURLConnection object, for instance, can interpret the “404 Not Found” message and tell us about the problem. (We’ll examine URLConnections in detail in Appendix A.)
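To make the idea concrete, here is a sketch that asks a URLConnection for its content type and, when the connection happens to be an HttpURLConnection, checks the response code first. The class ConnectionInfo, its methods, and the "not found" return value are assumptions for illustration; main( ) uses a temporary file: URL so that it runs without a server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConnectionInfo {
    // Returns the content type reported by the connection, or
    // "not found" if an HTTP server answers with a 404 status.
    static String contentTypeOf(URL url) {
        try {
            URLConnection connection = url.openConnection();
            if (connection instanceof HttpURLConnection) {
                HttpURLConnection http = (HttpURLConnection) connection;
                if (http.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) {
                    return "not found";
                }
            }
            String type = connection.getContentType();
            // Release the stream the connection may have opened.
            connection.getInputStream().close();
            return type;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary HTML file and returns a file: URL for it.
    static URL sampleUrl() {
        try {
            Path page = Files.createTempFile("index", ".html");
            page.toFile().deleteOnExit();
            Files.write(page, "<html></html>\n".getBytes(StandardCharsets.UTF_8));
            return page.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentTypeOf(sampleUrl()));
    }
}
```

For a file: URL the type is typically guessed from the file name, whereas an HTTP connection reports the type the server sends in its response headers.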