Content and protocol handlers represent one of the most interesting ideas from the original Java vision. Unfortunately, as far as we can tell, no one has taken up the challenge of using this intriguing facility. We considered dropping them from the book entirely, but that decision just felt bad. Instead, we banished the discussion of how to write content and protocol handlers to an appendix. If you let us know that this material is important to you, we’ll keep it in the next edition. If you feel “yes, this is interesting, but why do I care?” we’ll drop them from the book. (You can send us comments through the book’s web page at http://www.oreilly.com/catalog/learnjava.)
This appendix picks up where we left our discussion of content and protocol handlers in Chapter 12. We’ll show you how to write your own handlers, which can be used in any Java application, including the HotJava web browser. In this section, we’ll write a content handler that reads Unix tar files and a protocol handler that implements a pluggable encryption scheme. You should be able to drop both into your class path and start using them in the HotJava web browser right away.
The URL class’s getContent( ) method invokes a content handler whenever it’s called to retrieve an object at some URL. The content handler must read the flat stream of data produced by the URL’s protocol handler (the data read from the remote source), and construct a well-defined Java object from it. By “flat,” we mean that the data stream the content handler receives has no artifacts left over from retrieving the data and processing the protocol. It’s the protocol handler’s job to fetch and decode the data before passing it along. The protocol handler’s output is your data, pure and simple.
The roles of content and protocol handlers do not overlap. The content handler doesn’t care how the data arrives, or what form it takes. It’s concerned only with what kind of object it’s supposed to create. For example, if a particular protocol involves sending an object over the network in a compressed format, the protocol handler should do whatever is necessary to unpack it before passing the data on to the content handler. The same content handler can then be used again with a completely different protocol handler to construct the same type of object received via a different transport mechanism.
Let’s look at an example. The following lines construct a URL that points to a GIF file on an FTP archive and attempt to retrieve its contents:
    try {
        URL url = new URL ("ftp://ftp.wustl.edu/graphics/gif/a/apple.gif");
        ImageProducer imgsrc = (ImageProducer)url.getContent( );
        ...
When we construct the URL object, Java looks at the first part of the URL string (everything prior to the colon) to determine the protocol and locate a protocol handler. In this case, it locates the FTP protocol handler, which is used to open a connection to the host and transfer data for the specified file. After making the connection, the URL object asks the protocol handler to identify the resource’s MIME type. The handler can try to resolve the MIME type through a variety of means, but in this case, it might just look at the filename extension (.gif) and determine that the MIME type of the data is image/gif. Here, image/gif is a string that denotes that the content falls into the category of images and is, more specifically, a GIF image. The protocol handler then looks for the content handler responsible for the image/gif type and uses it to construct the right kind of object from the data. The content handler returns an ImageProducer object, which getContent( ) returns to us as an Object. As we’ve seen before, we cast this Object back to its real type so we can work with it.
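The type-resolution step can be tried in isolation. The following is a minimal sketch, not the FTP handler’s actual logic: it uses two static helpers of java.net.URLConnection, guessContentTypeFromName( ) and guessContentTypeFromStream( ), to guess a MIME type first from a filename and then from the leading bytes of the data. The class name MimeGuess and the sample bytes are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the book's example code.
class MimeGuess {

    // Guess a MIME type from the filename alone, as a simple protocol
    // handler might. Returns null if the extension is not recognized.
    static String fromName(String filename) {
        return URLConnection.guessContentTypeFromName(filename);
    }

    // Guess a MIME type from the first few bytes of the data.
    // The stream must support mark/reset; ByteArrayInputStream does.
    static String fromBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromName("apple.gif"));
        // GIF data begins with the ASCII signature "GIF8"
        byte[] gifStart = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(fromBytes(gifStart));
    }
}
```

Guessing from the name is cheap but trusts the extension; guessing from the bytes is a fallback when the name says nothing useful.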
In an upcoming section, we’ll build a simple content handler. To keep things as simple as possible, our example will produce text as output; the URL’s getContent( ) method will return this as a String object.
When Java searches for a class, it translates package names into filesystem pathnames. (The classes may also be in a JAR file in the class path, but we’ll refer to them as files and directories anyway.) This applies to locating content-handler classes as well as other kinds of classes. For example, a class in a package named foo.bar.handlers would live in a directory with foo/bar/handlers/ as part of its pathname. To allow Java to find handler classes for arbitrary new MIME types, content handlers are organized into packages corresponding to the basic MIME type categories. The handler classes themselves are then named after the specific MIME type. This allows Java to map MIME types directly to class names. The only remaining piece of information Java needs is a list of packages in which the handlers might reside. To supply this information, use the system properties java.content.handler.pkgs and java.protocol.handler.pkgs. In these properties, you can use a vertical bar (|) to separate different packages in a list.
We’ll put our content handlers in the learningjava.contenthandlers package. According to the scheme for naming content handlers, a handler for the image/gif MIME type is called gif and placed in a package that is called learningjava.contenthandlers.image. The fully qualified name of the class would then be learningjava.contenthandlers.image.gif, and it would be located in the file learningjava/contenthandlers/image/gif.class, somewhere in the local class path, or, perhaps someday, on a server. Likewise, a content handler for the video/mpeg MIME type would be called mpeg, and an mpeg.class file would be located in a learningjava/contenthandlers/video/ directory somewhere in the class path.
Many MIME type names include a dash (-), which is illegal in a class name. You should convert dashes and other illegal characters into underscores (_) when building Java class and package names. Also note that there are no capital letters in the class names. This violates the coding convention used in most Java source files, in which class names start with capital letters. However, capitalization is not significant in MIME type names, so it is simpler to name the handler classes accordingly.
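The naming scheme is mechanical enough to express in a few lines. The sketch below is our own illustration of the rules just described, not the code Java itself runs: the class HandlerNames and its methods are hypothetical names. It lowercases the MIME type, turns the slash into a package separator, replaces every other character that is not a letter or digit with an underscore, and prefixes the result with each package from a |-separated list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the real lookup happens inside java.net.
class HandlerNames {

    // Turn a MIME type such as "application/x-tar" into the relative
    // class name "application.x_tar".
    static String mimeToClassName(String mimeType) {
        StringBuilder sb = new StringBuilder();
        for (char c : mimeType.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '/') {
                sb.append('.');      // the category becomes a package
            } else if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            } else {
                sb.append('_');      // e.g. '-' is illegal in a class name
            }
        }
        return sb.toString();
    }

    // List every fully qualified class name to try, one for each
    // package prefix in a '|'-separated list.
    static List<String> candidates(String pkgs, String mimeType) {
        List<String> names = new ArrayList<>();
        for (String prefix : pkgs.split("\\|")) {
            prefix = prefix.trim();
            if (!prefix.isEmpty()) {
                names.add(prefix + "." + mimeToClassName(mimeType));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the two class names that would be tried, in order
        System.out.println(candidates(
            "learningjava.contenthandlers|foo.bar.handlers",
            "application/x-tar"));
    }
}
```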
In this section, we’ll build a simple content handler that reads and interprets tar (tape archive) files. tar is an archival format widely used in the Unix world to hold collections of files, along with their basic type and attribute information.[57] A tar file is similar to a JAR file, except that it’s not compressed. Files in the archive are stored sequentially, in flat text or binary with no special encoding. In practice, tar files are usually compressed for storage using an application like Unix compress or GNU gzip and then named with a filename extension like .tar.gz or .tgz.
Most web browsers, upon retrieving a tar file, prompt the user with a File Save dialog. The assumption is that if you are retrieving an archive, you probably want to save it for later unpacking and use. We would like to implement a tar content handler that allows an application to read the contents of the archive and give us a listing of the files that it contains. In itself, this would not be the most useful thing in the world, because we would be left with the dilemma of how to get at the archive’s contents. However, a more complete implementation of our content handler, used in conjunction with an application like a web browser, could generate HTML output or pop up a dialog that lets us select and save individual files within the archive.
Some code that fetches a tar file and lists its contents might look like this:
    try {
        URL listing = new URL("http://somewhere.an.edu/lynx/lynx2html.tar");
        String s = (String)listing.getContent( );
        System.out.println( s );
        ...
Our handler will produce a listing similar to the Unix tar application’s output:
    Tape Archive Listing:

    0 Tue Sep 28 18:12:47 CDT 1993 lynx2html/
    14773 Tue Sep 28 18:01:55 CDT 1993 lynx2html/lynx2html.c
    470 Tue Sep 28 18:13:24 CDT 1993 lynx2html/Makefile
    172 Thu Apr 01 15:05:43 CST 1993 lynx2html/lynxgate
    3656 Wed Mar 03 15:40:20 CST 1993 lynx2html/install.csh
    490 Thu Apr 01 14:55:04 CST 1993 lynx2html/new_globals.c
    ...
Our handler will dissect the file to read the contents and generate the listing. The URL’s getContent( ) method will return that information to an application as a String object.
First we must decide what to call our content handler and where to put it. The MIME-type hierarchy classifies the tar format as an application type extension. Its proper MIME type is then application/x-tar. Therefore, our handler belongs in the learningjava.contenthandlers.application package and goes into the class file learningjava/contenthandlers/application/x_tar.class. Note that the name of our class is x_tar, rather than x-tar; you’ll remember the dash is illegal in a class name so, by convention, we convert it to an underscore.
Here’s the code for the content handler; compile it and put it in learningjava/contenthandlers/application/, somewhere in your class path:
    //file: x_tar.java
    package learningjava.contenthandlers.application;

    import java.net.*;
    import java.io.*;
    import java.util.Date;

    public class x_tar extends ContentHandler {
        // Record size and header field offsets/lengths, in bytes
        static int
            RECORDLEN = 512,
            NAMEOFF = 0, NAMELEN = 100,
            SIZEOFF = 124, SIZELEN = 12,
            MTIMEOFF = 136, MTIMELEN = 12;

        public Object getContent(URLConnection uc) throws IOException {
            InputStream is = uc.getInputStream( );
            StringBuffer output = new StringBuffer( "Tape Archive Listing:\n\n" );
            byte [] header = new byte[RECORDLEN];
            int count = 0;

            // Stop at a short read or at a name field that starts with
            // a null byte (end of archive)
            while ( (is.read(header) == RECORDLEN)
                     && (header[NAMEOFF] != 0) ) {

                String name =
                    new String(header, NAMEOFF, NAMELEN, "8859_1").trim( );
                // Numeric fields are octal ASCII strings
                String s =
                    new String(header, SIZEOFF, SIZELEN, "8859_1").trim( );
                int size = Integer.parseInt(s, 8);
                s = new String(header, MTIMEOFF, MTIMELEN, "8859_1").trim( );
                // Parse as a long: timestamps past 2038 overflow an int
                long l = Long.parseLong(s, 8);
                Date mtime = new Date( l*1000 );

                output.append( size + " " + mtime + " " + name + "\n" );

                // Skip the file data, then any padding up to the next record
                count += is.skip( size ) + RECORDLEN;
                if ( count % RECORDLEN != 0 )
                    count += is.skip ( RECORDLEN - count % RECORDLEN);
            }

            if ( count == 0 )
                output.append("Not a valid TAR file\n");

            return( output.toString( ) );
        }
    }
Our x_tar handler is a subclass of the abstract class java.net.ContentHandler. Its job is to implement one method: getContent( ), which takes as an argument a special “protocol connection” object and returns a constructed Java Object. The getContent( ) method of the URL class ultimately uses this getContent( ) method when we ask for the contents of the URL.
The code looks formidable, but most of it is devoted to processing the details of the tar format. If we remove these details, there isn’t much left:
    public class x_tar extends ContentHandler {

        public Object getContent( URLConnection uc ) throws IOException {
            // get input stream
            InputStream is = uc.getInputStream( );

            // read stream and construct object
            // ...

            // return the constructed object
            return( output.toString( ) );
        }
    }
That’s really all there is to a content handler; it’s relatively simple.
The java.net.URLConnection object that getContent( ) receives represents the protocol handler’s connection to the remote resource. It provides a number of methods for examining information about the URL resource, such as header and type fields, and for determining the kinds of operations the protocol supports. However, its most important method is getInputStream( ), which returns an InputStream from the protocol handler. Reading this InputStream gives you the raw data for the object the URL addresses. In our case, reading the InputStream feeds x_tar the bytes of the tar file it’s to process.
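To see what a content handler is handed, it can help to read a URLConnection’s stream directly. The sketch below is only an illustration under our own assumptions: the class ConnectionPeek and its methods are hypothetical, and it reads a temporary local file through a file: URL so that it runs without a network. It opens the connection and drains getInputStream( ) into a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names here are not from the book.
class ConnectionPeek {

    // Write some bytes to a temporary file and return a file: URL for it.
    static URL tempFileUrl(byte[] contents) {
        try {
            Path tmp = Files.createTempFile("peek", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, contents);
            return tmp.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Open a connection and read everything from its InputStream: the
    // same raw bytes a content handler's getContent() would be given.
    static byte[] fetch(URL url) {
        try {
            URLConnection uc = url.openConnection();
            try (InputStream is = uc.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = is.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URL url = tempFileUrl(new byte[] { 1, 2, 3, 4 });
        System.out.println(url + " -> " + fetch(url).length + " bytes");
    }
}
```

The same URLConnection object also offers getContentType( ) and getContentLength( ) for the type and header information mentioned above.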
The majority of our getContent( )
method is
devoted to interpreting the stream of bytes of the tar file and
building our output object: the String
that lists
the contents of the tar file. Again, this means that this example
involves the particulars of reading tar files, so you shouldn’t
fret too much about the details.
After requesting an InputStream from the URLConnection, x_tar loops, gathering information about each file. Each archived item is preceded by a header that contains attribute and length fields. x_tar interprets each header and then skips over the remaining portion of the item. To parse the header, we use the String constructor to read a fixed number of characters from the byte array header[]. To convert these bytes into a Java String properly, we specify the character encoding used by web servers: 8859_1, which (for the most part) is equivalent to ASCII. Once we have a file’s name, size, and time stamp, we accumulate the results (the file listings) in a StringBuffer, one line per file. When the listing is complete, getContent( ) returns the StringBuffer as a String object.
The main while
loop continues as long as
it’s able to read another header record, and as long as the
record’s “name” field isn’t full of ASCII
null values. (The tar file format calls for the end of the archive to
be padded with an empty header record, although most tar
implementations don’t seem to do this.) The
while
loop retrieves the name, size, and
modification times as character strings from fields in the header.
The most common tar format stores its numeric values in octal, as
fixed-length ASCII
strings.
We extract the strings and use Integer.parseInt( )
to parse them.
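The header parsing can be tried without a real archive. The following sketch builds a 512-byte header in memory and reads the name, size, and modification time back out, using the field offsets from the x_tar listing. The class TarHeaderDemo, its method names, and the fakeHeader( ) helper are hypothetical, introduced only for this illustration; it also uses StandardCharsets.ISO_8859_1 in place of the "8859_1" string, which names the same encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Date;

// Hypothetical demo class; field offsets match the x_tar listing.
class TarHeaderDemo {
    static final int RECORDLEN = 512,
        NAMEOFF = 0, NAMELEN = 100,
        SIZEOFF = 124, SIZELEN = 12,
        MTIMEOFF = 136, MTIMELEN = 12;

    // Extract a fixed-width text field; trim() drops the trailing
    // null bytes and spaces that pad the field.
    static String field(byte[] header, int off, int len) {
        return new String(header, off, len, StandardCharsets.ISO_8859_1).trim();
    }

    static String name(byte[] header) {
        return field(header, NAMEOFF, NAMELEN);
    }

    // Numeric fields are octal digits stored as ASCII text.
    static long size(byte[] header) {
        return Long.parseLong(field(header, SIZEOFF, SIZELEN), 8);
    }

    static long mtime(byte[] header) {
        return Long.parseLong(field(header, MTIMEOFF, MTIMELEN), 8);
    }

    // Build a header in memory so the demo needs no real tar file.
    static byte[] fakeHeader(String name, long size, long mtimeSeconds) {
        byte[] h = new byte[RECORDLEN];
        put(h, NAMEOFF, name);
        put(h, SIZEOFF, String.format("%011o", size));
        put(h, MTIMEOFF, String.format("%011o", mtimeSeconds));
        return h;
    }

    private static void put(byte[] h, int off, String text) {
        byte[] b = text.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(b, 0, h, off, b.length);
    }

    public static void main(String[] args) {
        byte[] h = fakeHeader("lynx2html/Makefile", 470, 749258004L);
        System.out.println(size(h) + " " + new Date(mtime(h) * 1000)
            + " " + name(h));
    }
}
```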
After reading and parsing the header, x_tar skips over the data portion of the file and updates the variable count, which keeps track of the offset into the archive. The two lines following the initial skip account for tar’s “blocking” of the data records. In other words, if the data portion of a file doesn’t fit precisely into an integral number of blocks of RECORDLEN bytes, tar adds padding to make it fit.
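The blocking arithmetic is easy to check by hand: a 14773-byte file occupies 29 records, or 14848 bytes, so 75 bytes of padding follow it. The sketch below computes the padded length and also shows one way to skip a stream reliably, since InputStream.skip( ) is allowed to skip fewer bytes than requested (which matters more on network streams than on local files). The class TarBlocks and its methods are hypothetical names, and this is an alternative to the skip logic in x_tar rather than a copy of it.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration only.
class TarBlocks {
    static final int RECORDLEN = 512;

    // Number of bytes a file's data occupies in the archive,
    // rounded up to a whole number of records.
    static long paddedLength(long size) {
        return ((size + RECORDLEN - 1) / RECORDLEN) * RECORDLEN;
    }

    // Skip exactly n bytes, looping because skip() may fall short.
    static void skipFully(InputStream in, long n) {
        try {
            while (n > 0) {
                long skipped = in.skip(n);
                if (skipped <= 0) {
                    // No progress: read one byte to tell EOF from a stall
                    if (in.read() == -1) {
                        throw new EOFException("archive ended early");
                    }
                    skipped = 1;
                }
                n -= skipped;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long size = 14773;
        System.out.println("padded: " + paddedLength(size)
            + ", padding: " + (paddedLength(size) - size));
        InputStream in = new ByteArrayInputStream(new byte[2 * RECORDLEN]);
        skipFully(in, paddedLength(470));   // skips one whole record
    }
}
```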
As we said, the details of parsing tar files are not really our main
concern here. But x_tar
does
illustrate a few tricks of data manipulation in Java.
It may surprise you that we didn’t have to provide a
constructor; our content handler relies on its default constructor.
We don’t need to provide a constructor because there
isn’t anything for it to do. Java doesn’t pass the class
any argument information when it creates an instance of it. You might
suspect that the URLConnection
object would be a
natural thing to provide at that point. However, when you are calling
the constructor of a class that is loaded at runtime, you can’t
easily pass it any arguments.
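The general technique looks something like the following sketch, which loads a class by name and creates an instance through its no-argument constructor. This illustrates the reflection calls involved and is not the code inside java.net; the class HandlerLoader and the method instantiate( ) are hypothetical names, and java.util.ArrayList stands in for a handler class so that the example runs on its own.

```java
// Hypothetical demo class showing runtime instantiation by name.
class HandlerLoader {

    // Load a class by name and create an instance through its
    // no-argument constructor. The caller knows only a string, so
    // there is no convenient way to choose constructor arguments.
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is a stand-in for a handler class
        Object o = instantiate("java.util.ArrayList");
        System.out.println(o.getClass().getName());
    }
}
```

Because the instance is created this way, anything the handler needs (such as the URLConnection) has to arrive later as a method argument, which is exactly what getContent( ) does.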
When we began this discussion of content handlers, we showed a brief
example of how our x_tar
content handler would
work for us. You can try that code snippet now with your favorite tar
file by setting the java.content.handler.pkgs
system property to learningjava.contenthandlers
and making sure that package is in your class path.
To make things more exciting, try setting the property in your HotJava properties file. (The HotJava properties file usually resides in a .hotjava directory in your home directory or in the HotJava installation directory on a Windows machine.) Make sure that the class path is set before you start HotJava. Once HotJava is running, go to the Preferences menu, and select Viewer Applications. Find the type TAR archive, and set its Action to View in HotJava. This tells HotJava to try to use a content handler to display the data in the browser. Now, drive HotJava to a URL that contains a tar file. The result should look something like that shown in Figure 1.1.
We’ve just extended our copy of HotJava to understand tar
files! In the next section, we’ll turn the tables and look at
protocol handlers. There we’ll be building
URLConnection
objects; someone
else will have
the pleasure of reconstituting the data.
[57] There are several slightly different versions of the tar format. This content handler understands the most widely used variant.