Java Examples in a Nutshell, 2nd Edition

By David Flanagan
2nd Edition October 2000
0-596-00039-1, Order Number: 0391
584 pages, $29.95

Chapter 19
XML

Contents:
Parsing with JAXP and SAX 1
Parsing with SAX 2
Parsing and Manipulating with JAXP and DOM
Traversing a DOM Tree
Traversing a Document with DOM Level 2
The JDOM API
Exercises

XML, or Extensible Markup Language, is a meta-language for marking up text documents with structural tags, similar to those found in HTML and SGML documents. XML has become popular because its structural markup allows documents to describe their own format and contents. XML enables "portable data," and it can be quite powerful when combined with the "portable code" enabled by Java.

Because of the popularity of XML, there are a number of tools for parsing and manipulating XML documents. And because XML documents are becoming more and more common, it is worth your time to learn how to use some of those tools to work with XML. The examples in this chapter introduce you to simple XML parsing and manipulation. If you are familiar with the basic structure of an XML file, you should have no problem understanding them. Note that there are many subtleties to working with XML; this chapter doesn't attempt to explain them all. To learn more about XML, try Java and XML, by Brett McLaughlin, or XML Pocket Reference, by Robert Eckstein, both from O'Reilly & Associates.

The world of XML and its affiliated technologies is moving so fast that it can be hard just keeping up with the acronyms, standards, APIs, and version numbers. I'll try to provide an overview of the state of various technologies in this chapter, but be warned that things may have changed, sometimes radically, by the time you read this material.

Parsing with JAXP and SAX 1

The first thing you want to do with an XML document is parse it. There are two commonly used approaches to XML parsing: they go by the acronyms SAX and DOM. We'll begin with SAX parsing; DOM parsing is covered later in the chapter. At the very end of the chapter, we'll also see a new, but very promising, Java-centric XML API known as JDOM.

SAX is the Simple API for XML. SAX is not a parser, but rather a Java API that describes how a parser operates. When parsing an XML document using the SAX API, you define a class that implements various "event" handling methods. As the parser encounters the various element types of the XML document, it invokes the corresponding event handler methods you've defined. Your methods take whatever actions are required to accomplish the desired task. In the SAX model, the parser converts an XML document into a sequence of Java method calls. The parser doesn't build a parse tree of any kind (although your methods can do this, if you want). SAX parsing is typically quite efficient and is therefore your best choice for most simple XML processing tasks.
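To make the "sequence of method calls" idea concrete, here is a minimal sketch that records each callback the parser makes for a tiny in-memory document. It is not one of the book's examples: the class name SaxEventTrace and its trace() method are hypothetical, and it assumes a modern JDK, whose bundled JAXP parser accepts a SAX 2 DefaultHandler (rather than the SAX 1 HandlerBase used in Example 19.1).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: logs the SAX callbacks in the order they occur
class SaxEventTrace extends DefaultHandler {
    // One entry per callback, so the event sequence can be inspected later
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] buffer, int start, int length) {
        // Ignore whitespace-only runs; record everything else
        String text = new String(buffer, start, length).trim();
        if (!text.isEmpty()) events.add("text:" + text);
    }

    /** Parse the XML string and return the recorded events */
    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxEventTrace handler = new SaxEventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servlet><servlet-name>hello</servlet-name></servlet>";
        for (String event : trace(xml)) System.out.println(event);
    }
}
```

Note that no tree is ever built: the only record of the document is whatever the handler chooses to keep, here a simple list of strings.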

The SAX API was created by David Megginson (http://www.megginson.com/SAX/). The Java implementation of the API is in the package org.xml.sax and its subpackages. SAX is a de facto standard but has not been standardized by any official body. SAX Version 1 has been in use for some time; SAX 2 was finalized in May 2000. There are numerous changes between the SAX 1 and SAX 2 APIs. Many Java-based XML parsers exist that conform to the SAX 1 or SAX 2 APIs.

With the SAX API, you can't completely abstract away the details of the XML parser implementation you are using: at a minimum, your code must supply the classname of the parser to be used. This is where JAXP comes in. JAXP is the Java API for XML Parsing. It is an "optional package" defined by Sun that consists of the javax.xml.parsers package. JAXP provides a thin layer on top of SAX (and on top of DOM, as we'll see) and standardizes an API for obtaining and using SAX (and DOM) parser objects. The JAXP package ships with default parser implementations but allows other parsers to be easily plugged in and configured using system properties. At this writing, the current version of JAXP is 1.0.1; it supports SAX 1, but not SAX 2. By the time you read this, however, JAXP 1.1, which will include support for SAX 2, may have become available.

Example 19.1 is a listing of ListServlets1.java, a program that uses JAXP and SAX to parse a web application deployment descriptor and list the names of the servlets configured by that file. If you haven't yet read Chapter 18, Servlets and JSP, you should know that servlet-based web applications are configured using an XML file named web.xml. This file contains <servlet> tags that define mappings between servlet names and the Java classes that implement them. To help you understand the task to be solved by the ListServlets1.java program, here is an excerpt from the web.xml file developed in Chapter 18:

  <servlet>
    <servlet-name>hello</servlet-name>
    <servlet-class>com.davidflanagan.examples.servlet.Hello</servlet-class>
  </servlet>
  
  <servlet>
    <servlet-name>counter</servlet-name>  
    <servlet-class>com.davidflanagan.examples.servlet.Counter</servlet-class>
    <init-param>
      <param-name>countfile</param-name>         <!-- where to save state -->
      <param-value>/tmp/counts.ser</param-value> <!-- adjust for your system-->
    </init-param>
    <init-param>
      <param-name>saveInterval</param-name>      <!-- how often to save -->
      <param-value>30000</param-value>           <!-- every 30 seconds -->
    </init-param>
  </servlet>
  
  <servlet>
    <servlet-name>logout</servlet-name>
    <servlet-class>com.davidflanagan.examples.servlet.Logout</servlet-class>
  </servlet>

ListServlets1.java includes a main() method that uses the JAXP API to obtain a SAX parser instance. It then tells the parser what to parse and starts the parser running. The remaining methods of the class are invoked by the parser. Note that ListServlets1 extends the SAX HandlerBase class. This superclass provides dummy implementations of all the SAX event handler methods. The example simply overrides the handlers of interest. The parser calls the startElement() method when it reads an opening XML tag; it calls endElement() when it finds a closing tag. characters() is invoked when the parser reads a string of plain text with no markup. Finally, the parser calls warning(), error(), or fatalError() when something goes wrong in the parsing process. The implementations of these methods are written specifically to extract the desired information from a web.xml file and are based on a knowledge of the structure of this type of file.

Note that web.xml files are somewhat unusual in that they don't rely on attributes for any of the XML tags. That is, servlet names are defined by a <servlet-name> tag nested within a <servlet> tag, instead of simply using a name attribute of the <servlet> tag itself. This fact makes the example program slightly more complex than it would otherwise be. The web.xml file does allow id attributes for all its tags. Although servlet engines are not expected to use these attributes, they may be useful to a configuration tool that parses and automatically generates web.xml files. For completeness, the startElement() method in Example 19.1 looks for an id attribute of the <servlet> tag. The value of that attribute, if it exists, is reported in the program's output.

Example 19.1: ListServlets1.java

package com.davidflanagan.examples.xml;
import javax.xml.parsers.*;                   // The JAXP package
import org.xml.sax.*;                         // The main SAX package
import java.io.*;

/**
 * Parse a web.xml file using JAXP and SAX1.  Print out the names
 * and class names of all servlets listed in the file.
 * 
 * This class extends the HandlerBase helper class, which means
 * that it defines all the "callback" methods that the SAX parser will
 * invoke to notify the application.  In this example we override the 
 * methods that we require.
 *
 * This example uses full package names in places to help keep the JAXP
 * and SAX APIs distinct.
 **/
public class ListServlets1 extends org.xml.sax.HandlerBase {
    /** The main method sets things up for parsing */
    public static void main(String[] args)
        throws IOException, SAXException, ParserConfigurationException
    {
        // Create a JAXP "parser factory" for creating SAX parsers
        javax.xml.parsers.SAXParserFactory spf=SAXParserFactory.newInstance();

        // Configure the parser factory for the type of parsers we require
        spf.setValidating(false);  // No validation required

        // Now use the parser factory to create a SAXParser object
        // Note that SAXParser is a JAXP class, not a SAX class
        javax.xml.parsers.SAXParser sp = spf.newSAXParser();
        
        // Create a SAX input source for the file argument
        org.xml.sax.InputSource input=new InputSource(new FileReader(args[0]));

        // Give the InputSource an absolute URL for the file, so that
        // it can resolve relative URLs, e.g. in a <!DOCTYPE> declaration
        input.setSystemId("file://" + new File(args[0]).getAbsolutePath());

        // Create an instance of this class; it defines all the handler methods
        ListServlets1 handler = new ListServlets1();

        // Finally, tell the parser to parse the input and notify the handler
        sp.parse(input, handler);
        
        // Instead of using the SAXParser.parse() method, which is part of the
        // JAXP API, we could also use the SAX1 API directly.  Note the
        // difference between the JAXP class javax.xml.parsers.SAXParser and
        // the SAX1 class org.xml.sax.Parser
        //
        // org.xml.sax.Parser parser = sp.getParser();  // Get the SAX parser
        // parser.setDocumentHandler(handler);          // Set main handler
        // parser.setErrorHandler(handler);             // Set error handler
        // parser.parse(input);                         // Parse!
    }

    StringBuffer accumulator = new StringBuffer();  // Accumulate parsed text
    String servletName;      // The name of the servlet
    String servletClass;     // The class name of the servlet
    String servletId;        // Value of id attribute of <servlet> tag

    // When the parser encounters plain text (not XML elements), it calls
    // this method, which accumulates them in a string buffer
    public void characters(char[] buffer, int start, int length) {
        accumulator.append(buffer, start, length);
    }

    // Every time the parser encounters the beginning of a new element, it
    // calls this method, which resets the string buffer
    public void startElement(String name, AttributeList attributes) {
        accumulator.setLength(0);  // Ready to accumulate new text
        // If it's a servlet tag, look for its id attribute
        if (name.equals("servlet"))
            servletId = attributes.getValue("id");
    }

    // When the parser encounters the end of an element, it calls this method
    public void endElement(String name) {
        if (name.equals("servlet-name")) {
            // After </servlet-name>, we know the servlet name saved up
            servletName = accumulator.toString().trim();
        }
        else if (name.equals("servlet-class")) {
            // After </servlet-class>, we've got the class name accumulated
            servletClass = accumulator.toString().trim();
        }
        else if (name.equals("servlet")) {
            // Assuming the document is valid, then when we parse </servlet>,
            // we know we've got a servlet name and class name to print out
            System.out.println("Servlet " + servletName +
                               ((servletId != null)?" (id="+servletId+")":"") +
                               ": " + servletClass);
        }
    }

    /** This method is called when warnings occur */
    public void warning(SAXParseException exception) {
        System.err.println("WARNING: line " + exception.getLineNumber() + ": "+
                           exception.getMessage());
    }

    /** This method is called when errors occur */
    public void error(SAXParseException exception) {
        System.err.println("ERROR: line " + exception.getLineNumber() + ": " +
                           exception.getMessage());
    }

    /** This method is called when non-recoverable errors occur. */
    public void fatalError(SAXParseException exception) throws SAXException {
        System.err.println("FATAL: line " + exception.getLineNumber() + ": " +
                           exception.getMessage());
        throw(exception);
    }
}

Compiling and Running the Example

To run the previous example, you need the JAXP package from Sun. You can download it by following the download links from http://java.sun.com/xml/. Once you've downloaded the package, uncompress the archive it is packaged in and install it somewhere convenient on your system. In Version 1.0.1 of JAXP, the download bundle contains two JAR files: jaxp.jar, the JAXP API classes, and parser.jar, the SAX and DOM APIs and default parser implementations. To compile and run this example, you need both JAR files in your classpath. If you have any other XML parsers, such as the Xerces parser, in your classpath, remove them or make sure that the JAXP files are listed first; otherwise you may run into version-skew problems between the different parsers. Note that you probably don't want to permanently alter your classpath, since you'll have to change it again for the next example. One simple solution with Java 1.2 and later is to temporarily drop copies of the JAXP JAR files into the jre/lib/ext/ directory of your Java installation.

With the two JAXP JAR files temporarily in your classpath, you can compile and run ListServlets1.java as usual. When you run it, specify the name of a web.xml file on the command line. You can use the sample file included with the downloadable examples for this book or specify one from your own servlet engine.

There is one complication to this example. Most web.xml files contain a <!DOCTYPE> declaration that specifies the document type definition (or DTD). Despite the fact that Example 19.1 specifies that the parser should not validate the document, the parser typically still reads the DTD for any document that has a <!DOCTYPE> declaration. Most web.xml files have a declaration like this:

<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">

In order to read the DTD, the parser must be able to read the specified URL. If your system is not connected to the Internet when you run the example, the program will hang. One workaround is to replace the DTD URL with the name of a local copy of the DTD, which is what has been done in the sample web.xml file bundled with the downloadable examples. Another workaround to this DTD problem is to simply remove (or comment out) the <!DOCTYPE> declaration from the web.xml file you process with ListServlets1.
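A third possibility, not used by the book's examples, is to intercept the request for the DTD in code. The sketch below assumes a modern JDK and a SAX 2 DefaultHandler; the class name SkipDtdHandler and its parse() helper are hypothetical. It overrides resolveEntity() to hand the parser an empty input source, so the remote URL is never opened. The trade-off is that anything the DTD would have supplied, such as entity declarations or default attribute values, is lost.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: parse a document without fetching its external DTD
class SkipDtdHandler extends DefaultHandler {
    int elementCount = 0;   // Number of opening tags seen

    // The parser calls this when it needs an external entity such as the DTD.
    // Returning an empty source means nothing is read from the network.
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
        throws IOException, SAXException
    {
        return new InputSource(new StringReader(""));
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        elementCount++;
    }

    /** Parse the XML string with a handler that suppresses DTD loading */
    static SkipDtdHandler parse(String xml) throws Exception {
        SkipDtdHandler handler = new SkipDtdHandler();
        // SAXParser.parse() registers the handler as the entity resolver too
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE web-app PUBLIC "
            + "\"-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN\" "
            + "\"http://java.sun.com/j2ee/dtds/web-app_2.2.dtd\">"
            + "<web-app><servlet/></web-app>";
        System.out.println("Elements seen: " + parse(xml).elementCount);
    }
}
```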

Parsing with SAX 2

Example 19.1 showed how you can parse an XML document using the SAX 1 API, which is what is supported by the current version of JAXP (at this writing). The SAX 1 API is out of date, however. So Example 19.2 shows how you can accomplish a similar parsing task using the SAX 2 API and the open-source Xerces parser available from the Apache Software Foundation.

Example 19.2 is a listing of the program ListServlets2.java. Like the ListServlets1.java example, this program reads a specified web.xml file and looks for <servlet> tags, so it can print out the servlet name-to-servlet class mappings. This example goes a little further than the last, however, and also looks for <servlet-mapping> tags, so it can also output the URL patterns that are mapped to named servlets. The example uses two hash tables (HashMap objects) to store the information as it accumulates it, then prints out all the information when parsing is complete.

The SAX 2 API is functionally similar to the SAX 1 API, but a number of classes and interfaces have new names and some methods have new signatures. Many of the changes were required for the addition of XML namespace support in SAX 2. As you read through Example 19.2, pay attention to the API differences from Example 19.1.
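The namespace-related part of the SAX 2 signatures is easiest to see with a document that actually uses a namespace prefix, which web.xml does not. The following sketch is an illustration only: the class name NamespaceDemo and the urn:example:web namespace are made up, and it assumes a JAXP version that supports SAX 2 (any modern JDK) so that an XMLReader can be obtained through the factory instead of naming the Xerces class directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of the three names SAX 2 passes to startElement()
class NamespaceDemo extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    // SAX 1 passed a single name; SAX 2 passes the namespace URI, the
    // local name, and the qualified (prefixed) name separately
    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes attributes) {
        seen.add("{" + namespaceURI + "}" + localName + " (qName=" + qName + ")");
    }

    /** Parse the XML string with namespace processing turned on */
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);   // JAXP factories default to false
        XMLReader reader = spf.newSAXParser().getXMLReader();
        NamespaceDemo handler = new NamespaceDemo();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:app xmlns:w=\"urn:example:web\"><w:servlet/></w:app>";
        for (String line : parse(xml)) System.out.println(line);
    }
}
```

Matching on the namespace URI and local name, rather than on the qualified name, keeps a handler working even if a document author chooses a different prefix.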

Example 19.2: ListServlets2.java

package com.davidflanagan.examples.xml;
import org.xml.sax.*;             // The main SAX package
import org.xml.sax.helpers.*;     // SAX helper classes
import java.io.*;                 // For reading the input file
import java.util.*;               // HashMap, lists, and so on

/**
 * Parse a web.xml file using the SAX2 API and the Xerces parser from the
 * Apache project.
 * 
 * This class extends DefaultHandler so that instances can serve as SAX2
 * event handlers, and can be notified by the parser of parsing events.
 * We simply override the methods that receive events we're interested in
 **/
public class ListServlets2 extends org.xml.sax.helpers.DefaultHandler {
    /** The main method sets things up for parsing */
    public static void main(String[] args) throws IOException, SAXException {
        // Create the parser we'll use.  The parser implementation is a 
        // Xerces class, but we use it only through the SAX XMLReader API
        org.xml.sax.XMLReader parser=new org.apache.xerces.parsers.SAXParser();

        // Specify that we don't want validation.  This is the SAX2
        // API for requesting parser features.  Note the use of a
        // globally unique URL as the feature name.  Non-validation is
        // actually the default, so this line isn't really necessary.
        parser.setFeature("http://xml.org/sax/features/validation", false);

        // Instantiate this class to provide handlers for the parser and 
        // tell the parser about the handlers
        ListServlets2 handler = new ListServlets2();
        parser.setContentHandler(handler);
        parser.setErrorHandler(handler);

        // Create an input source that describes the file to parse.
        // Then tell the parser to parse input from that source
        org.xml.sax.InputSource input=new InputSource(new FileReader(args[0]));
        parser.parse(input);
    }

    HashMap nameToClass;     // Map from servlet name to servlet class name
    HashMap nameToPatterns;  // Map from servlet name to url patterns

    StringBuffer accumulator;                         // Accumulate text
    String servletName, servletClass, servletPattern; // Remember text

    // Called at the beginning of parsing.  We use it as an init() method
    public void startDocument() {
        accumulator = new StringBuffer();
        nameToClass = new HashMap();
        nameToPatterns = new HashMap();
    }

    // When the parser encounters plain text (not XML elements), it calls
    // this method, which accumulates them in a string buffer.
    // Note that this method may be called multiple times, even with no
    // intervening elements.
    public void characters(char[] buffer, int start, int length) {
        accumulator.append(buffer, start, length);
    }

    // At the beginning of each new element, erase any accumulated text.
    public void startElement(String namespaceURL, String localName,
                             String qname, Attributes attributes) {
        accumulator.setLength(0);
    }

    // Take special action when we reach the end of selected elements.
    // Although we don't use a validating parser, this method does assume
    // that the web.xml file we're parsing is valid.
    public void endElement(String namespaceURL, String localName, String qname)
    {
        if (localName.equals("servlet-name")) {        // Store servlet name
            servletName = accumulator.toString().trim();
        }
        else if (localName.equals("servlet-class")) {  // Store servlet class
            servletClass = accumulator.toString().trim();
        }
        else if (localName.equals("url-pattern")) {    // Store servlet pattern
            servletPattern = accumulator.toString().trim();
        }
        else if (localName.equals("servlet")) {        // Map name to class
            nameToClass.put(servletName, servletClass);
        }
        else if (localName.equals("servlet-mapping")) {// Map name to pattern
            List patterns = (List)nameToPatterns.get(servletName);
            if (patterns == null) {
                patterns = new ArrayList();
                nameToPatterns.put(servletName, patterns);
            }
            patterns.add(servletPattern);
        }
    }

    // Called at the end of parsing.  Used here to print our results.
    public void endDocument() {
        List servletNames = new ArrayList(nameToClass.keySet());
        Collections.sort(servletNames);
        for(Iterator iterator = servletNames.iterator(); iterator.hasNext();) {
            String name = (String)iterator.next();
            String classname = (String)nameToClass.get(name);
            List patterns = (List)nameToPatterns.get(name);
            System.out.println("Servlet: " + name);
            System.out.println("Class: " + classname);
            if (patterns != null) {
                System.out.println("Patterns:");
                for(Iterator i = patterns.iterator(); i.hasNext(); ) {
                    System.out.println("\t" + i.next());
                }
            }
            System.out.println();
        }
    }

    // Issue a warning
    public void warning(SAXParseException exception) {
        System.err.println("WARNING: line " + exception.getLineNumber() + ": "+
                           exception.getMessage());
    }

    // Report a parsing error
    public void error(SAXParseException exception) {
        System.err.println("ERROR: line " + exception.getLineNumber() + ": " +
                           exception.getMessage());
    }

    // Report a non-recoverable error and exit
    public void fatalError(SAXParseException exception) throws SAXException {
        System.err.println("FATAL: line " + exception.getLineNumber() + ": " +
                           exception.getMessage());
        throw(exception);
    }
}

Compiling and Running the Example

The ListServlets2 example uses the Xerces-J parser from the Apache XML Project. You can download this open-source parser by following the download links from http://xml.apache.org/. Once you have downloaded Xerces-J, unpack the distribution in a convenient location on your system. In that distribution, you should find a xerces.jar file. This file must be in your classpath to compile and run the ListServlets2.java example. Note that the xerces.jar file and the parser.jar file from the JAXP distribution both contain versions of the SAX and DOM classes; you should avoid having both files in your classpath at the same time.

Parsing and Manipulating with JAXP and DOM

The first two examples in this chapter used the SAX API for parsing XML documents. We now turn to another commonly used parsing API, the DOM, or Document Object Model. The DOM API is a standard defined by the World Wide Web Consortium (W3C); its Java implementation consists of the org.w3c.dom package and its subpackages. The current version of the DOM standard is Level 1. As of this writing, the DOM Level 2 API is making its way through the standardization process at the W3C.

The Document Object Model defines the API of a parse tree for XML documents. The org.w3c.dom.Node interface specifies the basic features of a node in this parse tree. Subinterfaces, such as Document, Element, Entity, and Comment, define the features of specific types of nodes. A program that uses the DOM parsing model is quite different from one that uses SAX. With the DOM, you have the parser read your XML document and transform it into a tree of Node objects. Once parsing is complete, you can traverse the tree to find the information you need. The DOM parsing model is useful if you need to make multiple passes through the tree, if you want to modify the structure of the tree, or if you need random access to an XML document, instead of the sequential access provided by the SAX model.
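As a small illustration of the tree model, the sketch below parses an in-memory document and then walks the resulting Node tree recursively, printing one indented line per element. The class name DomWalk and its methods are hypothetical, and the code assumes the JAXP and DOM classes bundled with a modern JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration: build a DOM tree, then traverse it
class DomWalk {
    /** Parse the XML string into a DOM Document */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Append an indented line for each element at or below the node */
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(node.getNodeName()).append('\n');
        }
        // Visit the children in document order
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<web-app><servlet>"
                             + "<servlet-name>hello</servlet-name>"
                             + "</servlet></web-app>");
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Because the whole tree stays in memory after parsing, walk() could be called again, or started from any other node, without rereading the document.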

Example 19.3 is a listing of the program WebAppConfig.java. Like the first two examples in this chapter, WebAppConfig reads a web.xml web application deployment descriptor. This example uses a DOM parser to build a parse tree, then performs some operations on the tree to demonstrate how you can work with a tree of DOM nodes.

The WebAppConfig() constructor uses the JAXP API to obtain a DOM parser and then uses that parser to build a parse tree that represents the XML file. The root node of this tree is of type Document. This Document object is stored in an instance field of the WebAppConfig object, so it is available for traversal and modification by the other methods of the class. The class also includes a main() method that invokes these other methods.

The getServletClass() method looks for <servlet-name> tags and returns the text of the associated <servlet-class> tags. (These tags always come in pairs in a web.xml file.) This method demonstrates a number of features of the DOM parse tree, notably the getElementsByTagName() method. The addServlet() method inserts a new <servlet> tag into the parse tree. It demonstrates how to construct new DOM nodes and add them to an existing parse tree. Finally, the output() method uses an XMLDocumentWriter to traverse all the nodes of the parse tree and convert them back into XML format. The XMLDocumentWriter class is covered in the next section and listed in Example 19.4.

Example 19.3: WebAppConfig.java

package com.davidflanagan.examples.xml;
import javax.xml.parsers.*;   // JAXP classes for parsing
import org.w3c.dom.*;         // W3C DOM classes for traversing the document
import org.xml.sax.*;         // SAX classes used for error handling by JAXP
import java.io.*;             // For reading the input file

/**
 * A WebAppConfig object is a wrapper around a DOM tree for a web.xml
 * file.  The methods of the class use the DOM API to work with the
 * tree in various ways.
 **/
public class WebAppConfig {
    /** The main method creates and demonstrates a WebAppConfig object */
    public static void main(String[] args)
        throws IOException, SAXException, ParserConfigurationException
    {
        // Create a new WebAppConfig object that represents the web.xml
        // file specified by the first command-line argument
        WebAppConfig config = new WebAppConfig(new File(args[0]));
        // Query the tree for the class name associated with the specified
        // servlet name
        System.out.println("Class for servlet " + args[1] + " is " +
                           config.getServletClass(args[1]));
        // Add a new servlet name-to-class mapping to the DOM tree
        config.addServlet("foo", "bar");
        // And write out an XML version of the DOM tree to standard out
        config.output(new PrintWriter(new OutputStreamWriter(System.out)));
    }

    org.w3c.dom.Document document;  // This field holds the parsed DOM tree

    /**
     * This constructor method is passed an XML file.  It uses the JAXP API to
     * obtain a DOM parser, and to parse the file into a DOM Document object,
     * which is used by the remaining methods of the class.
     **/
    public WebAppConfig(File configfile)
        throws IOException, SAXException, ParserConfigurationException
    {
        // Get a JAXP parser factory object
        javax.xml.parsers.DocumentBuilderFactory dbf =
            DocumentBuilderFactory.newInstance();
        // Tell the factory what kind of parser we want 
        dbf.setValidating(false);
        // Use the factory to get a JAXP parser object
        javax.xml.parsers.DocumentBuilder parser = dbf.newDocumentBuilder();

        // Tell the parser how to handle errors.  Note that in the JAXP API,
        // DOM parsers rely on the SAX API for error handling
        parser.setErrorHandler(new org.xml.sax.ErrorHandler() {
                public void warning(SAXParseException e) {
                    System.err.println("WARNING: " + e.getMessage());
                }
                public void error(SAXParseException e) {
                    System.err.println("ERROR: " + e.getMessage());
                }
                public void fatalError(SAXParseException e)
                    throws SAXException {
                    System.err.println("FATAL: " + e.getMessage());
                    throw e;   // re-throw the error
                }
            });

        // Finally, use the JAXP parser to parse the file.  This call returns
        // a Document object.  Now that we have this object, the rest of this
        // class uses the DOM API to work with it; JAXP is no longer required.
        document = parser.parse(configfile);
    }

    /**
     * This method looks for specific Element nodes in the DOM tree in order
     * to figure out the classname associated with the specified servlet name
     **/
    public String getServletClass(String servletName) {
        // Find all <servlet> elements and loop through them.
        NodeList servletnodes = document.getElementsByTagName("servlet");
        int numservlets = servletnodes.getLength();
        for(int i = 0; i < numservlets; i++) {
            Element servletTag = (Element)servletnodes.item(i);
            // Get the first <servlet-name> tag within the <servlet> tag
            Element nameTag = (Element)
                servletTag.getElementsByTagName("servlet-name").item(0);
            if (nameTag == null) continue;

            // The <servlet-name> tag should have a single child of type
            // Text.  Get that child, and extract its text.  Use trim()
            // to strip whitespace from the beginning and end of it.
            String name =((Text)nameTag.getFirstChild()).getData().trim();
           
            // If this <servlet-name> tag has the right name
            if (servletName.equals(name)) {
                // Get the matching <servlet-class> tag
                Element classTag = (Element)
                    servletTag.getElementsByTagName("servlet-class").item(0);
                if (classTag != null) {
                    // Extract the tag's text as above, and return it
                    Text classTagContent = (Text)classTag.getFirstChild();
                    return classTagContent.getNodeValue().trim();
                }
            }
        }

        // If we get here, no matching servlet name was found
        return null;
    }

    /**
     * This method adds a new name-to-class mapping in the form of
     * a <servlet> sub-tree to the document.
     **/
    public void addServlet(String servletName, String className) {
        // Create the <servlet> tag
        Element newNode = document.createElement("servlet");
        // Create the <servlet-name> and <servlet-class> tags
        Element nameNode = document.createElement("servlet-name");
        Element classNode = document.createElement("servlet-class");
        // Add the name and classname text to those tags
        nameNode.appendChild(document.createTextNode(servletName));
        classNode.appendChild(document.createTextNode(className));
        // And add those tags to the servlet tag
        newNode.appendChild(nameNode);
        newNode.appendChild(classNode);
        
        // Now that we've created the new sub-tree, figure out where to put
        // it.  This code looks for another servlet tag and inserts the new
        // one right before it. Note that this code will fail if the document
        // does not already contain at least one <servlet> tag.
        NodeList servletnodes = document.getElementsByTagName("servlet");
        Element firstServlet = (Element)servletnodes.item(0);

        // Insert the new node before the first servlet node
        firstServlet.getParentNode().insertBefore(newNode, firstServlet);
    }

    /**
     * Output the DOM tree to the specified stream as an XML document.
     * See the XMLDocumentWriter example for the details.
     **/
    public void output(PrintWriter out) {
        XMLDocumentWriter docwriter = new XMLDocumentWriter(out);
        docwriter.write(document);
        docwriter.close();
    }
}
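
As the comment in addServlet() notes, the insertion code fails when the document contains no <servlet> tag, because item(0) returns null. The following is a minimal sketch of one way to guard against that case; it is not part of the WebAppConfig example. The class name SafeServletInsert and the helper parse() are hypothetical, and the fallback of appending to the root element is an assumption: a validating parser may object to it, since the web.xml DTD constrains the order of child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SafeServletInsert {
    /** Hypothetical helper: parse an XML string into a DOM Document with JAXP */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Build a <servlet> sub-tree and insert it, even if none exists yet */
    public static void addServlet(Document document,
                                  String servletName, String className) {
        // Create the new sub-tree exactly as the original example does
        Element newNode = document.createElement("servlet");
        Element nameNode = document.createElement("servlet-name");
        Element classNode = document.createElement("servlet-class");
        nameNode.appendChild(document.createTextNode(servletName));
        classNode.appendChild(document.createTextNode(className));
        newNode.appendChild(nameNode);
        newNode.appendChild(classNode);

        // item(0) returns null when there are no <servlet> tags at all
        Node firstServlet = document.getElementsByTagName("servlet").item(0);
        if (firstServlet != null) {
            // Same behavior as the original: insert before the first one
            firstServlet.getParentNode().insertBefore(newNode, firstServlet);
        } else {
            // Assumption: with no existing <servlet>, append to the root
            document.getDocumentElement().appendChild(newNode);
        }
    }

    public static void main(String[] args) {
        Document empty = parse("<web-app></web-app>");
        addServlet(empty, "foo", "bar");
        // Report how many <servlet> tags the document now contains
        System.out.println(empty.getElementsByTagName("servlet").getLength());
    }
}
```

When at least one <servlet> tag is present, this sketch behaves like the original method; only the empty case differs.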

Compiling and Running the Example

The WebAppConfig class uses the JAXP and DOM APIs, so you must have the jaxp.jar and parser.jar files from the JAXP distribution in your classpath. You should avoid having the Xerces JAR file in your classpath at the same time, or you may run into version mismatch problems between the DOM Level 1 parser of JAXP 1.0 and the DOM Level 2 parser of Xerces. Compile WebAppConfig.java in the normal way. To run the program, specify the name of a web.xml file to parse as the first command-line argument and provide a servlet name as the second argument. When you run the program, it prints the class name (if any) that is mapped to the specified servlet name. Then it inserts a dummy <servlet> tag into the parse tree and prints out the modified parse tree in XML format to standard output. You'll probably want to pipe the output of the program to a paging program such as more.

Traversing a DOM Tree

The WebAppConfig class of Example 19.3 parses an XML file to a DOM tree, modifies the tree, then converts the tree back into an XML file. It does this using the class XMLDocumentWriter, which is listed in Example 19.4. The write() method of this class recursively traverses a DOM tree node by node and outputs the equivalent XML text to the specified PrintWriter stream. The code is relatively straightforward and helps illustrate the structure of a DOM tree. Note that XMLDocumentWriter is just an example. Among its shortcomings: it doesn't handle every possible type of DOM node, and it doesn't output a full <!DOCTYPE> declaration.
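
Before reading the full listing, it may help to see the recursive pattern in isolation. The sketch below is not part of the book's examples: the class name DomOutline and its methods are hypothetical, and it assumes a JAXP parser is available (as it is in current JDKs). It visits every node, but prints only element names, indenting each by its depth in the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomOutline {
    /** Recursively append one line per element, indented by depth */
    public static void outline(Node node, String indent, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append(indent).append(node.getNodeName()).append('\n');
            indent = indent + "  ";   // Children of an element indent further
        }
        // Visit the children in document order, whatever this node's type
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            outline(child, indent, sb);
        }
    }

    /** Hypothetical convenience method: parse a string and outline it */
    public static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            outline(doc, "", sb);     // Start at the Document node
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<web-app><servlet><servlet-name>foo</servlet-name>" +
            "</servlet></web-app>"));
    }
}
```

XMLDocumentWriter follows the same shape, but switches on the node type so that it can emit the appropriate XML syntax for each kind of node.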

Example 19.4: XMLDocumentWriter.java

package com.davidflanagan.examples.xml;
import org.w3c.dom.*;         // W3C DOM classes for traversing the document
import java.io.*;

/**
 * Output a DOM Level 1 Document object to a java.io.PrintWriter as a simple
 * XML document.  This class does not handle every type of DOM node, and it
 * doesn't deal with all the details of XML like DTDs, character encodings and
 * preserved and ignored whitespace.  However, it does output basic
 * well-formed XML that can be parsed by a non-validating parser.
 **/
public class XMLDocumentWriter {
    PrintWriter out;  // the stream to send output to

    /** Initialize the output stream */
    public XMLDocumentWriter(PrintWriter out) { this.out = out; }

    /** Close the output stream. */
    public void close() { out.close(); }

    /** Output a DOM Node (such as a Document) to the output stream */
    public void write(Node node) { write(node, ""); }

    /**
     * Output the specified DOM Node object, printing it using the specified
     * indentation string
     **/
    public void write(Node node, String indent) {
        // The output depends on the type of the node
        switch(node.getNodeType()) {
        case Node.DOCUMENT_NODE: {       // If it's a Document node
            Document doc = (Document)node;
            out.println(indent + "<?xml version='1.0'?>");  // Output header
            Node child = doc.getFirstChild();   // Get the first node
            while(child != null) {              // Loop 'till no more nodes
                write(child, indent);           // Output node
                child = child.getNextSibling(); // Get next node
            }
            break;
        } 
        case Node.DOCUMENT_TYPE_NODE: {  // It is a <!DOCTYPE> tag
            DocumentType doctype = (DocumentType) node;
            // Note that the DOM Level 1 does not give us information about
            // the public or system ids of the doctype, so we can't output
            // a complete <!DOCTYPE> tag here.  We can do better with Level 2.
            out.println("<!DOCTYPE " + doctype.getName() + ">");
            break;
        }
        case Node.ELEMENT_NODE: {        // Most nodes are Elements
            Element elt = (Element) node;
            out.print(indent + "<" + elt.getTagName());   // Begin start tag
            NamedNodeMap attrs = elt.getAttributes();     // Get attributes
            for(int i = 0; i < attrs.getLength(); i++) {  // Loop through them
                Node a = attrs.item(i);
                out.print(" " + a.getNodeName() + "='" +  // Print attr. name
                          fixup(a.getNodeValue()) + "'"); // Print attr. value
            }
            out.println(">");                             // Finish start tag

            String newindent = indent + "    ";           // Increase indent
            Node child = elt.getFirstChild();             // Get child
            while(child != null) {                        // Loop 
                write(child, newindent);                  // Output child
                child = child.getNextSibling();           // Get next child
            }

            out.println(indent + "</" +                   // Output end tag
                        elt.getTagName() + ">");
            break;
        }
        case Node.TEXT_NODE: {                   // Plain text node
            Text textNode = (Text)node;
            String text = textNode.getData().trim();   // Strip off space
            if ((text != null) && text.length() > 0)   // If non-empty
                out.println(indent + fixup(text));     // print text
            break;
        }
        case Node.PROCESSING_INSTRUCTION_NODE: {  // Handle PI nodes
            ProcessingInstruction pi = (ProcessingInstruction)node;
            out.println(indent + "<?" + pi.getTarget() +
                               " " + pi.getData() + "?>");
            break;
        }
        case Node.ENTITY_REFERENCE_NODE: {        // Handle entities
            out.println(indent + "&" + node.getNodeName() + ";");
            break;
        }
        case Node.CDATA_SECTION_NODE: {           // Output CDATA sections
            CDATASection cdata = (CDATASection)node;
            // Careful! Don't put a CDATA section in the program itself!
            out.println(indent + "<" + "![CDATA[" + cdata.getData() +
                        "]]" + ">");
            break;
        }
        case Node.COMMENT_NODE: {                 // Comments
            Comment c = (Comment)node;
            out.println(indent + "<!--" + c.getData() + "-->");
            break;
        }
        default:   // Hopefully, this won't happen too much!
            System.err.println("Ignoring node: " + node.getClass().getName());
            break;
        }
    }

    // This method replaces reserved characters with entities.
    String fixup(String s) {
        StringBuffer sb = new StringBuffer();
        int len = s.length();
        for(int i = 0; i < len; i++) {
            char c = s.charAt(i);
            switch(c) {
            default: sb.append(c); break;
            case '<': sb.append("&lt;"); break;
            case '>': sb.append("&gt;"); break;
            case '&': sb.append("&amp;"); break;
            case '"': sb.append("&quot;"); break;
            case '\'': sb.append("&apos;"); break;
            }
        }
        return sb.toString();
    }
}

Traversing a Document with DOM Level 2

Example 19.5 is a listing of DOMTreeWalkerTreeModel.java, a class that demonstrates DOM tree traversal using the DOM Level 2 TreeWalker class. TreeWalker is part of the org.w3c.dom.traversal package. It allows you to traverse, or walk, a DOM tree using a simple API. More importantly, however, it lets you specify what type of nodes you want and automatically filters out all other nodes. It even allows you to provide a NodeFilter class that filters nodes based on any criteria you want.
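
To make the filtering idea concrete, here is a minimal sketch that is separate from Example 19.5. The class name ElementWalk and the method elementNames() are hypothetical. The sketch also assumes that the Document returned by the JAXP parser implements DocumentTraversal, which is the case for the parser bundled with current JDKs but was not true of JAXP when this chapter was written. It asks the TreeWalker to show only element nodes, so text and comments are skipped without any explicit type tests.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class ElementWalk {
    /** Return the names of all elements in the document, in document order */
    public static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        // Assumption: the DOM implementation supports Level 2 Traversal
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("No DOM traversal support");
        }
        DocumentTraversal traversal = (DocumentTraversal) doc;

        // SHOW_ELEMENT hides text, comments, and all other node types
        TreeWalker walker = traversal.createTreeWalker(
            doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, false);

        // The walker starts at its root; nextNode() returns null at the end
        List<String> names = new ArrayList<>();
        for (Node n = walker.getCurrentNode(); n != null;
             n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [a, b, c]: the comment and the text node are filtered out
        System.out.println(elementNames("<a><!-- c --><b>text</b><c/></a>"));
    }
}
```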

The DOMTreeWalkerTreeModel implements the javax.swing.tree.TreeModel interface, which enables you to easily display a filtered DOM tree using a Swing JTree component. Figure 19.1 shows a filtered web.xml file being displayed in this way. What is interesting here is not the TreeModel methods themselves (refer to Chapter 10, Graphical User Interfaces, for an explanation of TreeModel), but how the implementations of those methods use the TreeWalker API to traverse the DOM tree.

Figure 19.1: DOMTreeWalkerTreeModel display of a web.xml file

The main() method parses the XML document named on the command line, then creates a TreeWalker for the parse tree. The TreeWalker is configured to show all nodes except for comments and text nodes that contain only whitespace. Next, the main() method creates a DOMTreeWalkerTreeModel object for the TreeWalker. Finally, it creates a JTree component to display the tree described by the DOMTreeWalkerTreeModel.

Note that this example uses the Xerces parser because of its support for DOM Level 2 (which, at the time of this writing, is not supported by JAXP). Because the example uses Xerces, you must have the xerces.jar file in your classpath in order to compile and run the example. At the time of this writing, DOM Level 2 is reasonably stable but is not yet an official standard. If the TreeWalker API changes during the standardization process, it will probably break this example.

Example 19.5: DOMTreeWalkerTreeModel.java

package com.davidflanagan.examples.xml;
import org.w3c.dom.*;                // Core DOM classes
import org.w3c.dom.traversal.*;      // TreeWalker and related DOM classes
import org.apache.xerces.parsers.*;  // Apache Xerces parser classes
import org.xml.sax.*;                // Xerces DOM parser uses some SAX classes
import javax.swing.*;                // Swing classes 
import javax.swing.tree.*;           // TreeModel and related classes
import javax.swing.event.*;          // Tree-related event classes
import java.io.*;                    // For reading the input XML file

/**
 * This class implements the Swing TreeModel interface so that the DOM tree
 * returned by a TreeWalker can be displayed in a JTree component.
 **/
public class DOMTreeWalkerTreeModel implements TreeModel {
    TreeWalker walker;  // The TreeWalker we're modeling for JTree
    
    /** Create a TreeModel for the specified TreeWalker */
    public DOMTreeWalkerTreeModel(TreeWalker walker) { this.walker = walker; }

    /** 
     * Create a TreeModel for a TreeWalker that returns all nodes
     * in the specified document
     **/
    public DOMTreeWalkerTreeModel(Document document) {
        DocumentTraversal dt = (DocumentTraversal)document;
        walker = dt.createTreeWalker(document, NodeFilter.SHOW_ALL,null,false);
    }

    /** 
     * Create a TreeModel for a TreeWalker that returns the specified 
     * element and all of its descendant nodes.
     **/
    public DOMTreeWalkerTreeModel(Element element) {
        DocumentTraversal dt = (DocumentTraversal)element.getOwnerDocument();
        walker = dt.createTreeWalker(element, NodeFilter.SHOW_ALL, null,false);
    }

    // Return the root of the tree 
    public Object getRoot() { return walker.getRoot(); }
    
    // Is this node a leaf? (Leaf nodes are displayed differently by JTree)
    public boolean isLeaf(Object node) {
        walker.setCurrentNode((Node)node);   // Set current node
        Node child = walker.firstChild();    // Ask for a child
        return (child == null);              // Does it have any?
    }
    
    // How many children does this node have?
    public int getChildCount(Object node) {
        walker.setCurrentNode((Node)node);   // Set the current node
        // TreeWalker doesn't count children for us, so we count ourselves
        int numkids = 0;
        Node child = walker.firstChild();    // Start with the first child
        while(child != null) {               // Loop 'till there are no more
            numkids++;                       // Update the count
            child = walker.nextSibling();    // Get next child
        }
        return numkids;                      // This is the number of children
    }
    
    // Return the specified child of a parent node.
    public Object getChild(Object parent, int index) {
        walker.setCurrentNode((Node)parent);  // Set the current node
        // TreeWalker provides sequential access to children, not random
        // access, so we've got to loop through the kids one by one
        Node child = walker.firstChild();
        while(index-- > 0) child = walker.nextSibling();
        return child;
    }
    
    // Return the index of the child node in the parent node
    public int getIndexOfChild(Object parent, Object child) {
        walker.setCurrentNode((Node)parent);    // Set current node
        int index = 0;
        Node c = walker.firstChild();           // Start with first child
        while((c != child) && (c != null)) {    // Loop 'till we find a match
            index++;
            c = walker.nextSibling();           // Get the next child
        }
        return (c != null) ? index : -1;        // Matching position, or -1
    }
    
    // Only required for editable trees; unimplemented here.
    public void valueForPathChanged(TreePath path, Object newvalue) {}
    
    // This TreeModel never fires any events (since it is not editable)
    // so event listener registration methods are left unimplemented
    public void addTreeModelListener(TreeModelListener l) {}
    public void removeTreeModelListener(TreeModelListener l) {}

    /**
     * This main() method demonstrates the use of this class, the use of the
     * Xerces DOM parser, and the creation of a DOM Level 2 TreeWalker object.
     **/
    public static void main(String[] args) throws IOException, SAXException {
        // Obtain an instance of a Xerces parser to build a DOM tree.
        // Note that we are not using the JAXP API here, so this
        // code uses Apache Xerces APIs that are not standards
        DOMParser parser = new org.apache.xerces.parsers.DOMParser();

        // Get a java.io.Reader for the input XML file and 
        // wrap the input file in a SAX input source
        Reader in = new BufferedReader(new FileReader(args[0]));
        InputSource input = new org.xml.sax.InputSource(in);

        // Tell the Xerces parser to parse the input source
        parser.parse(input);

        // Ask the parser to give us our DOM Document.  Once we've got the DOM
        // tree, we don't have to use the Apache Xerces APIs any more; from
        // here on, we use the standard DOM APIs
        Document document = parser.getDocument();

        // If we're using a DOM Level 2 implementation, then our Document
        // object ought to implement DocumentTraversal 
        DocumentTraversal traversal = (DocumentTraversal)document;

        // For this demonstration, we create a NodeFilter that filters out
        // Text nodes containing only space; these just clutter up the tree
        NodeFilter filter = new NodeFilter() {
                public short acceptNode(Node n) {
                    if (n.getNodeType() == Node.TEXT_NODE) { 
                        // Use trim() to strip off leading and trailing space.
                        // If nothing is left, then reject the node
                        if (((Text)n).getData().trim().length() == 0)
                            return NodeFilter.FILTER_REJECT;
                    }
                    return NodeFilter.FILTER_ACCEPT;
                }
            };

        // This set of flags says to "show" all node types except comments
        int whatToShow = NodeFilter.SHOW_ALL & ~NodeFilter.SHOW_COMMENT;

        // Create a TreeWalker using the filter and the flags
        TreeWalker walker = traversal.createTreeWalker(document, whatToShow,
                                                       filter, false);

        // Instantiate a TreeModel and a JTree to display it
        JTree tree = new JTree(new DOMTreeWalkerTreeModel(walker));
        
        // Create a frame and a scrollpane to display the tree, and pop them up
        JFrame frame = new JFrame("DOMTreeWalkerTreeModel Demo");
        frame.getContentPane().add(new JScrollPane(tree));
        frame.setSize(500, 250);
        frame.setVisible(true);
    }
}

The JDOM API

Until now, this chapter has considered the official, standard ways of parsing and working with XML documents: DOM is a standard of the W3C, and SAX is a de facto standard by virtue of its nearly universal adoption. Both SAX and DOM were designed to be programming language-independent APIs, however, and this generality means they can't take full advantage of the features of the Java language and platform. As I write this chapter, there is a new (still in beta release) but promising API targeted directly at Java programmers. As its name implies, JDOM is an XML document object model for Java. Like the DOM API, it creates a parse tree to represent an XML document. Unlike the DOM, however, the API is designed from the ground up for Java and is significantly easier to use. JDOM is an open-source project initiated by Brett McLaughlin and Jason Hunter, who are the authors of the O'Reilly books Java and XML and Java Servlet Programming, respectively.

Example 19.6 shows how the JDOM API can be used to parse an XML document, to extract information from the resulting parse tree, to create new element nodes and add them to the parse tree, and, finally, to output the modified tree as an XML document. Compare this code to Example 19.3; the examples perform exactly the same task, but as you'll see, using the JDOM API makes the code simpler and cleaner. You should also notice that JDOM has its own built-in XMLOutputter class, obviating the need for the XMLDocumentWriter shown in Example 19.4.

Example 19.6: WebAppConfig2.java

package com.davidflanagan.examples.xml;
import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

/**
 * This class is just like WebAppConfig, but it uses the JDOM (Beta 4) API
 * instead of the DOM and JAXP APIs
 **/
public class WebAppConfig2 {
    /** The main method creates and demonstrates a WebAppConfig2 object */
    public static void main(String[] args)
        throws IOException, JDOMException
    {
        // Create a new WebAppConfig2 object that represents the web.xml
        // file specified by the first command-line argument
        WebAppConfig2 config = new WebAppConfig2(new File(args[0]));

        // Query the tree for the class name associated with the servlet
        // name specified as the 2nd command-line argument
        System.out.println("Class for servlet " + args[1] + " is " +
                           config.getServletClass(args[1]));

        // Add a new servlet name-to-class mapping to the JDOM tree
        config.addServlet("foo", "bar");

        // And write out an XML version of the JDOM tree to standard out
        config.output(System.out);
    }

    /**
     * This field holds the parsed JDOM tree.  Note that this is a JDOM
     * Document, not a DOM Document.
     **/
    protected org.jdom.Document document;  

    /**
     * Read the specified File and parse it to create a JDOM tree
     **/
    public WebAppConfig2(File configfile) throws IOException, JDOMException {
        // JDOM can build JDOM trees from a variety of input sources.  One
        // of those input sources is a SAX parser.  
        SAXBuilder builder =
            new SAXBuilder("org.apache.xerces.parsers.SAXParser");
        // Parse the specified file and convert it to a JDOM document
        document = builder.build(configfile);
    }

    /**
     * This method looks for specific Element nodes in the JDOM tree in order
     * to figure out the classname associated with the specified servlet name
     **/
    public String getServletClass(String servletName) throws JDOMException {
        // Get the root element of the document.
        Element root = document.getRootElement();

        // Find all <servlet> elements in the document, and loop through them
        // to find one with the specified name.  Note the use of java.util.List
        // instead of org.w3c.dom.NodeList.
        List servlets = root.getChildren("servlet");
        for(Iterator i = servlets.iterator(); i.hasNext(); ) {
            Element servlet = (Element) i.next();
            // Get the text of the <servlet-name> tag within the <servlet> tag
            String name = servlet.getChild("servlet-name").getContent();
            if (name.equals(servletName)) {
                // If the names match, return the text of the <servlet-class>
                return servlet.getChild("servlet-class").getContent();
            }
        }
        return null;
    }

    /**
     * This method adds a new name-to-class mapping in the form of
     * a <servlet> sub-tree to the document.
     **/
    public void addServlet(String servletName, String className)
        throws JDOMException
    {
        // Create the new Element that represents our new servlet
        Element newServletName = new Element("servlet-name");
        newServletName.setContent(servletName);
        Element newServletClass = new Element("servlet-class");
        newServletClass.setContent(className);
        Element newServlet = new Element("servlet");
        newServlet.addChild(newServletName);
        newServlet.addChild(newServletClass);

        // find the first <servlet> child in the document
        Element root = document.getRootElement();
        Element firstServlet = root.getChild("servlet");

        // Now insert our new servlet tag before the one we just found.
        Element parent = firstServlet.getParent();
        List children = parent.getChildren();
        children.add(children.indexOf(firstServlet), newServlet);
    }

    /**
     * Output the JDOM tree to the specified stream as an XML document.
     **/
    public void output(OutputStream out) throws IOException {
        // JDOM can output JDOM trees in a variety of ways (such as converting
        // them to DOM trees or SAX event streams).  Here we use an "outputter"
        // that converts a JDOM tree to an XML document
        XMLOutputter outputter = new XMLOutputter("  ",    // indentation
                                                  true);   // use newlines
        outputter.output(document, out);
    }
}

Compiling and Running the Example

In order to compile and run Example 19.6, you must download the JDOM distribution, which is freely available from http://www.jdom.org/. This example was developed using the Beta 4 release of JDOM. Because of the beta status of JDOM, I'm not going to try to give explicit build instructions here. You need to have the JDOM classes in your classpath to compile and run the example. Additionally, since the example relies on the Xerces SAX 2 parser, you need to have the Xerces JAR file in your classpath to run the example. Xerces is conveniently bundled with JDOM (at least in the Beta 4 distribution). Finally, note that JDOM is undergoing rapid development, and the API may change somewhat from the Beta 4 version used here. If so, you may need to modify the example to get it to compile and run.

Exercises

  1. Many of the examples in this chapter were designed to parse the web.xml files that configure web applications. If you use the Tomcat servlet container to run your servlets, you may know that Tomcat uses another XML file, server.xml, for server-level configuration information. In Tomcat 3.1, this file is located in the conf directory of the Tomcat distribution and contains a number of <Context> tags that use attributes to specify additional information about each web application. Write a program that uses a SAX parser (preferably SAX 2) to parse the server.xml file and output the values of the path and docBase attributes of each <Context> tag.

  2. Using a DOM parser instead of a SAX parser, write a program that behaves identically to the program you developed in Exercise 19-1.

  3. Rewrite the server.xml parser again, using the JDOM API this time.

  4. Write a Swing-based web application configuration program that can read web.xml files, allow the user to modify them, and then write out the modified version. The program should allow the user to add new servlets to the web application and edit existing servlets. For each servlet, it should allow the user to specify the servlet name, class, initialization parameters, and URL pattern.

  5. Design an XML grammar for representing a JavaBeans component and its property values. Write a class that can serialize an arbitrary bean to this XML format and deserialize, or recreate, a bean from the XML format. Use the Java Reflection API or the JavaBeans Introspector class to identify the properties of a bean. Assume that all properties of the bean are either primitive Java types, String objects, or other bean instances. (You may want to extend this list to include Font and Color objects as well.) Further assume that all bean classes define a no-argument constructor and all beans can be properly initialized by instantiating them and setting their public properties.

© 2001, O'Reilly & Associates, Inc.