Run Java Programs that Process XML

Open source, command-line Java programs that process XML are abundant. This hack shows you how to use them.

The Java programming language (http://java.sun.com) has been a popular object-oriented language since it was unveiled by Sun in the mid-1990s. One key idea behind Java was that it made it possible to write and compile a program once, and then run it on any machine that supports a Java interpreter (“write once, run anywhere”). Note that it’s not a perfect programming language—I’ve heard Ted Ts’o (http://thunk.org/tytso/) say of Java, “Write once, run screaming.”

Nonetheless, Java is widespread and generally well liked, and you’ll find many command-line Java programs that can process XML in one way or another. A number of these programs appear in this book, so this hack walks you through how to use them.

Tip

This hack assumes that you know little to nothing about Java. If you are entirely new to Java, the information at http://java.sun.com/learning/new2java/ will also help you get up to speed quickly.

To get a Java program to run on your system, you need a Java virtual machine (VM), part of the Java runtime environment (JRE). One may already be on your system, but to get the latest JRE anyway, go to http://java.sun.com and find the link for the Java VM download. (There are alternatives to Sun’s VM, such as one offered on http://www.kaffe.org/, but I’m only going to talk about the Sun VM here.) In a few clicks, the new VM will be downloaded to your machine. You should then be able to go to a command prompt and type:

java -version

and get a response that looks something like the following:

java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

A more recent version may be available, but if you get a reply similar to this, you’re in business. If not, consult the installation instructions for Windows (http://java.sun.com/j2se/1.4.2/install-windows.html), the Mac (http://developer.apple.com/java/download.html), general Unix (http://java.sun.com/j2se/1.4.2/install-linux.html), or Solaris (http://java.sun.com/j2se/1.4.2/install-solaris.html).
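The same version details that java -version reports are also exposed to running programs as system properties. The following is a minimal sketch, not part of the book's file archive; the class name VmInfo is hypothetical, and the property keys are the standard java.version, java.vendor, and java.home.

```java
public class VmInfo {
    // Builds a one-line summary from standard system properties.
    static String summary() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Compiling this with javac VmInfo.java and running java VmInfo is a quick way to confirm which VM your command prompt actually picks up.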

JAR Files

In the file archive for this book (mentioned at the beginning of this chapter) is the Java archive or JAR file wf.jar. This JAR contains all the compiled Java classes from the XML Object Model or XOM (http://www.cafeconleche.org/XOM/). XOM is a simple, open source, tree-based application programming interface (API) for XML, written in Java. wf.jar also contains a little program called Wf.class that does a well-formedness check on an XML document. Type in this line:

java -jar wf.jar

The program echoes back usage information, letting you know that it expects a URL as an argument:

Usage: java -jar wf.jar URL

Try it with a file:

java -jar wf.jar time.xml

Because it is well-formed, time.xml is written to standard output. If it were not well-formed, Wf.class would display an error. Try this program with bad.xml, which contains a fatal well-formedness error:

java -jar wf.jar bad.xml

You should get an error like:

nu.xom.ParsingException: Expected "</hour>" to 
terminate element starting on line 5. at line 5, column -1.

Now try it with a web resource:

java -jar wf.jar http://www.wyeast.net/time.xml

If it finds no errors, the program will echo the file to standard output (the console).
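To give a feel for what a well-formedness check involves, here is a rough sketch that uses the JAXP SAX parser bundled with the JDK rather than XOM, so it is not how Wf.class itself is written. The class name WellFormedSketch and the two sample strings are hypothetical stand-ins (the actual contents of time.xml and bad.xml are not reproduced here), and the code assumes a JDK recent enough to support multi-catch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedSketch {
    // Returns true if the parser reads the whole document without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors as SAXException.
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the document is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: the second one never closes <hour>.
        System.out.println(isWellFormed("<time><hour>11</hour></time>"));
        System.out.println(isWellFormed("<time><hour>11</time>"));
    }
}
```

The first call prints true and the second prints false, since the unclosed hour element is a fatal well-formedness error of the same kind that bad.xml triggers.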

Tip

wf.jar uses what is known as the JAR method. This technique relies on an entry (Main-Class: Wf) in the manifest file that is stored in the JAR. This points the Java interpreter to Wf.class, which contains the main() method, the entry point of a Java program.
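If you are curious how a program can find the Main-Class entry, the standard java.util.jar classes can read it. The sketch below is illustrative only: the class name MainClassLookup is hypothetical, and rather than assuming wf.jar is present, it writes a small temporary JAR whose manifest names Wf and then reads the entry back. It assumes a JDK that supports try-with-resources.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class MainClassLookup {
    // Returns the Main-Class value from the JAR's manifest, or null if absent.
    static String mainClassOf(File jar) {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null
                    ? null
                    : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary JAR that holds only a manifest naming mainClass.
    static File writeDemoJar(String mainClass) {
        Manifest mf = new Manifest();
        // Manifest-Version must be set or the manifest is written out empty.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out =
                    new JarOutputStream(new FileOutputStream(jar), mf)) {
                // No class entries are needed for this demonstration.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File jar = args.length > 0 ? new File(args[0]) : writeDemoJar("Wf");
        System.out.println("Main-Class: " + mainClassOf(jar));
    }
}
```

Run with no arguments, it prints Main-Class: Wf for the temporary JAR; passing the path to a real JAR as the first argument reports that JAR's entry instead.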

The Java Classpath

Class files contain compiled bytecode that can be executed by the Java interpreter. The interpreter has to be able to “see” where the class files are in order to execute them. That’s why there’s such a thing as a classpath. You have to place the needed Java classes in the classpath so that the interpreter can see them.

The file Wf.class comes with the book’s file archive and should have been extracted into your working directory. Even when a class file is in the same directory where you are running the Java interpreter, you can’t execute it unless that directory is in the classpath. Wf.class also needs the XOM JAR in order to run.

Assuming that you have downloaded the XOM JAR (xom-1.0d24.jar at the time of writing), renamed it xom.jar, and stored it in the working directory, place it directly in the classpath on the command line by using the -cp switch. On Windows, you do it like this:

java -cp .;xom.jar Wf worksheet.xml

Or on Unix, you do it like this:

java -cp .:xom.jar Wf worksheet.xml

The only difference between the two commands is the path separator: a semicolon (;) on Windows and a colon (:) on Unix. The current directory is represented by a period (.).
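A running program can inspect the classpath it was given through the standard java.class.path system property, and File.pathSeparator holds the platform's separator. The following sketch (the class name ClasspathEntries is hypothetical) simply splits the classpath into its entries.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathEntries {
    // Splits a classpath string on the given separator (";" or ":").
    static List<String> split(String classpath, String separator) {
        return Arrays.asList(classpath.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        // File.pathSeparator is ";" on Windows and ":" on Unix.
        for (String entry : split(classpath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Running it with java -cp .;xom.jar ClasspathEntries on Windows (or with a colon on Unix) should list the period and xom.jar on separate lines.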

If a directory contains the actual classes, all you have to do is place the directory in the classpath; if the classes are contained in a JAR file, you have to place the path to the JAR file, including the JAR filename, in the classpath.
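One way to check whether the interpreter can actually see a class is to ask for it by name. This is only a sketch: the class name ClasspathProbe is hypothetical, and nu.xom.ParsingException is used as the probe because it is the XOM class named in the error message shown earlier.

```java
public class ClasspathProbe {
    // Returns true if the named class can be located through the classpath.
    static boolean isVisible(String className) {
        try {
            // 'false' means locate the class without running its initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "nu.xom.ParsingException";
        System.out.println(name + " visible: " + isVisible(name));
    }
}
```

With xom.jar in the classpath the probe should report the class as visible; without it, the same probe should report false, which is the situation that produces a NoClassDefFoundError when you run Wf.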

There are several other solutions for placing class files in the classpath. On Windows, you could place the JAR file in the classpath using this line at a command prompt or in autoexec.bat:

set CLASSPATH=%CLASSPATH%;.;C:\Hacks\examples\xom.jar

This puts the current directory (.) and C:\Hacks\examples\xom.jar in the CLASSPATH environment variable. %CLASSPATH% prepends the current classpath to the new value of CLASSPATH.

The following command works in Bourne-style shells on Unix (this line could be added to a shell setup file such as .profile; csh users would use setenv in .cshrc instead):

CLASSPATH="$CLASSPATH:/usr/mike/hacks/examples/xom.jar"; export CLASSPATH

$CLASSPATH adds the current classpath to the new value of CLASSPATH, and export makes the variable visible to the Java interpreter. Another way to make classes available is to place a copy of the JAR file in the extensions directory, jre/lib/ext, where your JRE is installed. For example, wherever the JRE is installed, it will have the subdirectory jre/lib/ext, such as C:\Program Files\Java\j2sdk1.4.2_03\jre\lib\ext on Windows.

If you are using Windows XP, you can also set the CLASSPATH environment variable by choosing Start → Control Panel → System, clicking the Advanced tab, and then clicking the Environment Variables button (Figure 1-20). Select the existing CLASSPATH variable and add the classpath information to it. If the CLASSPATH variable does not already exist, you can create it by clicking the New button (Figure 1-21). You can select or add a CLASSPATH variable either for an individual user or, if you have administrator privileges, for the whole system (system variables).

Figure 1-20. System Properties dialog box on Windows XP

Figure 1-21. Entering a new CLASSPATH variable in Windows XP

Using a JAR File as an Executable on Windows 2000 or XP

With a little setup, you can use a JAR file that uses the JAR method (one that has the Main-Class: field in its manifest file) like a normal executable file (.exe) on a Windows 2000 or XP command line. James Clark explained this technique on the RELAX NG mailing list a few years ago (http://lists.oasis-open.org/archives/relax-ng/200203/msg00037.html). This is how you do it.

In a command prompt window, go to the working directory where you extracted the file archive for the book, then type:

assoc .jar

This helps you find out what name is associated with the .jar extension, if any (to backtrack, write it down if it is already associated with some name). Now type this in:

assoc .jar=jarfile

This command associates the extension .jar with the name jarfile. Then enter:

ftype jarfile="C:\Program Files\Java\j2sdk1.4.2_03\bin\java.exe" -jar "%1" %*

ftype displays or modifies the file types that are used with file extension associations. This command associates the name jarfile with java.exe using the replaceable parameters %1 and %* for the JAR filename and for the input files, respectively.

Next, set the path extension like this, which prepends the .jar extension to the current path extensions (%pathext%):

set pathext=.jar;%pathext%

Also make sure that the current directory is in the path by using this command:

set path=.;%path%

This prepends the path of the current directory (.) to the current path (%path%). Now, enter the following:

wf

This will execute Wf.class, which Main-Class: Wf points to in the manifest file. You will see this response:

Usage: java -jar wf.jar URL

Try this command with other JARs, such as jing.jar or trang.jar, to see what kind of response you get. To turn this feature off, just type:

assoc .jar=

This disassociates files with the .jar extension from the name jarfile, or any other name. If .jar was associated with another name (determined in the first step when you typed assoc .jar), you can reenter that name now.
