Chapter 10. Input/Output Facilities

In this chapter, we’ll continue our exploration of the Java API by looking at many of the classes in the java.io package. Figure 10-1 shows the class hierarchy of the java.io package.

Figure 10-1. The java.io package

We’ll start by looking at the stream classes in java.io; these classes are all subclasses of the basic InputStream, OutputStream, Reader, and Writer classes. Then we’ll examine the File class and discuss how you can interact with the filesystem using classes in java.io. Finally, we’ll take a quick look at the data compression classes provided in java.util.zip.

Streams

All fundamental I/O in Java is based on streams. A stream represents a flow of data, or a channel of communication with (at least conceptually) a writer at one end and a reader at the other. When you are working with terminal input and output, reading or writing files, or communicating through sockets in Java, you are using a stream of one type or another. So that you can see the forest without being distracted by the trees, we’ll start by summarizing the classes involved with the different types of streams:

InputStream/OutputStream: Abstract classes that define the basic functionality for reading or writing an unstructured sequence of bytes. All other byte streams in Java are built on top of the basic InputStream and OutputStream.
Reader/Writer: Abstract classes that define the basic functionality for reading or writing a sequence of character data, with support for Unicode. All other character streams in Java are built on top of Reader and Writer.
InputStreamReader/OutputStreamWriter: “Bridge” classes that convert bytes to characters and vice versa. Remember: in Unicode, a character is not a byte!
DataInputStream/DataOutputStream: Specialized stream filters that add the ability to read and write simple data types, such as numeric primitives and String objects, in a universal format.
ObjectInputStream/ObjectOutputStream: Specialized stream filters that are capable of writing serialized Java objects and reconstructing them.
BufferedInputStream/BufferedOutputStream/BufferedReader/BufferedWriter: Specialized stream filters that add buffering for additional efficiency.
PrintWriter: A specialized character stream that makes it simple to print text.
PipedInputStream/PipedOutputStream/PipedReader/PipedWriter: “Double-ended” streams that normally occur in pairs. Data written into a PipedOutputStream or PipedWriter is read from its corresponding PipedInputStream or PipedReader.
FileInputStream/FileOutputStream/FileReader/FileWriter: Implementations of InputStream, OutputStream, Reader, and Writer that read from and write to files on the local filesystem.

Streams in Java are one-way streets. The java.io input and output classes represent the ends of a simple stream, as shown in Figure 10-2. For bidirectional conversations, we use one of each type of stream.

Figure 10-2. Basic input and output stream functionality

InputStream and OutputStream are abstract classes that define the lowest-level interface for all byte streams. They contain methods for reading or writing an unstructured flow of byte-level data. Because these classes are abstract, you can’t create a generic input or output stream. Java implements subclasses of these for activities like reading from and writing to files and communicating with sockets. Because all byte streams inherit the structure of InputStream or OutputStream, the various kinds of byte streams can be used interchangeably. A method specifying an InputStream as an argument can, of course, accept any subclass of InputStream. Specialized types of streams can also be layered to provide features, such as buffering, filtering, or handling larger data types.

In Java 1.1, new classes based around Reader and Writer were added to the java.io package. Reader and Writer are very much like InputStream and OutputStream, except that they deal with characters instead of bytes. As true character streams, these classes correctly handle Unicode characters, which was not always the case with the byte streams. However, some sort of bridge is needed between these character streams and the byte streams of physical devices like disks and networks. InputStreamReader and OutputStreamWriter are special classes that use an encoding scheme to translate between character and byte streams.

We’ll discuss all of the interesting stream types in this section, with the exception of FileInputStream, FileOutputStream, FileReader, and FileWriter. We’ll postpone the discussion of file streams until the next section, where we’ll cover issues involved with accessing the filesystem in Java.

Terminal I/O

The prototypical example of an InputStream object is the “standard input” of a Java application. Like stdin in C or cin in C++, this object reads data from the program’s environment, which is usually a terminal window or a command pipe. The java.lang.System class, a general repository for system-related resources, provides a reference to standard input in the static variable in. System also provides objects for standard output and standard error in the out and err variables, respectively. The following example shows the correspondence:

InputStream stdin = System.in;  
OutputStream stdout = System.out;  
OutputStream stderr = System.err;

This example hides the fact that System.out and System.err aren’t really OutputStream objects, but more specialized and useful PrintStream objects. We’ll explain these later, but for now we can reference out and err as OutputStream objects, since they are a kind of OutputStream as well.

We can read a single byte at a time from standard input with the InputStream’s read( ) method. If you look closely at the API, you’ll see that the read( ) method of the base InputStream class is an abstract method. What lies behind System.in is a particular implementation of InputStream—the subclass provides a real implementation of the read( ) method.

try {  
    int val = System.in.read( );  
    ...  
}  
catch ( IOException e ) {  
    ...  
}

As is the convention in C, read( ) provides a byte of information, but its return type is int. A return value of -1 indicates a normal end of stream has been reached; you’ll need to test for this condition when using the simple read( ) method. If an error occurs during the read, an IOException is thrown. All basic input and output stream commands can throw an IOException, so you should arrange to catch and handle them appropriately.

To retrieve the value as a byte, perform a cast:

byte b = (byte) val;

Be sure to check for the end-of-stream condition before you perform the cast.
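As a minimal sketch of this pattern, the following loop reads until read( ) returns -1 and casts to byte only after the end-of-stream test. The class name ReadLoop and the method sumBytes( ) are hypothetical, and a ByteArrayInputStream stands in for System.in so that the example is repeatable:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    // Adds up the bytes of any InputStream, reading one byte at a time.
    static int sumBytes( InputStream in ) throws IOException {
        int sum = 0;
        int val;
        // Test for end of stream (-1) before casting to byte.
        while ( (val = in.read( )) != -1 ) {
            byte b = (byte) val;
            sum += b;
        }
        return sum;
    }

    public static void main( String [] args ) throws IOException {
        // Assumption: a byte array stands in for standard input here.
        InputStream in = new ByteArrayInputStream( new byte [] { 1, 2, 3 } );
        System.out.println( sumBytes( in ) );
    }
}
```

Because read( ) returns the byte as an int between 0 and 255, a data byte of 0xFF arrives as 255 and is never confused with the end-of-stream value.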

An overloaded form of read( ) fills a byte array with as much data as possible up to the capacity of the array, and returns the number of bytes read:

byte [] bity = new byte [1024];  
int got = System.in.read( bity );

We can also check the number of bytes available for reading on an InputStream with the available( ) method. Using that information, we could create an array of exactly the right size:

int waiting = System.in.available( );  
if ( waiting > 0 ) {  
    byte [] data = new byte [ waiting ];  
    System.in.read( data );  
    ...  
}

However, the reliability of this technique depends on the ability of the underlying stream implementation to detect how much data is arriving.
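When the amount of incoming data can't be known in advance, one common alternative is to read in fixed-size chunks and rely on the count that read( ) returns. The sketch below assumes a hypothetical helper named readAll( ) and again uses a ByteArrayInputStream in place of a real source:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadAll {
    // Collects everything from the stream without calling available( ).
    static byte [] readAll( InputStream in ) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream( );
        byte [] chunk = new byte [1024];
        int got;
        // read( ) may fill only part of the array; -1 marks end of stream.
        while ( (got = in.read( chunk )) != -1 ) {
            collected.write( chunk, 0, got );
        }
        return collected.toByteArray( );
    }

    public static void main( String [] args ) throws IOException {
        InputStream in = new ByteArrayInputStream( new byte [3000] );
        System.out.println( readAll( in ).length );
    }
}
```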

InputStream provides the skip( ) method as a way of jumping over a number of bytes. Depending on the implementation of the stream, skipping bytes may be more efficient than reading them. The close( ) method shuts down the stream and frees up any associated system resources. It’s a good idea to close a stream when you are done using it.

Character Streams

Some InputStream and OutputStream subclasses in early versions of Java included methods for reading and writing strings, but most of them operated by assuming that a 16-bit Unicode character was equivalent to an 8-bit byte in the stream. This works only for Latin-1 (ISO 8859-1) characters, so the character stream classes Reader and Writer were introduced in Java 1.1.

Two special classes, InputStreamReader and OutputStreamWriter, bridge the gap between the world of character streams and the world of byte streams. These are character streams that are wrapped around an underlying byte stream; an encoding scheme is used to convert between bytes and characters. You can specify the name of an encoding scheme in the constructor of InputStreamReader or OutputStreamWriter. If you use the constructor that takes only the stream, the system’s default encoding scheme is used.

For example, let’s parse a human-readable string from the standard input into an integer. We’ll assume that the bytes coming from System.in use the system’s default encoding scheme:

try { 
    InputStreamReader converter = new InputStreamReader(System.in);
    BufferedReader in = new BufferedReader(converter); 
     
    String text = in.readLine( ); 
    int i = NumberFormat.getInstance().parse(text).intValue( ); 
}  
catch ( IOException e ) { } 
catch ( ParseException pe ) { }

First, we wrap an InputStreamReader around System.in. This object converts the incoming bytes of System.in to characters using the default encoding scheme. Then, we wrap a BufferedReader around the InputStreamReader. BufferedReader gives us the readLine( ) method, which we can use to convert a full line of text into a String. The string is then parsed into an integer using the techniques described in Chapter 9.

We could have programmed the previous example using only byte streams, and it would have worked for users in the United States, at least. So why go to the extra trouble of using character streams? Character streams were introduced in Java 1.1 to correctly support Unicode strings. Unicode was designed to support almost all of the written languages of the world. If you want to write a program that works in any part of the world, in any language, you definitely want to use streams that don’t mangle Unicode.

So how do you decide when you need a byte stream (InputStream or OutputStream) and when you need a character stream? If you want to read or write character strings, use some variety of Reader or Writer. Otherwise, a byte stream should suffice. Let’s say, for example, that you want to read strings from a file that was written by an earlier Java application. In this case, you could simply create a FileReader, which will convert the bytes in the file to characters using the system’s default encoding scheme. If you have a file in a specific encoding scheme, you can create an InputStreamReader with the specified encoding scheme wrapped around a FileInputStream and read characters from it.
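The last case can be sketched as follows. This example assumes the data is encoded as UTF-8 and uses a ByteArrayInputStream where a real program would use a FileInputStream; the class name DecodeExample and the method firstLine( ) are hypothetical:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class DecodeExample {
    // Reads the first line of a byte stream, decoding it as UTF-8.
    static String firstLine( InputStream bytes ) throws IOException {
        InputStreamReader converter =
            new InputStreamReader( bytes, StandardCharsets.UTF_8 );
        BufferedReader in = new BufferedReader( converter );
        return in.readLine( );
    }

    public static void main( String [] args ) throws IOException {
        // The accented letter occupies two bytes in UTF-8 but is one character.
        byte [] encoded = "caf\u00e9\n".getBytes( StandardCharsets.UTF_8 );
        String text = firstLine( new ByteArrayInputStream( encoded ) );
        System.out.println( text.length( ) );
    }
}
```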

Another example comes from the Internet. Web servers serve files as byte streams. If you want to read Unicode strings with a particular encoding scheme from a file on the network, you’ll need an appropriate InputStreamReader wrapped around the InputStream of the web server’s socket.

Stream Wrappers

What if we want to do more than read and write a sequence of bytes or characters? We can use a “filter” stream, which is a type of InputStream, OutputStream, Reader, or Writer that wraps another stream and adds new features. A filter stream takes the target stream as an argument in its constructor and delegates calls to it after doing some additional processing of its own. For example, you could construct a BufferedInputStream to wrap the system standard input:

InputStream bufferedIn = new BufferedInputStream( System.in );

The BufferedInputStream is a type of filter stream that reads ahead and buffers a certain amount of data. (We’ll talk more about it later in this chapter.) The BufferedInputStream wraps an additional layer of functionality around the underlying stream. Figure 10-3 shows this arrangement for a DataInputStream.

As you can see from the previous code snippet, the BufferedInputStream filter is a type of InputStream. Because filter streams are themselves subclasses of the basic stream types, they can be used as arguments to the construction of other filter streams. This allows filter streams to be layered on top of one another to provide different combinations of features. For example, we could first wrap our System.in with a BufferedInputStream and then wrap the BufferedInputStream with a DataInputStream for reading special data types.

There are four superclasses corresponding to the four types of filter streams: FilterInputStream, FilterOutputStream, FilterReader, and FilterWriter. The first two are for filtering byte streams, and the last two are for filtering character streams. These superclasses provide the basic machinery for a “no op” filter (a filter that doesn’t do anything) by delegating all of their method calls to their underlying stream. Real filter streams subclass these and override various methods to add their additional processing. We’ll make a filter stream a little later in this chapter.

Figure 10-3. Layered streams

Data streams

DataInputStream and DataOutputStream are filter streams that let you read or write strings and primitive data types that comprise more than a single byte. DataInputStream and DataOutputStream implement the DataInput and DataOutput interfaces, respectively. These interfaces define the methods required for streams that read and write strings and Java primitive numeric and boolean types in a machine-independent manner.

You can construct a DataInputStream from an InputStream and then use a method like readDouble( ) to read a primitive data type:

DataInputStream dis = new DataInputStream( System.in );  
double d = dis.readDouble( );

This example wraps the standard input stream in a DataInputStream and uses it to read a double value. readDouble( ) reads bytes from the stream and constructs a double from them. The DataInputStream methods expect the bytes of numeric data types to be in network byte order, a standard that specifies that the high order bytes are sent first.

The DataOutputStream class provides write methods that correspond to the read methods in DataInputStream. For example, writeInt( ) writes an integer in binary format to the underlying output stream.
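To see the two classes working together, here is a small round trip through an in-memory byte array. It is only a sketch: the class name DataRoundTrip and the method encode( ) are hypothetical, and ByteArrayOutputStream and ByteArrayInputStream stand in for a file or socket:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataRoundTrip {
    // Writes an int followed by a double and returns the raw bytes.
    static byte [] encode( int i, double d ) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream( );
        DataOutputStream dos = new DataOutputStream( bytes );
        dos.writeInt( i );       // 4 bytes
        dos.writeDouble( d );    // 8 bytes
        dos.flush( );
        return bytes.toByteArray( );
    }

    public static void main( String [] args ) throws IOException {
        byte [] raw = encode( 258, 3.14 );
        // 258 is 0x00000102, so the high-order bytes come first: 0 0 1 2
        System.out.println( raw[0] + " " + raw[1] + " " + raw[2] + " " + raw[3] );

        DataInputStream dis =
            new DataInputStream( new ByteArrayInputStream( raw ) );
        // Values must be read back in the order they were written.
        System.out.println( dis.readInt( ) + " " + dis.readDouble( ) );
    }
}
```

Nothing in the byte sequence records which types were written, so the reader has to know the layout in advance.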

The readUTF( ) and writeUTF( ) methods of DataInputStream and DataOutputStream read and write a Java String of Unicode characters using the UTF-8 “transformation format” (strictly speaking, a slightly modified form of UTF-8). UTF-8 is an ASCII-compatible encoding of Unicode characters commonly used for the transmission and storage of Unicode text. This differs from the Reader and Writer streams, which can use arbitrary encodings and may not preserve all of the Unicode characters.

We can use a DataInputStream with any kind of input stream, whether it be from a file, a socket, or standard input. The same applies to using a DataOutputStream, or, for that matter, any other specialized streams in java.io.

Buffered streams

The BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter classes add a data buffer of a specified size to the stream path. A buffer can increase efficiency by reducing the number of physical read or write operations that correspond to read( ) or write( ) method calls. You create a buffered stream with an appropriate input or output stream and a buffer size. (You can also wrap another stream around a buffered stream, so that it benefits from the buffering.) Here’s a simple buffered input stream:

BufferedInputStream bis =
  new BufferedInputStream(myInputStream, 4096);
... 
bis.read( );

In this example, we specify a buffer size of 4096 bytes. If we leave off the size of the buffer in the constructor, a reasonably sized one is chosen for us. On our first call to read( ), bis tries to fill the entire 4096-byte buffer with data. Thereafter, calls to read( ) retrieve data from the buffer, which is refilled as necessary.

A BufferedOutputStream works in a similar way. Calls to write( ) store the data in a buffer; data is actually written only when the buffer fills up. You can also use the flush( ) method to wring out the contents of a BufferedOutputStream at any time. The flush( ) method is actually a method of the OutputStream class itself. It’s important because it allows you to be sure that all data in any underlying streams and filter streams has been sent (before, for example, you wait for a response).
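The effect of flush( ) is easy to observe when the underlying stream is an in-memory ByteArrayOutputStream. In this sketch (the class name FlushExample and the method sizes( ) are hypothetical), 100 bytes are written to a 4096-byte buffer, and the sink is inspected before and after the flush:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushExample {
    // Returns how many bytes had reached the sink before and after flush( ).
    static int [] sizes( ) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream( );
        BufferedOutputStream bos = new BufferedOutputStream( sink, 4096 );

        bos.write( new byte [100] );   // fits in the buffer, so it is held there
        int before = sink.size( );

        bos.flush( );                  // push the buffered bytes to the sink
        int after = sink.size( );

        return new int [] { before, after };
    }

    public static void main( String [] args ) throws IOException {
        int [] s = sizes( );
        System.out.println( s[0] + " then " + s[1] );   // prints "0 then 100"
    }
}
```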

Some input streams like BufferedInputStream support the ability to mark a location in the data and later reset the stream to that position. The mark( ) method sets the return point in the stream. It takes an integer value that specifies the number of bytes that can be read before the stream gives up and forgets about the mark. The reset( ) method returns the stream to the marked point; any data read after the call to mark( ) is read again.

This functionality is especially useful when you are reading the stream in a parser. You may occasionally fail to parse a structure and so must try something else. In this situation, you can have your parser generate an error (a homemade ParseException) and then reset the stream to the point before it began parsing the structure:

BufferedInputStream input;  
...  
try {  
    input.mark( MAX_DATA_STRUCTURE_SIZE );  
    return( parseDataStructure( input ) );  
}  
catch ( ParseException e ) {  
    input.reset( );  
    ...  
}
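The fragment above leaves out the parser itself. A smaller, self-contained illustration of mark( ) and reset( ) might look like the following, where the class name MarkReset and the method peekThenRead( ) are hypothetical and the read limit of 16 bytes is an arbitrary choice:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkReset {
    // Reads two bytes, rewinds to the mark, and reads the first byte again.
    static String peekThenRead( InputStream source ) throws IOException {
        BufferedInputStream input = new BufferedInputStream( source );

        input.mark( 16 );                    // remember this position
        char first = (char) input.read( );
        char second = (char) input.read( );

        input.reset( );                      // return to the marked position
        char again = (char) input.read( );

        return "" + first + second + again;
    }

    public static void main( String [] args ) throws IOException {
        InputStream in = new ByteArrayInputStream( new byte [] { 'a', 'b', 'c' } );
        System.out.println( peekThenRead( in ) );   // prints "aba"
    }
}
```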

The BufferedReader and BufferedWriter classes work just like their byte-based counterparts, but operate on characters instead of bytes.

Print streams

Another useful wrapper stream is java.io.PrintWriter. This class provides a suite of overloaded print( ) methods that turn their arguments into strings and push them out the stream. A complementary set of println( ) methods adds a newline to the end of the strings. PrintWriter is an unusual character stream because it can wrap either an OutputStream or another Writer.

PrintWriter is the more capable big brother of the PrintStream byte stream. The System.out and System.err streams are PrintStream objects; you have already seen such streams strewn throughout this book:

System.out.print("Hello world...\n");  
System.out.println("Hello world...");  
System.out.println( "The answer is: " + 17 );  
System.out.println( 3.14 );

PrintWriter and PrintStream have a strange, overlapping history. Early versions of Java did not have the Reader and Writer classes, and streams like PrintStream, which must of necessity convert characters to bytes, simply made assumptions about the character encoding. As of Java 1.1, the PrintStream class was enhanced to translate characters to bytes using the system’s default encoding scheme. For all new development, however, use a PrintWriter instead of a PrintStream. Because a PrintWriter can wrap an OutputStream, the two classes are more or less interchangeable.

When you create a PrintWriter object, you can pass an additional boolean value to the constructor. If this value is true, the PrintWriter automatically performs a flush( ) on the underlying OutputStream or Writer each time it sends a newline:

boolean autoFlush = true;  
PrintWriter p = new PrintWriter( myOutputStream, autoFlush );

When this technique is used with a buffered output stream, it corresponds to the behavior of terminals that send data line by line.
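The difference autoflush makes can be seen by wrapping a PrintWriter around an in-memory ByteArrayOutputStream and checking how much has arrived at each step. This is a sketch only; the class name AutoFlushExample and the method sizes( ) are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

class AutoFlushExample {
    // Returns the sink size after print( ) and again after println( ).
    static int [] sizes( ) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream( );
        boolean autoFlush = true;
        PrintWriter p = new PrintWriter( sink, autoFlush );

        p.print( "no newline yet" );   // held in the writer's internal buffer
        int before = sink.size( );

        p.println( );                  // ends the line and triggers the flush
        int after = sink.size( );

        return new int [] { before, after };
    }

    public static void main( String [] args ) {
        int [] s = sizes( );
        System.out.println( s[0] + " then " + s[1] );
    }
}
```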

Unlike methods in other stream classes, the methods of PrintWriter and PrintStream do not throw IOExceptions. This makes life a lot easier for printing text, which is a very common operation. Instead, if we are interested, we can check for errors with the checkError( ) method:

System.out.println( reallyLongString );  
if ( System.out.checkError( ) ) {
    // uh oh: an earlier write to System.out failed
}

Pipes

Normally, our applications are directly involved with one side of a given stream at a time. PipedInputStream and PipedOutputStream (or PipedReader and PipedWriter), however, let us create two sides of a stream and connect them together, as shown in Figure 10-4. This can be used to provide a stream of communication between threads, for example, or as a “loop-back” for testing.

Figure 10-4. Piped streams

To create a byte stream pipe, we use both a PipedInputStream and a PipedOutputStream. We can simply choose a side and then construct the other side using the first as an argument:

PipedInputStream pin = new PipedInputStream( );  
PipedOutputStream pout = new PipedOutputStream( pin );

Alternatively:

PipedOutputStream pout = new PipedOutputStream( );  
PipedInputStream pin = new PipedInputStream( pout );

In each of these examples, the effect is to produce an input stream, pin, and an output stream, pout, that are connected. Data written to pout can then be read by pin. It is also possible to create the PipedInputStream and the PipedOutputStream separately, and then connect them with the connect( ) method.

We can do exactly the same thing in the character-based world, using PipedReader and PipedWriter in place of PipedInputStream and PipedOutputStream.

Once the two ends of the pipe are connected, use the two streams as you would other input and output streams. You can use read( ) to read data from the PipedInputStream (or PipedReader) and write( ) to write data to the PipedOutputStream (or PipedWriter). If the internal buffer of the pipe fills up, the writer blocks and waits until space is available. Conversely, if the pipe is empty, the reader blocks and waits until some data is available.
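Here is a minimal sketch of a pipe carrying data between two threads, with the two ends created separately and joined with connect( ). The class name PipeExample and the method sendThroughPipe( ) are hypothetical, and the example assumes the message contains only Latin-1 characters so that each byte maps to one char:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipeExample {
    // Writes the message in one thread and reads it back in the caller's thread.
    static String sendThroughPipe( String message ) throws IOException {
        PipedInputStream pin = new PipedInputStream( );
        PipedOutputStream pout = new PipedOutputStream( );
        pin.connect( pout );   // join the two ends before using them

        Thread writer = new Thread( () -> {
            try {
                pout.write( message.getBytes( StandardCharsets.ISO_8859_1 ) );
                pout.close( );   // lets the reader see end of stream
            }
            catch ( IOException e ) {
                e.printStackTrace( );
            }
        } );
        writer.start( );

        StringBuilder received = new StringBuilder( );
        int val;
        // read( ) blocks while the pipe is empty and returns -1 once it is closed.
        while ( (val = pin.read( )) != -1 ) {
            received.append( (char) val );
        }
        return received.toString( );
    }

    public static void main( String [] args ) throws IOException {
        System.out.println( sendThroughPipe( "hello, pipe" ) );
    }
}
```

Closing the output side matters here: without it, the reading loop would have no normal end-of-stream to detect.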

One advantage to using piped streams is that they provide stream functionality in our code, without compelling us to build new, specialized streams. For example, we can use pipes to create a simple logging facility for our application. We can send messages to the logging facility through an ordinary PrintWriter, and then it can do whatever processing or buffering is required before sending the messages off to their ultimate location. Because we are dealing with string messages, we use the character-based PipedReader and PipedWriter classes. The following example shows the skeleton of our logging facility:

//file: LoggerDaemon.java
import java.io.*;  
  
class LoggerDaemon extends Thread {  
    PipedReader in = new PipedReader( );   
    PipedWriter out;  
  
    LoggerDaemon( ) throws IOException {  
        // Connect the pipe before starting the thread, so that run( )  
        // never tries to read from an unconnected PipedReader.  
        out = new PipedWriter( in );  
        start( );  
    }  
  
    public void run( ) {  
        BufferedReader bin = new BufferedReader( in );  
        String s;  
   
        try {  
            while ( (s = bin.readLine( )) != null ) {  
                // process line of data  
                // ...  
            }  
        }   
        catch (IOException e ) {  
            // the pipe was broken or closed; stop logging  
        }  
    }  
  
    PrintWriter getWriter( ) {  
        return new PrintWriter( out );  
    }  
}  
  
class myApplication {  
    public static void main ( String [] args ) throws IOException {
        PrintWriter out = new LoggerDaemon().getWriter( );  

        out.println("Application starting...");  
        // ...  
        out.println("Warning: does not compute!");  
        // ...  
    }  
}

LoggerDaemon reads strings from its end of the pipe, the PipedReader named in. LoggerDaemon also provides a method, getWriter( ), that returns a PrintWriter wrapped around a PipedWriter that is connected to its input stream. To begin sending messages, we create a new LoggerDaemon and fetch the output stream.

In order to read strings with the readLine( ) method, LoggerDaemon wraps a BufferedReader around its PipedReader. For convenience, it also presents its output pipe as a PrintWriter, rather than a simple Writer.

One advantage of implementing LoggerDaemon with pipes is that we can log messages as easily as we write text to a terminal or any other stream. In other words, we can use all our normal tools and techniques. Another advantage is that the processing happens in another thread, so we can go about our business while the processing takes place.

There is nothing stopping us from connecting more than two piped streams. For example, we could chain multiple pipes together to perform a series of filtering operations. Note that in this example, there is nothing to prevent messages printed to the pipe from different threads from being mixed together. To keep them separate, we might have to create a number of pipes, one for each thread, in the getWriter( ) method.

Strings to Streams and Back

StringReader is another useful stream class; it essentially wraps stream functionality around a String. Here’s how to use a StringReader:

String data = "There once was a man from Nantucket...";  
StringReader sr = new StringReader( data );  
  
char T = (char)sr.read( );  
char h = (char)sr.read( );  
char e = (char)sr.read( );

Note that you will still have to catch IOExceptions thrown by some of the StringReader’s methods.

The StringReader class is useful when you want to read data in a String as if it were coming from a stream, such as a file, pipe, or socket. For example, suppose you create a parser that expects to read tokens from a stream. But you want to provide a method that also parses a big string. You can easily add one using StringReader.
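As a sketch of that idea, suppose the “parser” is a hypothetical method countWords( ) that counts whitespace-separated words arriving from any Reader. Adding a String version takes one line (the class name WordCounter is also an assumption):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class WordCounter {
    // The stream-based "parser": counts words from any Reader.
    static int countWords( Reader source ) throws IOException {
        BufferedReader in = new BufferedReader( source );
        int count = 0;
        String line;
        while ( (line = in.readLine( )) != null ) {
            String trimmed = line.trim( );
            if ( !trimmed.isEmpty( ) )
                count += trimmed.split( "\\s+" ).length;
        }
        return count;
    }

    // Convenience version for data that is already in a String.
    static int countWords( String text ) throws IOException {
        return countWords( new StringReader( text ) );
    }

    public static void main( String [] args ) throws IOException {
        System.out.println( countWords( "There once was a man from Nantucket..." ) );
    }
}
```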

Turning things around, the StringWriter class lets us write to a character buffer through an output stream. The internal buffer grows as necessary to accommodate the data. When we are done we can fetch the contents of the buffer as a String. In the following example, we create a StringWriter and wrap it in a PrintWriter for convenience:

StringWriter buffer = new StringWriter( );  
PrintWriter out = new PrintWriter( buffer );  
  
out.println("A moose once bit my sister.");  
out.println("No, really!");  

String results = buffer.toString( );

First we print a few lines to the output stream, to give it some data, then retrieve the results as a string with the toString( ) method. Alternatively, we could get the results as a StringBuffer object using the getBuffer( ) method.

The StringWriter class is useful if you want to capture the output of something that normally sends output to a stream, such as a file or the console. A PrintWriter wrapped around a StringWriter is a viable alternative to using a StringBuffer to construct large strings piece by piece.
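One familiar use of this capture technique is turning an exception's stack trace, which printStackTrace( ) normally sends to a stream, into a String. The helper below is a sketch; the class name TraceCapture and the method stackTraceOf( ) are hypothetical:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class TraceCapture {
    // Captures the output of printStackTrace( ) in a String.
    static String stackTraceOf( Throwable t ) {
        StringWriter buffer = new StringWriter( );
        PrintWriter out = new PrintWriter( buffer );
        t.printStackTrace( out );
        out.flush( );
        return buffer.toString( );
    }

    public static void main( String [] args ) {
        String trace = stackTraceOf( new IllegalStateException( "boom" ) );
        System.out.println( trace );
    }
}
```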

The rot13InputStream Class

Before we leave streams, let’s try our hand at making one of our own. I mentioned earlier that specialized stream wrappers are built on top of the FilterInputStream and FilterOutputStream classes. It’s quite easy to create our own subclass of FilterInputStream that can be wrapped around other streams to add new functionality.

The following example, rot13InputStream, performs a rot13 (rotate by 13 letters) operation on the bytes that it reads. rot13 is a trivial obfuscation algorithm that shifts alphabetic characters to make them not quite human-readable; it’s cute because it’s symmetric. That is, to “un-rot13” some text, simply rot13 it again. We’ll use the rot13InputStream class again in the crypt protocol handler example in Appendix A, so we’ve put the class in the learningjava.io package to facilitate reuse. Here’s our rot13InputStream class:

//file: rot13InputStream.java
package learningjava.io;
import java.io.*;  

public class rot13InputStream extends FilterInputStream {  

    public rot13InputStream ( InputStream i ) {  
        super( i );  
    }  

    public int read( ) throws IOException {  
        return rot13( in.read( ) );  
    }  
 
    private int rot13 ( int c ) { 
        if ( (c >= 'A') && (c <= 'Z') ) 
            c=(((c-'A')+13)%26)+'A'; 
        if ( (c >= 'a') && (c <= 'z') ) 
            c=(((c-'a')+13)%26)+'a'; 
        return c; 
    } 
}

The FilterInputStream needs to be initialized with an InputStream; this is the stream to be filtered. We provide an appropriate constructor for the rot13InputStream class and invoke the parent constructor with a call to super( ). FilterInputStream contains a protected instance variable, in, where it stores a reference to the specified InputStream, making it available to the rest of our class.

The primary feature of a FilterInputStream is that it delegates its input tasks to the underlying InputStream. So, for instance, a call to FilterInputStream’s read( ) simply turns around and calls the read( ) method of the underlying InputStream, to fetch a byte.

Filtering amounts to doing extra work (such as encryption) on the data as it passes through. In our example, the read( ) method fetches a byte from the underlying InputStream, in, and then performs the rot13 shift on the byte before returning it. Note that the rot13( ) method shifts alphabetic characters, while simply passing through all other values, including the end-of-stream value (-1). Our subclass is now a rot13 filter.

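read( ) is the only InputStream method that rot13InputStream overrides. All other normal functionality of an InputStream, like skip( ) and available( ), is unmodified, so calls to these methods are answered by the underlying InputStream. Note that this includes the array forms of read( ): FilterInputStream passes them straight to the underlying stream, so data read into a byte array is not shifted unless we override those methods as well.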
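FilterInputStream hands array reads directly to the underlying stream rather than routing them through the single-byte read( ), so a filter that should also transform data read into a byte array needs to override read(byte[], int, int). The following variation is a sketch of one way to do that; the class name Rot13BulkInputStream and the helper filter( ) are hypothetical and not part of the book's learningjava.io package:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class Rot13BulkInputStream extends FilterInputStream {

    Rot13BulkInputStream( InputStream i ) {
        super( i );
    }

    public int read( ) throws IOException {
        return rot13( in.read( ) );
    }

    // read( byte [] ) in FilterInputStream calls this method, so both are covered.
    public int read( byte [] b, int off, int len ) throws IOException {
        int got = in.read( b, off, len );
        // If got is -1 (end of stream), the loop body never runs.
        for ( int i = off; i < off + got; i++ )
            b[i] = (byte) rot13( b[i] );
        return got;
    }

    private static int rot13( int c ) {
        if ( (c >= 'A') && (c <= 'Z') )
            return (((c - 'A') + 13) % 26) + 'A';
        if ( (c >= 'a') && (c <= 'z') )
            return (((c - 'a') + 13) % 26) + 'a';
        return c;
    }

    // Demonstration helper: pushes an ASCII string through the filter.
    static String filter( String ascii ) throws IOException {
        InputStream in = new Rot13BulkInputStream( new ByteArrayInputStream(
            ascii.getBytes( StandardCharsets.US_ASCII ) ) );
        byte [] buf = new byte [ascii.length( )];
        int got = in.read( buf );
        return new String( buf, 0, Math.max( got, 0 ), StandardCharsets.US_ASCII );
    }

    public static void main( String [] args ) throws IOException {
        System.out.println( filter( "Hello, World!" ) );   // prints "Uryyb, Jbeyq!"
    }
}
```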

Strictly speaking, rot13InputStream works only on an ASCII byte stream, since the underlying algorithm is based on the Roman alphabet. A more generalized character-scrambling algorithm would have to be based on FilterReader to handle 16-bit Unicode characters correctly. (Anyone want to try rot32768?)
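For comparison, here is a sketch of what a character-based version might look like when built on FilterReader. It still rotates only the letters A to Z and a to z, and passes every other character through untouched; the class name Rot13Reader and the helper filter( ) are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class Rot13Reader extends FilterReader {

    Rot13Reader( Reader r ) {
        super( r );
    }

    public int read( ) throws IOException {
        return rot13( in.read( ) );
    }

    // Wrappers such as BufferedReader read through this method.
    public int read( char [] buf, int off, int len ) throws IOException {
        int got = in.read( buf, off, len );
        // If got is -1 (end of stream), the loop body never runs.
        for ( int i = off; i < off + got; i++ )
            buf[i] = (char) rot13( buf[i] );
        return got;
    }

    private static int rot13( int c ) {
        if ( (c >= 'A') && (c <= 'Z') )
            return (((c - 'A') + 13) % 26) + 'A';
        if ( (c >= 'a') && (c <= 'z') )
            return (((c - 'a') + 13) % 26) + 'a';
        return c;
    }

    // Demonstration helper: returns the first line of text, rot13-shifted.
    static String filter( String text ) throws IOException {
        BufferedReader in =
            new BufferedReader( new Rot13Reader( new StringReader( text ) ) );
        return in.readLine( );
    }

    public static void main( String [] args ) throws IOException {
        System.out.println( filter( "Hello" ) );   // prints "Uryyb"
    }
}
```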


Learning Java by Jonathan Knudsen, Patrick Niemeyer