In this chapter, we’ll continue our exploration of the Java API by looking at many of the classes in the java.io package. Figure 10.1 shows the class hierarchy of the java.io package.
We’ll start by looking at the stream classes in java.io; these classes are all subclasses of the basic InputStream, OutputStream, Reader, and Writer classes. Then we’ll examine the File class and discuss how you can interact with the filesystem using classes in java.io. Finally, we’ll take a quick look at the data compression classes provided in java.util.zip.
All fundamental I/O in Java is based on streams. A stream represents a flow of data, or a channel of communication with (at least conceptually) a writer at one end and a reader at the other. When you are working with terminal input and output, reading or writing files, or communicating through sockets in Java, you are using a stream of one type or another. So that you can see the forest without being distracted by the trees, we’ll start by summarizing the classes involved with the different types of streams:
- InputStream/OutputStream
  Abstract classes that define the basic functionality for reading or writing an unstructured sequence of bytes. All other byte streams in Java are built on top of the basic InputStream and OutputStream.
- Reader/Writer
  Abstract classes that define the basic functionality for reading or writing a sequence of character data, with support for Unicode. All other character streams in Java are built on top of Reader and Writer.
- InputStreamReader/OutputStreamWriter
  “Bridge” classes that convert bytes to characters and vice versa. Remember: in Unicode, a character is not a byte!
- DataInputStream/DataOutputStream
  Specialized stream filters that add the ability to read and write simple data types, such as numeric primitives and String objects, in a universal format.
- ObjectInputStream/ObjectOutputStream
  Specialized stream filters that are capable of writing serialized Java objects and reconstructing them.
- BufferedInputStream/BufferedOutputStream/BufferedReader/BufferedWriter
  Specialized stream filters that add buffering for additional efficiency.
- PrintWriter
  A specialized character stream that makes it simple to print text.
- PipedInputStream/PipedOutputStream/PipedReader/PipedWriter
  “Double-ended” streams that normally occur in pairs. Data written into a PipedOutputStream or PipedWriter is read from its corresponding PipedInputStream or PipedReader.
- FileInputStream/FileOutputStream/FileReader/FileWriter
  Implementations of InputStream, OutputStream, Reader, and Writer that read from and write to files on the local filesystem.
Streams in Java are one-way streets. The java.io input and output classes represent the ends of a simple stream, as shown in Figure 10.2. For bidirectional conversations, we use one of each type of stream.
InputStream and OutputStream are abstract classes that define the lowest-level interface for all byte streams. They contain methods for reading or writing an unstructured flow of byte-level data. Because these classes are abstract, you can’t create a generic input or output stream. Java implements subclasses of these for activities like reading from and writing to files and communicating with sockets. Because all byte streams inherit the structure of InputStream or OutputStream, the various kinds of byte streams can be used interchangeably. A method specifying an InputStream as an argument can, of course, accept any subclass of InputStream. Specialized types of streams can also be layered to provide features, such as buffering, filtering, or handling larger data types.
In Java 1.1, new classes based around Reader and Writer were added to the java.io package. Reader and Writer are very much like InputStream and OutputStream, except that they deal with characters instead of bytes. As true character streams, these classes correctly handle Unicode characters, which was not always the case with the byte streams. However, some sort of bridge is needed between these character streams and the byte streams of physical devices like disks and networks. InputStreamReader and OutputStreamWriter are special classes that use an encoding scheme to translate between character and byte streams.
We’ll discuss all of the interesting stream types in this section, with the exception of FileInputStream, FileOutputStream, FileReader, and FileWriter. We’ll postpone the discussion of file streams until the next section, where we’ll cover issues involved with accessing the filesystem in Java.
The prototypical example of an InputStream object is the “standard input” of a Java application. Like stdin in C or cin in C++, this object reads data from the program’s environment, which is usually a terminal window or a command pipe. The java.lang.System class, a general repository for system-related resources, provides a reference to standard input in the static variable in. System also provides objects for standard output and standard error in the out and err variables, respectively. The following example shows the correspondence:
    InputStream stdin = System.in;
    OutputStream stdout = System.out;
    OutputStream stderr = System.err;
This example hides the fact that System.out and System.err aren’t really OutputStream objects, but more specialized and useful PrintStream objects. We’ll explain these later, but for now we can reference out and err as OutputStream objects, since they are a kind of OutputStream as well.
We can read a single byte at a time from standard input with the InputStream’s read( ) method. If you look closely at the API, you’ll see that the read( ) method of the base InputStream class is an abstract method. What lies behind System.in is a particular implementation of InputStream; the subclass provides a real implementation of the read( ) method.
    try {
        int val = System.in.read( );
        ...
    } catch ( IOException e ) {
        ...
    }
As is the convention in C, read( ) provides a byte of information, but its return type is int. A return value of -1 indicates that the end of the stream has been reached normally; you’ll need to test for this condition when using the simple read( ) method. If an error occurs during the read, an IOException is thrown.
All basic input and output stream commands can throw an IOException, so you should arrange to catch and handle them appropriately.
To retrieve the value as a byte, perform a cast:
byte b = (byte) val;
Be sure to check for the end-of-stream condition before you perform the cast.
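The order of the test and the cast matters because a data byte of 0xFF comes back from read( ) as 255, while (byte) 255 is -1. The following is a minimal sketch of a read loop that gets this right; the class name ByteAtATime and its countBytes( ) helper are made up for illustration, and a ByteArrayInputStream stands in for standard input so the example is repeatable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class ByteAtATime {
    // Counts bytes by calling read() until it reports end of stream.
    static int countBytes(InputStream in) {
        int count = 0;
        try {
            int val;
            // Compare the int result with -1 BEFORE casting: the byte 0xFF
            // arrives as 255, but (byte) 255 is -1 and would look like the end.
            while ((val = in.read()) != -1) {
                byte b = (byte) val; // safe now; b is in the range -128..127
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = { 1, 2, (byte) 0xFF, 4 };
        System.out.println(countBytes(new ByteArrayInputStream(data))); // prints 4
    }
}
```

A loop that cast first and then compared the byte with -1 would stop after two bytes on this input.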
An overloaded form of read( ) fills a byte array with as much data as possible up to the capacity of the array, and returns the number of bytes read:
    byte [] bity = new byte [1024];
    int got = System.in.read( bity );
We can also check the number of bytes available for reading on an InputStream with the available( ) method. Using that information, we could create an array of exactly the right size:
    int waiting = System.in.available( );
    if ( waiting > 0 ) {
        byte [] data = new byte [ waiting ];
        System.in.read( data );
        ...
    }
However, the reliability of this technique depends on the ability of the underlying stream implementation to detect how much data is arriving.
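Because a single read( ) into an array may return fewer bytes than the array holds, a common alternative to relying on available( ) is to loop until read( ) returns -1. Here is a rough sketch of that loop; the ReadFully class and readAll( ) method are hypothetical names, and the in-memory streams are only there to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class ReadFully {
    // Drains a stream in chunks, using the count returned by read(byte[]).
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            int got;
            while ((got = in.read(chunk)) != -1) {
                // Only the first 'got' bytes of the chunk are valid data.
                collected.write(chunk, 0, got);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = new byte[3000];
        System.out.println(readAll(new ByteArrayInputStream(source)).length); // prints 3000
    }
}
```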
InputStream provides the skip( ) method as a way of jumping over a number of bytes. Depending on the implementation of the stream, skipping bytes may be more efficient than reading them. The close( ) method shuts down the stream and frees up any associated system resources. It’s a good idea to close a stream when you are done using it.
Some InputStream and OutputStream subclasses of early versions of Java included methods for reading and writing strings, but most of them operated by assuming that a 16-bit Unicode character was equivalent to an 8-bit byte in the stream. This works only for Latin-1 (ISO 8859-1) characters, so the character stream classes Reader and Writer were introduced in Java 1.1. Two special classes, InputStreamReader and OutputStreamWriter, bridge the gap between the world of character streams and the world of byte streams. These are character streams that are wrapped around an underlying byte stream. An encoding scheme is used to convert between bytes and characters. An encoding scheme name can be specified in the constructor of InputStreamReader or OutputStreamWriter. Alternatively, you can use the constructor that takes only the stream, which applies the system’s default encoding scheme. For example, let’s parse a human-readable string from the standard input into an integer. We’ll assume that the bytes coming from System.in use the system’s default encoding scheme:
    try {
        InputStreamReader converter = new InputStreamReader(System.in);
        BufferedReader in = new BufferedReader(converter);
        String text = in.readLine( );
        int i = NumberFormat.getInstance().parse(text).intValue( );
    } catch ( IOException e ) {
    } catch ( ParseException pe ) {
    }
First, we wrap an InputStreamReader around System.in. This object converts the incoming bytes of System.in to characters using the default encoding scheme. Then, we wrap a BufferedReader around the InputStreamReader. BufferedReader gives us the readLine( ) method, which we can use to convert a full line of text into a String. The string is then parsed into an integer using the techniques described in Chapter 9.
We could have programmed the previous example using only byte streams, and it would have worked for users in the United States, at least. So why go to the extra trouble of using character streams? Character streams were introduced in Java 1.1 to correctly support Unicode strings. Unicode was designed to support almost all of the written languages of the world. If you want to write a program that works in any part of the world, in any language, you definitely want to use streams that don’t mangle Unicode.
So how do you decide when you need a byte stream (InputStream or OutputStream) and when you need a character stream? If you want to read or write character strings, use some variety of Reader or Writer. Otherwise, a byte stream should suffice. Let’s say, for example, that you want to read strings from a file that was written by an earlier Java application. In this case, you could simply create a FileReader, which will convert the bytes in the file to characters using the system’s default encoding scheme. If you have a file in a specific encoding scheme, you can create an InputStreamReader with the specified encoding scheme wrapped around a FileInputStream and read characters from it.
Another example comes from the Internet. Web servers serve files as byte streams. If you want to read Unicode strings with a particular encoding scheme from a file on the network, you’ll need an appropriate InputStreamReader wrapped around the InputStream of the web server’s socket.
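To make the effect of the encoding name concrete, here is a small sketch that decodes the same five bytes twice through an InputStreamReader, once as UTF-8 and once as ISO-8859-1. The EncodingDemo class and its decode( ) helper are illustrative names only, and a ByteArrayInputStream stands in for the file or socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class EncodingDemo {
    // Decodes bytes to a String using the named encoding scheme.
    static String decode(byte[] bytes, String encodingName) {
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), encodingName)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown name.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9 (e with acute accent) encoded in UTF-8.
        byte[] utf8 = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9 };
        System.out.println(decode(utf8, "UTF-8").length());      // prints 4
        System.out.println(decode(utf8, "ISO-8859-1").length()); // prints 5
    }
}
```

With the wrong encoding name the two bytes of the accented letter come out as two separate characters, which is the kind of mangling the bridge classes are meant to prevent.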
What if we want to do more than read and write a sequence of bytes or characters? We can use a “filter” stream, which is a type of InputStream, OutputStream, Reader, or Writer that wraps another stream and adds new features. A filter stream takes the target stream as an argument in its constructor and delegates calls to it after doing some additional processing of its own. For example, you could construct a BufferedInputStream to wrap the system standard input:
InputStream bufferedIn = new BufferedInputStream( System.in );
The BufferedInputStream is a type of filter stream that reads ahead and buffers a certain amount of data. (We’ll talk more about it later in this chapter.) The BufferedInputStream wraps an additional layer of functionality around the underlying stream. Figure 10.3 shows this arrangement for a DataInputStream.
As you can see from the previous code snippet, the BufferedInputStream filter is a type of InputStream. Because filter streams are themselves subclasses of the basic stream types, they can be used as arguments to the construction of other filter streams. This allows filter streams to be layered on top of one another to provide different combinations of features. For example, we could first wrap our System.in with a BufferedInputStream and then wrap the BufferedInputStream with a DataInputStream for reading special data types.
There are four superclasses corresponding to the four types of filter streams: FilterInputStream, FilterOutputStream, FilterReader, and FilterWriter. The first two are for filtering byte streams, and the last two are for filtering character streams. These superclasses provide the basic machinery for a “no op” filter (a filter that doesn’t do anything) by delegating all of their method calls to their underlying stream. Real filter streams subclass these and override various methods to add their additional processing. We’ll make a filter stream a little later in this chapter.
DataInputStream and DataOutputStream are filter streams that let you read or write strings and primitive data types that comprise more than a single byte. DataInputStream and DataOutputStream implement the DataInput and DataOutput interfaces, respectively. These interfaces define the methods required for streams that read and write strings and Java primitive numeric and boolean types in a machine-independent manner.
You can construct a DataInputStream from an InputStream and then use a method like readDouble( ) to read a primitive data type:
    DataInputStream dis = new DataInputStream( System.in );
    double d = dis.readDouble( );
This example wraps the standard input stream in a DataInputStream and uses it to read a double value. readDouble( ) reads bytes from the stream and constructs a double from them. The DataInputStream methods expect the bytes of numeric data types to be in network byte order, a standard that specifies that the high order bytes are sent first.
The DataOutputStream class provides write methods that correspond to the read methods in DataInputStream. For example, writeInt( ) writes an integer in binary format to the underlying output stream.
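As a rough illustration of the binary format, the sketch below writes an int and a double with a DataOutputStream and reads them back through a DataInputStream layered over a BufferedInputStream. The DataRoundTrip class and its method names are invented for this example; the in-memory byte array streams simply take the place of a file or socket.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class DataRoundTrip {
    // Writes an int (4 bytes) followed by a double (8 bytes).
    static byte[] encode(int i, double d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(i);
            out.writeDouble(d);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order and returns the double.
    static double decodeDouble(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(new ByteArrayInputStream(data)))) {
            in.readInt(); // values must be read in the order they were written
            return in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(1, 2.5);
        System.out.println(data.length);        // prints 12
        System.out.println(data[3]);            // prints 1 (low-order byte comes last)
        System.out.println(decodeDouble(data)); // prints 2.5
    }
}
```

The int 1 is stored as the bytes 0, 0, 0, 1, which is the high-order-first layout described above.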
The readUTF( ) and writeUTF( ) methods of DataInputStream and DataOutputStream read and write a Java String of Unicode characters using a slightly modified form of the UTF-8 “transformation format.” UTF-8 is an ASCII-compatible encoding of Unicode characters commonly used for the transmission and storage of Unicode text. This differs from the Reader and Writer streams, which can use arbitrary encodings and may not preserve all of the Unicode characters.
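A short sketch of a writeUTF( )/readUTF( ) round trip follows. It also shows that writeUTF( ) puts a two-byte length in front of the encoded characters, which is part of the DataOutput format rather than of UTF-8 itself. The UtfDemo class and its helpers are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class UtfDemo {
    // Encodes a string with writeUTF(): 2-byte length, then the characters.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a string previously written with writeUTF().
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("hi").length);         // prints 4: length prefix + 2 bytes
        System.out.println(encode("caf\u00e9").length);  // prints 7: the accented letter needs 2 bytes
        System.out.println(decode(encode("hi")));        // prints hi
    }
}
```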
We can use a DataInputStream with any kind of input stream, whether it be from a file, a socket, or standard input. The same applies to using a DataOutputStream, or, for that matter, any other specialized streams in java.io.
The BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter classes add a data buffer of a specified size to the stream path. A buffer can increase efficiency by reducing the number of physical read or write operations that correspond to read( ) or write( ) method calls. You create a buffered stream with an appropriate input or output stream and a buffer size. (You can also wrap another stream around a buffered stream, so that it benefits from the buffering.) Here’s a simple buffered input stream:
    BufferedInputStream bis = new BufferedInputStream(myInputStream, 4096);
    ...
    bis.read( );
In this example, we specify a buffer size of 4096 bytes. If we leave off the size of the buffer in the constructor, a reasonably sized one is chosen for us. On our first call to read( ), bis tries to fill the entire 4096-byte buffer with data. Thereafter, calls to read( ) retrieve data from the buffer, which is refilled as necessary.
A BufferedOutputStream works in a similar way. Calls to write( ) store the data in a buffer; data is actually written only when the buffer fills up. You can also use the flush( ) method to wring out the contents of a BufferedOutputStream at any time. The flush( ) method is actually a method of the OutputStream class itself. It’s important because it allows you to be sure that all data in any underlying streams and filter streams has been sent (before, for example, you wait for a response).
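The effect of flush( ) can be observed by checking how much data has reached the underlying stream before and after the call. The following sketch does that with a ByteArrayOutputStream as the destination; the FlushDemo class and its method name are assumptions made for the example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class FlushDemo {
    // Returns { bytes in the sink before flush(), bytes in the sink after flush() }.
    static int[] visibleBeforeAndAfterFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 4096);
        try {
            out.write(new byte[] { 1, 2, 3 }); // small write: held in the buffer
            int before = sink.size();
            out.flush();                       // pushes the buffered bytes to the sink
            int after = sink.size();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = visibleBeforeAndAfterFlush();
        System.out.println(sizes[0] + " then " + sizes[1]); // prints 0 then 3
    }
}
```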
Some input streams like BufferedInputStream support the ability to mark a location in the data and later reset the stream to that position. The mark( ) method sets the return point in the stream. It takes an integer value that specifies the number of bytes that can be read before the stream gives up and forgets about the mark. The reset( ) method returns the stream to the marked point; any data read after the call to mark( ) is read again.
This functionality is especially useful when you are reading the stream in a parser. You may occasionally fail to parse a structure and so must try something else. In this situation, you can have your parser generate an error (a homemade ParseException) and then reset the stream to the point before it began parsing the structure:
    BufferedInputStream input;
    ...
    try {
        input.mark( MAX_DATA_STRUCTURE_SIZE );
        return( parseDataStructure( input ) );
    } catch ( ParseException e ) {
        input.reset( );
        ...
    }
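For a version of mark( ) and reset( ) that can be run as is, here is a small sketch that peeks at the first few bytes of a stream, rewinds, and then reads everything from the marked point. The MarkResetDemo class and peekThenRead( ) method are hypothetical, and the sample data is assumed to be plain ASCII so that each byte can be shown as a character.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of java.io.
class MarkResetDemo {
    // Reads 'peek' bytes, resets to the mark, then reads the whole stream.
    // The result has the form "<peeked>|<everything>".
    static String peekThenRead(byte[] data, int peek) {
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        StringBuilder seen = new StringBuilder();
        try {
            in.mark(peek); // remember this position for up to 'peek' bytes
            for (int i = 0; i < peek; i++) {
                int b = in.read();
                if (b == -1) break;
                seen.append((char) b);
            }
            in.reset();    // go back to the marked position
            seen.append('|');
            int b;
            while ((b = in.read()) != -1) {
                seen.append((char) b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 'a', 'b', 'c', 'd' };
        System.out.println(peekThenRead(data, 2)); // prints ab|abcd
    }
}
```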
The BufferedReader and BufferedWriter classes work just like their byte-based counterparts, but operate on characters instead of bytes.
Another useful wrapper stream is java.io.PrintWriter. This class provides a suite of overloaded print( ) methods that turn their arguments into strings and push them out the stream. A complementary set of println( ) methods adds a newline to the end of the strings. PrintWriter is an unusual character stream because it can wrap either an OutputStream or another Writer. PrintWriter is the more capable big brother of the PrintStream byte stream. The System.out and System.err streams are PrintStream objects; you have already seen such streams strewn throughout this book:
    System.out.print("Hello world...\n");
    System.out.println("Hello world...");
    System.out.println( "The answer is: " + 17 );
    System.out.println( 3.14 );
PrintWriter and PrintStream have a strange, overlapping history. Early versions of Java did not have the Reader and Writer classes, and streams like PrintStream, which must of necessity convert characters to bytes, simply made assumptions about the character encoding. As of Java 1.1, the PrintStream class was enhanced to translate characters to bytes using the system’s default encoding scheme. For all new development, however, use a PrintWriter instead of a PrintStream. Because a PrintWriter can wrap an OutputStream, the two classes are more or less interchangeable.
When you create a PrintWriter object, you can pass an additional boolean value to the constructor. If this value is true, the PrintWriter automatically performs a flush( ) on the underlying OutputStream or Writer each time it sends a newline:
    boolean autoFlush = true;
    PrintWriter p = new PrintWriter( myOutputStream, autoFlush );
When this technique is used with a buffered output stream, it corresponds to the behavior of terminals that send data line by line.
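To see the difference the flag makes, the sketch below puts a BufferedWriter between a PrintWriter and a StringWriter and checks what has reached the StringWriter after a single println( ). The AutoFlushDemo class and afterPrintln( ) method are illustrative names, not anything defined by java.io.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper class for illustration; not part of java.io.
class AutoFlushDemo {
    // Returns whatever has reached the sink after one println(), with no explicit flush().
    static String afterPrintln(boolean autoFlush) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(new BufferedWriter(sink), autoFlush);
        out.println("hello");
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(afterPrintln(false).isEmpty());        // prints true: still buffered
        System.out.println(afterPrintln(true).startsWith("hello")); // prints true: flushed on newline
    }
}
```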
Unlike methods in other stream classes, the methods of PrintWriter and PrintStream do not throw IOExceptions. This makes life a lot easier for printing text, which is a very common operation. Instead, if we are interested, we can check for errors with the checkError( ) method:
    System.out.println( reallyLongString );
    if ( System.out.checkError( ) )
        // uh oh
Normally, our applications are directly involved with one side of a given stream at a time. PipedInputStream and PipedOutputStream (or PipedReader and PipedWriter), however, let us create two sides of a stream and connect them together, as shown in Figure 10.4. This can be used to provide a stream of communication between threads, for example, or as a “loop-back” for testing.
To create a byte stream pipe, we use both a PipedInputStream and a PipedOutputStream. We can simply choose a side and then construct the other side using the first as an argument:
    PipedInputStream pin = new PipedInputStream( );
    PipedOutputStream pout = new PipedOutputStream( pin );
Alternatively:
    PipedOutputStream pout = new PipedOutputStream( );
    PipedInputStream pin = new PipedInputStream( pout );
In each of these examples, the effect is to produce an input stream, pin, and an output stream, pout, that are connected. Data written to pout can then be read by pin. It is also possible to create the PipedInputStream and the PipedOutputStream separately, and then connect them with the connect( ) method.
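As a rough sketch of the connect( ) approach, the example below creates both ends separately, joins them, and passes a short message from a writer thread to the calling thread. The ConnectDemo class and sendThroughPipe( ) method are made-up names, and the use of a lambda and StandardCharsets assumes a reasonably modern Java version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of java.io.
class ConnectDemo {
    // Sends a message through a pipe from a writer thread and returns what arrives.
    static String sendThroughPipe(String message) {
        PipedInputStream pin = new PipedInputStream();
        PipedOutputStream pout = new PipedOutputStream();
        try {
            pin.connect(pout); // pout.connect(pin) would have the same effect

            Thread writer = new Thread(() -> {
                // Closing the output end tells the reader that no more data is coming.
                try (PipedOutputStream o = pout) {
                    o.write(message.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            ByteArrayOutputStream received = new ByteArrayOutputStream();
            int b;
            while ((b = pin.read()) != -1) { // blocks until data or end of stream
                received.write(b);
            }
            writer.join();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThroughPipe("through the pipe")); // prints through the pipe
    }
}
```

The writer runs in its own thread because piped streams are intended for communication between threads; using both ends from a single thread risks blocking once the pipe's buffer fills.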
We can do exactly the same thing in the character-based world, using PipedReader and PipedWriter in place of PipedInputStream and PipedOutputStream.
Once the two ends of the pipe are connected, use the two streams as you would other input and output streams. You can use read( ) to read data from the PipedInputStream (or PipedReader) and write( ) to write data to the PipedOutputStream (or PipedWriter). If the internal buffer of the pipe fills up, the writer blocks and waits until space is available. Conversely, if the pipe is empty, the reader blocks and waits until some data is available.
One advantage to using piped streams is that they provide stream functionality in our code, without compelling us to build new, specialized streams. For example, we can use pipes to create a simple logging facility for our application. We can send messages to the logging facility through an ordinary PrintWriter, and then it can do whatever processing or buffering is required before sending the messages off to their ultimate location. Because we are dealing with string messages, we use the character-based PipedReader and PipedWriter classes. The following example shows the skeleton of our logging facility:
    //file: LoggerDaemon.java
    import java.io.*;

    class LoggerDaemon extends Thread {
        PipedReader in = new PipedReader( );

        LoggerDaemon( ) {
            start( );
        }

        public void run( ) {
            BufferedReader bin = new BufferedReader( in );
            String s;
            try {
                while ( (s = bin.readLine( )) != null ) {
                    // process line of data
                    // ...
                }
            } catch (IOException e ) { }
        }

        PrintWriter getWriter( ) throws IOException {
            return new PrintWriter( new PipedWriter( in ) );
        }
    }

    class myApplication {
        public static void main ( String [] args ) throws IOException {
            PrintWriter out = new LoggerDaemon().getWriter( );
            out.println("Application starting...");
            // ...
            out.println("Warning: does not compute!");
            // ...
        }
    }
LoggerDaemon reads strings from its end of the pipe, the PipedReader named in. LoggerDaemon also provides a method, getWriter( ), that returns a PipedWriter (wrapped in a PrintWriter) that is connected to its input stream. To begin sending messages, we create a new LoggerDaemon and fetch the output stream.
In order to read strings with the readLine( ) method, LoggerDaemon wraps a BufferedReader around its PipedReader. For convenience, it also presents its output pipe as a PrintWriter, rather than a simple Writer.
One advantage of implementing LoggerDaemon with pipes is that we can log messages as easily as we write text to a terminal or any other stream. In other words, we can use all our normal tools and techniques. Another advantage is that the processing happens in another thread, so we can go about our business while the processing takes place.
There is nothing stopping us from connecting more than two piped streams. For example, we could chain multiple pipes together to perform a series of filtering operations. Note that in this example, there is nothing to prevent messages printed to the pipe from different threads being mixed together. To prevent that, we might have to create a number of pipes, one for each thread, in the getWriter( ) method.
StringReader is another useful stream class; it essentially wraps stream functionality around a String. Here’s how to use a StringReader:
    String data = "There once was a man from Nantucket...";
    StringReader sr = new StringReader( data );
    char T = (char)sr.read( );
    char h = (char)sr.read( );
    char e = (char)sr.read( );
Note that you will still have to catch IOExceptions thrown by some of the StringReader’s methods.
The StringReader class is useful when you want to read data in a String as if it were coming from a stream, such as a file, pipe, or socket. For example, suppose you create a parser that expects to read tokens from a stream. But you want to provide a method that also parses a big string. You can easily add one using StringReader.
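The idea of adding a string-based entry point to a stream-based parser might look something like the following sketch. A trivial word counter stands in for the parser; the WordCounter class and its countWords( ) methods are hypothetical and exist only to show the overload that wraps a String in a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stream-based parser; not part of java.io.
class WordCounter {
    // The "real" entry point: works on any character stream.
    static int countWords(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            int words = 0;
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    words += trimmed.split("\\s+").length;
                }
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload: treat a String as if it were a stream.
    static int countWords(String text) {
        return countWords(new StringReader(text));
    }

    public static void main(String[] args) {
        System.out.println(countWords("There once was a man from Nantucket...")); // prints 7
    }
}
```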
Turning things around, the StringWriter class lets us write to a character buffer through an output stream. The internal buffer grows as necessary to accommodate the data. When we are done we can fetch the contents of the buffer as a String. In the following example, we create a StringWriter and wrap it in a PrintWriter for convenience:
    StringWriter buffer = new StringWriter( );
    PrintWriter out = new PrintWriter( buffer );
    out.println("A moose once bit my sister.");
    out.println("No, really!");
    String results = buffer.toString( );
First we print a few lines to the output stream, to give it some data, then retrieve the results as a string with the toString( ) method. Alternatively, we could get the results as a StringBuffer object using the getBuffer( ) method.
The StringWriter class is useful if you want to capture the output of something that normally sends output to a stream, such as a file or the console. A PrintWriter wrapped around a StringWriter is a viable alternative to using a StringBuffer to construct large strings piece by piece.
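One familiar case of capturing output that normally goes to a stream is an exception's stack trace, which printStackTrace( ) can send to any PrintWriter. The sketch below collects it in a StringWriter; the CaptureDemo class and stackTraceOf( ) method are names invented for this illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper class for illustration; not part of java.io.
class CaptureDemo {
    // Captures the text that printStackTrace() would normally send to the console.
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String trace = stackTraceOf(new IllegalStateException("boom"));
        // The first line of the trace names the exception and its message.
        System.out.println(trace.startsWith("java.lang.IllegalStateException: boom")); // prints true
    }
}
```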
Before we leave streams, let’s try our hand at making one of our own. I mentioned earlier that specialized stream wrappers are built on top of the FilterInputStream and FilterOutputStream classes. It’s quite easy to create our own subclass of FilterInputStream that can be wrapped around other streams to add new functionality.
The following example, rot13InputStream, performs a rot13 (rotate by 13 letters) operation on the bytes that it reads. rot13 is a trivial obfuscation algorithm that shifts alphabetic characters to make them not quite human-readable; it’s cute because it’s symmetric. That is, to “un-rot13” some text, simply rot13 it again. We’ll use the rot13InputStream class again in the crypt protocol handler example in Appendix A, so we’ve put the class in the learningjava.io package to facilitate reuse. Here’s our rot13InputStream class:
    //file: rot13InputStream.java
    package learningjava.io;

    import java.io.*;

    public class rot13InputStream extends FilterInputStream {

        public rot13InputStream ( InputStream i ) {
            super( i );
        }

        public int read( ) throws IOException {
            return rot13( in.read( ) );
        }

        private int rot13 ( int c ) {
            if ( (c >= 'A') && (c <= 'Z') )
                c=(((c-'A')+13)%26)+'A';
            if ( (c >= 'a') && (c <= 'z') )
                c=(((c-'a')+13)%26)+'a';
            return c;
        }
    }
The FilterInputStream needs to be initialized with an InputStream; this is the stream to be filtered. We provide an appropriate constructor for the rot13InputStream class and invoke the parent constructor with a call to super( ). FilterInputStream contains a protected instance variable, in, where it stores a reference to the specified InputStream, making it available to the rest of our class.
The primary feature of a FilterInputStream is that it delegates its input tasks to the underlying InputStream. So, for instance, a call to FilterInputStream’s read( ) simply turns around and calls the read( ) method of the underlying InputStream, to fetch a byte. Filtering amounts to doing extra work (such as encryption) on the data as it passes through. In our example, the read( ) method fetches a byte from the underlying InputStream, in, and then performs the rot13 shift on the byte before returning it. Note that the rot13( ) method shifts alphabetic characters, while simply passing through all other values, including the end-of-stream value (-1). Our subclass is now a rot13 filter.
read( ) is the only InputStream method that rot13InputStream overrides. All other normal functionality of an InputStream, like skip( ) and available( ), is unmodified, so calls to these methods are answered by the underlying InputStream. The same is true of the array forms of read( ), which FilterInputStream passes straight to the underlying stream, so only bytes fetched through the single-byte read( ) are shifted.
Strictly speaking, rot13InputStream works only on an ASCII byte stream, since the underlying algorithm is based on the Roman alphabet. A more generalized character-scrambling algorithm would have to be based on FilterReader to handle 16-bit Unicode characters correctly. (Anyone want to try rot32768?)
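As a starting point for a character-based version, here is a minimal sketch of the same rot13 shift built on FilterReader. It is not the generalized scrambler suggested above: it still shifts only the letters A to Z and a to z. The Rot13Reader class and its apply( ) helper are hypothetical names. Unlike the byte-stream example, this sketch overrides the array form of read( ) as well, so the shift is applied no matter which read method the caller uses.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical character-stream counterpart of rot13InputStream.
class Rot13Reader extends FilterReader {
    Rot13Reader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        return rot13(in.read());
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int got = in.read(buf, off, len);
        // If got is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + got; i++) {
            buf[i] = (char) rot13(buf[i]);
        }
        return got;
    }

    // Shifts ASCII letters by 13 places; everything else, including -1, passes through.
    private static int rot13(int c) {
        if (c >= 'A' && c <= 'Z') return ((c - 'A' + 13) % 26) + 'A';
        if (c >= 'a' && c <= 'z') return ((c - 'a' + 13) % 26) + 'a';
        return c;
    }

    // Convenience helper: runs a whole String through the filter.
    static String apply(String text) {
        try (Reader r = new Rot13Reader(new StringReader(text))) {
            StringBuilder result = new StringBuilder();
            char[] buf = new char[64];
            int got;
            while ((got = r.read(buf)) != -1) {
                result.append(buf, 0, got);
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("Hello"));        // prints Uryyb
        System.out.println(apply(apply("Hello"))); // prints Hello
    }
}
```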