Chapter 4. Streams
A large part of what network programs do is simple input and output: moving bytes from one system to another. Bytes are bytes; to a large extent, reading data a server sends you is not all that different from reading a file. Sending text to a client is not that different from writing a file. However, input and output (I/O) in Java is organized differently than it is in most other languages, such as Fortran, C, and C++. Consequently, we’ll take a few pages to summarize Java’s unique approach to I/O.
I/O in Java is built on streams. Input streams read data; output streams write data. Different stream classes, like java.io.FileInputStream and sun.net.TelnetOutputStream, read and write particular sources of data. However, all output streams have the same basic methods to write data and all input streams use the same basic methods to read data. After a stream is created, you can often ignore the details of exactly what it is you’re reading or writing.
Filter streams can be chained to either an input stream or an output stream. Filters can modify the data as it’s read or written—for instance, by encrypting or compressing it—or they can simply provide additional methods for converting the data that’s read or written into other formats. For instance, the java.io.DataOutputStream class provides a method that converts an int to four bytes and writes those bytes onto its underlying output stream.
Readers and writers can be chained to input and output streams to allow programs to read and write text (that is, characters) rather than bytes. Used properly, readers and writers can handle a wide variety of character encodings, including multibyte character sets such as SJIS and UTF-8.
Streams are synchronous; that is, when a program (really, a thread) asks a stream to read or write a piece of data, it waits for the data to be read or written before it does anything else. Java 1.4 and later also support non-blocking I/O using channels and buffers. Non-blocking I/O is a little more complicated, but much faster in some high-volume applications, such as web servers. Normally, the basic stream model is all you need and all you should use for clients. Since channels and buffers depend on streams, we’ll start with streams and clients and later discuss non-blocking I/O for use with servers in Chapter 12.
Output Streams
Java’s basic output class is java.io.OutputStream:
public abstract class OutputStream
This class provides the fundamental methods needed to write data. These are:
public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush() throws IOException
public void close() throws IOException
Subclasses of OutputStream use these methods to write data onto particular media. For instance, a FileOutputStream uses these methods to write data into a file. A TelnetOutputStream uses these methods to write data onto a network connection. A ByteArrayOutputStream uses these methods to write data into an expandable byte array. But whichever medium you’re writing to, you mostly use only these same five methods. Sometimes you may not even know exactly what kind of stream you’re writing onto. For instance, you won’t find TelnetOutputStream in the Java class library documentation. It’s deliberately hidden inside the sun packages. It’s returned by various methods in various classes in java.net, like the getOutputStream() method of java.net.Socket. However, these methods are declared to return only OutputStream, not the more specific subclass TelnetOutputStream. That’s the power of polymorphism. If you know how to use the superclass, you know how to use all the subclasses, too.
OutputStream’s fundamental method is write(int b). This method takes an integer from 0 to 255 as an argument and writes the corresponding byte to the output stream. This method is declared abstract because subclasses need to change it to handle their particular medium. For instance, a ByteArrayOutputStream can implement this method with pure Java code that copies the byte into its array. However, a FileOutputStream will need to use native code that understands how to write data in files on the host platform.
Take note that although this method takes an int as an argument, it actually writes an unsigned byte. Java doesn’t have an unsigned byte data type, so an int has to be used here instead. The only real difference between an unsigned byte and a signed byte is the interpretation. They’re both made up of eight bits, and when you write an int onto a network connection using write(int b), only eight bits are placed on the wire. If an int outside the range 0-255 is passed to write(int b), the least significant byte of the number is written and the remaining three bytes are ignored. (This is the effect of casting an int to a byte.) On rare occasions, however, you may find a buggy third-party class that does something different, such as throwing an IllegalArgumentException or always writing 255, so it’s best not to rely on this behavior, if possible.
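For example, this sketch (using a ByteArrayOutputStream so nothing touches the network) shows the truncation; the value 266 is arbitrary:

ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(266);                  // 266 is 0x10A; only the low-order byte (0x0A) is kept
byte[] written = out.toByteArray();
System.out.println(written[0]);  // prints 10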
For example, the character generator protocol defines a server that sends out ASCII text. The most popular variation of this protocol sends 72-character lines containing printable ASCII characters. (The printable ASCII characters are those between 33 and 126 inclusive that exclude the various whitespace and control characters.) The first line contains characters 33 through 104, sorted. The second line contains characters 34 through 105. The third line contains characters 35 through 106. This continues through line 29, which contains characters 55 through 126. At that point, the characters wrap around so that line 30 contains characters 56 through 126 followed by character 33 again. Lines are terminated with a carriage return (ASCII 13) and a linefeed (ASCII 10). The output looks like this:
!"#$%&'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh "#$%&'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghi #$%&'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij $%&'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijk %&'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijkl &'( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklm '( )*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
Since ASCII is a 7-bit character set, each character is sent as a single byte. Consequently, this protocol is straightforward to implement using the basic write() methods, as the next code fragment demonstrates:
public static void generateCharacters(OutputStream out)
    throws IOException {

  int firstPrintableCharacter = 33;
  int numberOfPrintableCharacters = 94;
  int numberOfCharactersPerLine = 72;

  int start = firstPrintableCharacter;
  while (true) { /* infinite loop */
    for (int i = start; i < start+numberOfCharactersPerLine; i++) {
      out.write((
          (i-firstPrintableCharacter) % numberOfPrintableCharacters)
          + firstPrintableCharacter);
    }
    out.write('\r'); // carriage return
    out.write('\n'); // linefeed
    start = ((start+1) - firstPrintableCharacter)
        % numberOfPrintableCharacters + firstPrintableCharacter;
  }
}
The character generator server class (the exact details of which will have to wait until we discuss server sockets in Chapter 10) passes an OutputStream named out to the generateCharacters() method. Bytes are written onto out one at a time. These bytes are given as integers in a rotating sequence from 33 to 126. Most of the arithmetic here is to make the loop rotate in that range. After each 72-character chunk is written, a carriage return and a linefeed are written onto the output stream. The next start character is calculated and the loop repeats. The entire method is declared to throw IOException. That’s important because the character generator server will terminate only when the client closes the connection. The Java code will see this as an IOException.
Writing a single byte at a time is often inefficient. For example, every TCP segment that goes out your Ethernet card contains at least 40 bytes of overhead for routing and error correction. If each byte is sent by itself, you may be stuffing the network with 41 times more data than you think you are! Consequently, most TCP/IP implementations buffer data to some extent. That is, they accumulate bytes in memory and send them to their eventual destination only when a certain number have accumulated or a certain amount of time has passed. However, if you have more than one byte ready to go, it’s not a bad idea to send them all at once. Using write(byte[] data) or write(byte[] data, int offset, int length) is normally much faster than writing all the components of the data array one at a time. For instance, here’s an implementation of the generateCharacters() method that sends a line at a time by packing a complete line into a byte array:
public static void generateCharacters(OutputStream out)
    throws IOException {

  int firstPrintableCharacter = 33;
  int numberOfPrintableCharacters = 94;
  int numberOfCharactersPerLine = 72;
  int start = firstPrintableCharacter;
  byte[] line = new byte[numberOfCharactersPerLine + 2];
  // the +2 is for the carriage return and linefeed

  while (true) { /* infinite loop */
    for (int i = start; i < start+numberOfCharactersPerLine; i++) {
      line[i-start] = (byte) ((i-firstPrintableCharacter)
          % numberOfPrintableCharacters + firstPrintableCharacter);
    }
    line[72] = (byte) '\r'; // carriage return
    line[73] = (byte) '\n'; // linefeed
    out.write(line);
    start = ((start+1)-firstPrintableCharacter)
        % numberOfPrintableCharacters + firstPrintableCharacter;
  }
}
The algorithm for calculating which bytes to write when is the same as for the previous implementation. The crucial difference is that the bytes are packed into a byte array before being written onto the network. Also, notice that the int result of the calculation must be cast to a byte before being stored in the array. This wasn’t necessary in the previous implementation because the single-byte write() method is declared to take an int as an argument.
Streams can also be buffered in software, directly in the Java code as well as in the network hardware. Typically, this is accomplished by chaining a BufferedOutputStream or a BufferedWriter to the underlying stream, a technique we’ll explore shortly. Consequently, if you are done writing data, it’s important to flush the output stream. For example, suppose you’ve written a 300-byte request to an HTTP 1.1 server that uses HTTP Keep-Alive. You generally want to wait for a response before sending any more data. However, if the output stream has a 1,024-byte buffer, the stream may be waiting for more data to arrive before it sends the data out of its buffer. No more data will be written onto the stream until the server response arrives, but the response is never going to arrive because the request hasn’t been sent yet! The buffered stream won’t send the data to the server until it gets more data from the underlying stream, but the underlying stream won’t send more data until it gets data from the server, which won’t send data until it gets the data that’s stuck in the buffer! Figure 4-1 illustrates this Catch-22. The flush() method breaks the deadlock by forcing the buffered stream to send its data even if the buffer isn’t yet full.
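In practice, the fix is simply to flush after writing a complete request and before blocking for the reply. A minimal sketch, assuming an already connected java.net.Socket named socket and a byte array named request holding the finished request:

OutputStream out = socket.getOutputStream();
InputStream in = socket.getInputStream();

out.write(request);
out.flush();                         // push any bytes sitting in a buffer onto the network

int firstByteOfResponse = in.read(); // now it’s safe to block waiting for the reply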
It’s important to flush your streams whether you think you need to or not. Depending on how you got hold of a reference to the stream, you may or may not know whether it’s buffered. (For instance, System.out is buffered whether you want it to be or not.) If flushing isn’t necessary for a particular stream, it’s a very low-cost operation. However, if it is necessary, it’s very necessary. Failing to flush when you need to can lead to unpredictable, unrepeatable program hangs that are extremely hard to diagnose if you don’t have a good idea of what the problem is in the first place. As a corollary to all this, you should flush all streams immediately before you close them. Otherwise, data left in the buffer when the stream is closed may get lost.
Finally, when you’re done with a stream, close it by invoking its close() method. This releases any resources associated with the stream, such as file handles or ports. Once an output stream has been closed, further writes to it throw IOExceptions. However, some kinds of streams may still allow you to do things with the object. For instance, a closed ByteArrayOutputStream can still be converted to an actual byte array and a closed DigestOutputStream can still return its digest.
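One common pattern (a sketch; the filename is arbitrary) is to flush and close in a finally block so the stream is released even if a write fails partway through:

OutputStream out = null;
try {
  out = new FileOutputStream("data.txt");
  // ... write data onto out ...
  out.flush();
} finally {
  if (out != null) {
    try {
      out.close();
    } catch (IOException ex) {
      // there's little useful to do if close() itself fails
    }
  }
}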
Input Streams
Java’s basic input class is java.io.InputStream:
public abstract class InputStream
This class provides the fundamental methods needed to read data as raw bytes. These are:
public abstract int read() throws IOException
public int read(byte[] input) throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void close() throws IOException
Concrete subclasses of InputStream use these methods to read data from particular media. For instance, a FileInputStream reads data from a file. A TelnetInputStream reads data from a network connection. A ByteArrayInputStream reads data from an array of bytes. But whichever source you’re reading, you mostly use only these same six methods. Sometimes you don’t know exactly what kind of stream you’re reading from. For instance, TelnetInputStream is an undocumented class hidden inside the sun.net package. Instances of it are returned by various methods in the java.net package: for example, the openStream() method of java.net.URL. However, these methods are declared to return only InputStream, not the more specific subclass TelnetInputStream. That’s polymorphism at work once again. The instance of the subclass can be used transparently as an instance of its superclass. No specific knowledge of the subclass is required.
The basic method of InputStream is the noargs read() method. This method reads a single byte of data from the input stream’s source and returns it as an int from 0 to 255. End of stream is signified by returning -1. The read() method waits and blocks execution of any code that follows it until a byte of data is available and ready to be read. Input and output can be slow, so if your program is doing anything else of importance, try to put I/O in its own thread.
The read() method is declared abstract because subclasses need to change it to handle their particular medium. For instance, a ByteArrayInputStream can implement this method with pure Java code that copies the byte from its array. However, a TelnetInputStream needs to use a native library that understands how to read data from the network interface on the host platform.
The following code fragment reads 10 bytes from the InputStream in and stores them in the byte array input. However, if end of stream is detected, the loop is terminated early:
byte[] input = new byte[10];
for (int i = 0; i < input.length; i++) {
  int b = in.read();
  if (b == -1) break;
  input[i] = (byte) b;
}
Although read() only reads a byte, it returns an int. Thus, a cast is necessary before storing the result in the byte array. Of course, this produces a signed byte from -128 to 127 instead of the unsigned byte from 0 to 255 returned by the read() method. However, as long as you’re clear about which one you’re working with, this is not a major problem. You can convert a signed byte to an unsigned byte like this:
int i = b >= 0 ? b : 256 + b;
Reading a byte at a time is as inefficient as writing data one byte at a time. Consequently, there are two overloaded read() methods that fill a specified array with multiple bytes of data read from the stream, read(byte[] input) and read(byte[] input, int offset, int length). The first method attempts to fill the specified array input. The second attempts to fill the specified subarray of input, starting at offset and continuing for length bytes.
Notice I said these methods attempt to fill the array, not that they necessarily succeed. An attempt may fail in several ways. For instance, it’s not unheard of that while your program is reading data from a remote web server over a PPP dialup link, a bug in a switch at a phone company central office will disconnect you and several thousand of your neighbors from the rest of the world. This would cause an IOException. More commonly, however, a read attempt won’t completely fail but won’t completely succeed, either. Some of the requested bytes may be read, but not all of them. For example, you may try to read 1,024 bytes from a network connection, when only 512 have actually arrived from the server; the rest are still in transit. They’ll arrive eventually, but they aren’t available at this moment. To account for this, the multibyte read methods return the number of bytes actually read. For example, consider this code fragment:
byte[] input = new byte[1024];
int bytesRead = in.read(input);
It attempts to read 1,024 bytes from the InputStream in into the array input. However, if only 512 bytes are available, that’s all that will be read, and bytesRead will be set to 512. To guarantee that all the bytes you want are actually read, place the read in a loop that reads repeatedly until the array is filled. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  bytesRead += in.read(input, bytesRead, bytesToRead - bytesRead);
}
This technique is especially crucial for network streams. Chances are that if a file is available at all, all the bytes of a file are also available. However, since networks move much more slowly than CPUs, it is very easy for a program to empty a network buffer before all the data has arrived. In fact, if one of these two methods tries to read from a temporarily empty but open network buffer, it will generally return 0, indicating that no data is available but the stream is not yet closed. This is often preferable to the behavior of the single-byte read() method, which blocks the running thread in the same circumstances.
All three read() methods return -1 to signal the end of the stream. If the stream ends while there’s still data that hasn’t been read, the multibyte read() methods return the data until the buffer has been emptied. The next call to any of the read() methods will return -1. The -1 is never placed in the array. The array only contains actual data. The previous code fragment had a bug because it didn’t consider the possibility that all 1,024 bytes might never arrive (as opposed to not being immediately available). Fixing that bug requires testing the return value of read() before adding it to bytesRead. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  int result = in.read(input, bytesRead, bytesToRead - bytesRead);
  if (result == -1) break; // end of stream
  bytesRead += result;
}
If you do not want to wait until all the bytes you need are immediately available, you can use the available() method to determine how many bytes can be read without blocking. This returns the minimum number of bytes you can read. You may in fact be able to read more, but you will be able to read at least as many bytes as available() suggests. For example:
int bytesAvailable = in.available();
byte[] input = new byte[bytesAvailable];
int bytesRead = in.read(input, 0, bytesAvailable);
// continue with rest of program immediately...
In this case, you can assert that bytesRead is exactly equal to bytesAvailable. You cannot, however, assert that bytesRead is greater than zero. It is possible that no bytes were available. On end of stream, available() returns 0. Generally, read(byte[] input, int offset, int length) returns -1 on end of stream; but if length is 0, then it does not notice the end of stream and returns 0 instead.
On rare occasions, you may want to skip over data without reading it. The skip() method accomplishes this task. It’s less useful on network connections than when reading from files. Network connections are sequential and normally quite slow, so it’s not significantly more time-consuming to read data than to skip over it. Files are random access, so skipping can be implemented simply by repositioning a file pointer rather than processing each byte to be skipped.
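Like read(), skip() may skip fewer bytes than requested and returns the number actually skipped, so reliably skipping a fixed-size block takes a loop. A minimal sketch, assuming an open InputStream named in and a hypothetical 512-byte header you want to ignore:

long bytesToSkip = 512;  // hypothetical header size
long skipped = 0;
while (skipped < bytesToSkip) {
  long result = in.skip(bytesToSkip - skipped);
  if (result <= 0) break;  // end of stream or no progress; give up
  skipped += result;
}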
As with output streams, once your program has finished with an
input stream, it should close it by invoking its close()
method. This releases any resources
associated with the stream, such as file handles or ports. Once an input
stream has been closed, further reads from it throw IOException
s. However, some kinds of streams
may still allow you to do things with the object. For instance, you
generally won’t want to get the message digest from a java.security.DigestInputStream
until after
the data has been read and the stream closed.
Marking and Resetting
The InputStream class also has three less commonly used methods that allow programs to back up and reread data they’ve already read. These are:
public void mark(int readAheadLimit)
public void reset() throws IOException
public boolean markSupported()
In order to reread data, mark the current position in the stream with the mark() method. At a later point, you can reset the stream to the marked position using the reset() method. Subsequent reads then return data starting from the marked position. However, you may not be able to reset as far back as you like. The number of bytes you can read from the mark and still reset is determined by the readAheadLimit argument to mark(). If you try to reset too far back, an IOException is thrown. Furthermore, there can be only one mark in a stream at any given time. Marking a second location erases the first mark.
Marking and resetting are usually implemented by storing every byte read from the marked position on in an internal buffer. However, not all input streams support this. Before trying to use marking and resetting, check to see whether the markSupported() method returns true. If it does, the stream supports marking and resetting. Otherwise, mark() will do nothing and reset() will throw an IOException.
Tip
In my opinion, this demonstrates very poor design. In practice, more streams don’t support marking and resetting than do. Attaching functionality to an abstract superclass that is not available to many, probably most, subclasses is a very poor idea. It would be better to place these three methods in a separate interface that could be implemented by those classes that provided this functionality. The disadvantage of this approach is that you couldn’t then invoke these methods on an arbitrary input stream of unknown type, but in practice, you can’t do that anyway because not all streams support marking and resetting. Providing a method such as markSupported() to check for functionality at runtime is a more traditional, non-object-oriented solution to the problem. An object-oriented approach would embed this in the type system through interfaces and classes so that it could all be checked at compile time.
The only two input stream classes in java.io that always support marking are BufferedInputStream and ByteArrayInputStream. However, other input streams such as TelnetInputStream may support marking if they’re chained to a buffered input stream first.
Filter Streams
InputStream and OutputStream are fairly raw classes. They read and write bytes singly or in groups, but that’s all. Deciding what those bytes mean—whether they’re integers or IEEE 754 floating point numbers or Unicode text—is completely up to the programmer and the code. However, there are certain extremely common data formats that can benefit from a solid implementation in the class library. For example, many integers passed as parts of network protocols are 32-bit big-endian integers. Much text sent over the Web is either 7-bit ASCII, 8-bit Latin-1, or multi-byte UTF-8. Many files transferred by FTP are stored in the zip format. Java provides a number of filter classes you can attach to raw streams to translate the raw bytes to and from these and other formats.
The filters come in two versions: the filter streams and the readers and writers. The filter streams still work primarily with raw data as bytes: for instance, by compressing the data or interpreting it as binary numbers. The readers and writers handle the special case of text in a variety of encodings such as UTF-8 and ISO 8859-1. Filter streams are placed on top of raw streams such as a TelnetInputStream or a FileOutputStream or other filter streams. Readers and writers can be layered on top of raw streams, filter streams, or other readers and writers. However, filter streams cannot be placed on top of a reader or a writer, so we’ll start with filter streams and address readers and writers in the next section.
Filters are organized in a chain, as shown in Figure 4-2. Each link in the chain receives data from the previous filter or stream and passes the data along to the next link in the chain. In this example, a compressed, encrypted text file arrives from the local network interface, where native code presents it to the undocumented TelnetInputStream. A BufferedInputStream buffers the data to speed up the entire process. A CipherInputStream decrypts the data. A GZIPInputStream decompresses the deciphered data. An InputStreamReader converts the decompressed data to Unicode text. Finally, the text is read into the application and processed.
Every filter output stream has the same write(), close(), and flush() methods as java.io.OutputStream. Every filter input stream has the same read(), close(), and available() methods as java.io.InputStream. In some cases, such as BufferedInputStream and BufferedOutputStream, these may be the only methods they have. The filtering is purely internal and does not expose any new public interface. However, in most cases, the filter stream adds public methods with additional purposes. Sometimes these are intended to be used in addition to the usual read() and write() methods, like the unread() method of PushbackInputStream. At other times, they almost completely replace the original interface. For example, it’s relatively rare to use the write() method of PrintStream instead of one of its print() and println() methods.
Chaining Filters Together
Filters are connected to streams by their constructors. For example, the following code fragment buffers input from the file data.txt. First, a FileInputStream object fin is created by passing the name of the file as an argument to the FileInputStream constructor. Then, a BufferedInputStream object bin is created by passing fin as an argument to the BufferedInputStream constructor:
FileInputStream fin = new FileInputStream("data.txt");
BufferedInputStream bin = new BufferedInputStream(fin);
From this point forward, it’s possible to use the read() methods of both fin and bin to read data from the file data.txt. However, intermixing calls to different streams connected to the same source may violate several implicit contracts of the filter streams. Most of the time, you should only use the last filter in the chain to do the actual reading or writing. One way to write your code so that it’s at least harder to introduce this sort of bug is to deliberately lose the reference to the underlying input stream. For example:
InputStream in = new FileInputStream("data.txt");
in = new BufferedInputStream(in);
After these two lines execute, there’s no longer any way to access the underlying file input stream, so you can’t accidentally read from it and corrupt the buffer. This example works because it’s not necessary to distinguish between the methods of InputStream and those of BufferedInputStream. BufferedInputStream is simply used polymorphically as an instance of InputStream in the first place. In cases where it is necessary to use the additional methods of the filter stream not declared in the superclass, you may be able to construct one stream directly inside another. For example:
DataOutputStream dout = new DataOutputStream(new BufferedOutputStream(
    new FileOutputStream("data.txt")));
Although such statements can get a little long, it’s easy to split them across several lines, like this:
DataOutputStream dout = new DataOutputStream(
                         new BufferedOutputStream(
                          new FileOutputStream("data.txt")
                         )
                        );
Connection is permanent: filters cannot be disconnected from a stream once they’ve been attached.
There are times when you may need to use the methods of multiple filters in a chain. For instance, if you’re reading a Unicode text file, you may want to read the byte order mark in the first three bytes to determine whether the file is encoded as big-endian UCS-2, little-endian UCS-2, or UTF-8, and then select the matching Reader filter for the encoding. Or, if you’re connecting to a web server, you may want to read the header the server sends to find the Content-encoding and then use that content encoding to pick the right Reader filter to read the body of the response. Or perhaps you want to send floating point numbers across a network connection using a DataOutputStream and then retrieve a MessageDigest from the DigestOutputStream that the DataOutputStream is chained to. In all these cases, you need to save and use references to each of the underlying streams. However, under no circumstances should you ever read from or write to anything other than the last filter in the chain.
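A sketch of that last scenario, assuming an existing OutputStream named out and a MessageDigest named sha (for instance, from MessageDigest.getInstance("SHA")):

DigestOutputStream digestOut = new DigestOutputStream(out, sha);
DataOutputStream dataOut = new DataOutputStream(digestOut);

dataOut.writeDouble(3.14159);  // always write through the last filter in the chain
dataOut.writeDouble(2.71828);
dataOut.flush();

byte[] digest = digestOut.getMessageDigest().digest();  // but keep a reference to the digest filter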
Buffered Streams
The BufferedOutputStream class stores written data in a buffer (a protected byte array field named buf) until the buffer is full or the stream is flushed. Then it writes the data onto the underlying output stream all at once. A single write of many bytes is almost always much faster than many small writes that add up to the same thing. This is especially true of network connections because each TCP segment or UDP packet carries a finite amount of overhead, generally about 40 bytes’ worth. This means that sending 1 kilobyte of data 1 byte at a time actually requires sending 40 kilobytes over the wire, whereas sending it all at once only requires sending a little more than 1K of data. Most network cards and TCP implementations provide some level of buffering themselves, so the real numbers aren’t quite this dramatic. Nonetheless, buffering network output is generally a huge performance win.
The BufferedInputStream class also has a protected byte array named buf that serves as a buffer. When one of the stream’s read() methods is called, it first tries to get the requested data from the buffer. Only when the buffer runs out of data does the stream read from the underlying source. At this point, it reads as much data as it can from the source into the buffer, whether it needs all the data immediately or not. Data that isn’t used immediately will be available for later invocations of read(). When reading files from a local disk, it’s almost as fast to read several hundred bytes of data from the underlying stream as it is to read one byte of data. Therefore, buffering can substantially improve performance. The gain is less obvious on network connections where the bottleneck is often the speed at which the network can deliver data rather than the speed at which the network interface delivers data to the program or the speed at which the program runs. Nonetheless, buffering input rarely hurts and will become more important over time as network speeds increase.
BufferedInputStream has two constructors, as does BufferedOutputStream:
public BufferedInputStream(InputStream in)
public BufferedInputStream(InputStream in, int bufferSize)
public BufferedOutputStream(OutputStream out)
public BufferedOutputStream(OutputStream out, int bufferSize)
The first argument is the underlying stream from which unbuffered data will be read or to which buffered data will be written. The second argument, if present, specifies the number of bytes in the buffer. Otherwise, the buffer size is set to 2,048 bytes for an input stream and 512 bytes for an output stream. The ideal size for a buffer depends on what sort of stream you’re buffering. For network connections, you want something a little larger than the typical packet size. However, this can be hard to predict and varies depending on local network connections and protocols. Faster, higher-bandwidth networks tend to use larger packets, although eight kilobytes is an effective maximum packet size for UDP on most networks today, and TCP segments are often no larger than a kilobyte.
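For example, buffering both streams of a socket with explicit buffer sizes might look like this sketch (the socket and the 1,024-byte size are assumptions, not recommendations):

// Assumes an already connected java.net.Socket named socket
OutputStream out = new BufferedOutputStream(socket.getOutputStream(), 1024);
InputStream in = new BufferedInputStream(socket.getInputStream(), 1024);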
BufferedInputStream does not declare any new methods of its own. It only overrides methods from InputStream. It does support marking and resetting.
public int read() throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void mark(int readLimit)
public void reset() throws IOException
public boolean markSupported()
The two multibyte read() methods attempt to completely fill the specified array or subarray of data by reading from the underlying input stream as many times as necessary. They return only when the array or subarray has been completely filled, the end of stream is reached, or the underlying stream would block on further reads. Most input streams (including buffered input streams in Java 1.1 and 1.0) do not behave like this. They read from the underlying stream or data source only once before returning.
BufferedOutputStream also does not declare any new methods of its own. It overrides three methods from OutputStream:
public void write(int b) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush() throws IOException
You call these methods exactly as you would in any output stream. The difference is that each write places data in the buffer rather than directly on the underlying output stream. Consequently, it is essential to flush the stream when you reach a point at which the data needs to be sent.
PrintStream
The PrintStream class is the first filter output stream most programmers encounter because System.out is a PrintStream. However, other output streams can also be chained to print streams, using these two constructors:
public PrintStream(OutputStream out)
public PrintStream(OutputStream out, boolean autoFlush)
By default, print streams should be explicitly flushed. However, if the autoFlush argument is true, the stream will be flushed every time a byte array or linefeed is written or a println() method is invoked.
As well as the usual write(), flush(), and close() methods, PrintStream has 9 overloaded print() methods and 10 overloaded println() methods:
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println()
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)
Each print() method converts its argument to a string in a predictable fashion and writes the string onto the underlying output stream using the default encoding. The println() methods do the same thing, but they also append a platform-dependent line separator to the end of the line they write. This is a linefeed (\n) on Unix (including Mac OS X), a carriage return (\r) on Mac OS 9, and a carriage return/linefeed pair (\r\n) on Windows.
The first problem is that the output from println() is platform-dependent. Depending on what system runs your code, lines may sometimes be broken with a linefeed, a carriage return, or a carriage return/linefeed pair. This doesn’t cause problems when writing to the console, but it’s a disaster for writing network clients and servers that must follow a precise protocol. Most network protocols such as HTTP and Gnutella specify that lines should be terminated with a carriage return/linefeed pair. Using println() makes it easy to write a program that works on Windows but fails on Unix and the Mac. While many servers and clients are liberal in what they accept and can handle incorrect line terminators, there are occasional exceptions. In particular, in conjunction with the bug in readLine() discussed shortly, a client running on Mac OS 9 that uses println() may hang both the server and the client. To some extent, this could be fixed by using only print() and ignoring println().
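For instance, a client that must follow the carriage return/linefeed convention can spell the terminator out itself. A minimal sketch, assuming a PrintStream named out chained to a socket’s output stream:

out.print("GET / HTTP/1.0\r\n");  // write the line terminator explicitly...
out.print("\r\n");                // ...rather than relying on println()'s platform default
out.flush();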
However, PrintStream has other problems.
The second problem is that PrintStream assumes the default encoding of the platform on which it’s running. However, this encoding may not be what the server or client expects. For example, a web browser receiving XML files will expect them to be encoded in UTF-8 or UTF-16 unless the server tells it otherwise. However, a web server that uses PrintStream may well send the files encoded in CP1252 from a U.S.-localized Windows system or SJIS from a Japanese-localized system, whether the client expects or understands those encodings or not. PrintStream doesn’t provide any mechanism for changing the default encoding. This problem can be patched over by using the related PrintWriter class instead. But the problems continue.
The third problem is that PrintStream eats all exceptions. This makes PrintStream suitable for textbook programs such as HelloWorld, since simple console output can be taught without burdening students with first learning about exception handling and all that implies. However, network connections are much less reliable than the console. Connections routinely fail because of network congestion, phone company misfeasance, remote systems crashing, and many other reasons. Network programs must be prepared to deal with unexpected interruptions in the flow of data. The way to do this is by handling exceptions. However, PrintStream catches any exceptions thrown by the underlying output stream. Notice that the declaration of the standard five OutputStream methods in PrintStream does not have the usual throws IOException declaration:
public void write(int b)
public void write(byte[] data)
public void write(byte[] data, int offset, int length)
public void flush()
public void close()
Instead, PrintStream relies on an outdated and inadequate error flag. If the underlying stream throws an exception, this internal error flag is set. The programmer is relied upon to check the value of the flag using the checkError() method:
public boolean checkError()
If programmers are to do any error checking at all on a PrintStream, they must explicitly check every call. Furthermore, once an error has occurred, there is no way to unset the flag so further errors can be detected. Nor is any additional information available about the error. In short, the error notification provided by PrintStream is wholly inadequate for unreliable network connections. At the end of this chapter, we’ll introduce a class that fixes all these shortcomings.
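Until then, if you must use a PrintStream on a network connection, the best you can do is poll the flag after each logical chunk of output. A minimal sketch, assuming a PrintStream named out:

out.print("HELLO\r\n");
out.flush();
if (out.checkError()) {
  // the underlying stream threw an IOException at some point;
  // all you know is that something, somewhere, went wrong
}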
PushbackInputStream
PushbackInputStream is a subclass of FilterInputStream that provides a pushback stack so that a program can “unread” bytes onto the input stream. This lets programs add data to a running stream. For example, you could prefix a stream with a header before passing it to another process that needed that header.
The read() and available() methods of PushbackInputStream are invoked exactly as with normal input streams. However, they first attempt to read from the pushback buffer before reading from the underlying input stream. What this class adds is unread() methods that push bytes into the buffer:
public void unread(int b) throws IOException
This method pushes an unsigned byte given as an int between 0 and 255 onto the stream. Integers outside this range are truncated to this range as by a cast to byte. Assuming nothing else is pushed back onto this stream, the next read from the stream will return that byte. As multiple bytes are pushed onto the stream by repeated invocations of unread(), they are stored in a stack and returned in a last-in, first-out order. In essence, the buffer is a stack sitting on top of an input stream. Only when the stack is empty will the underlying stream be read.
There are two more unread() methods that push a specified array or subarray onto the stream:
public void unread(byte[] input) throws IOException
public void unread(byte[] input, int offset, int length) throws IOException
The arrays are stacked in last-in, first-out order. However, bytes popped from the same array will be returned in the order they appeared in the array. That is, the zeroth component of the array will be read before the first component of the array.
By default, the buffer is only one byte long, and trying to unread more than one byte throws an IOException. However, the buffer size can be changed by passing a second argument to the constructor:
public PushbackInputStream(InputStream in)
public PushbackInputStream(InputStream in, int size)
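For example, a common use is to peek at the next byte before deciding how to parse what follows. A minimal sketch, assuming an open InputStream named in:

PushbackInputStream pin = new PushbackInputStream(in);
int firstByte = pin.read();
if (firstByte != -1) {
  pin.unread(firstByte);  // put it back; the next read() returns the same byte
}
// hand pin (not in) to whatever code parses the rest of the stream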
Although PushbackInputStream and BufferedInputStream both use buffers, BufferedInputStream uses them for data read from the underlying input stream, while PushbackInputStream uses them for arbitrary data, which may or may not have been read from the stream originally. Furthermore, PushbackInputStream does not allow marking and resetting. The markSupported() method of PushbackInputStream returns false.
Data Streams
The DataInputStream and DataOutputStream classes provide methods for reading and writing Java’s primitive data types and strings in a binary format. The binary formats used are primarily intended for exchanging data between two different Java programs, whether through a network connection, a datafile, a pipe, or some other intermediary. What a data output stream writes, a data input stream can read. However, it happens that the formats are the same ones used for most Internet protocols that exchange binary numbers. For instance, the time protocol uses 32-bit big-endian integers, just like Java’s int data type. The controlled-load network element service uses 32-bit IEEE 754 floating point numbers, just like Java’s float data type. (This is probably correlation rather than causation. Both Java and most network protocols were designed by Unix programmers, and consequently both tend to use the formats common to most Unix systems.) However, this isn’t true for all network protocols, so check the details of any protocol you use. For instance, the Network Time Protocol (NTP) represents times as 64-bit unsigned fixed point numbers with the integer part in the first 32 bits and the fraction part in the last 32 bits. This doesn’t match any primitive data type in any common programming language, although it is fairly straightforward to work with—at least as far as is necessary for NTP.
The DataOutputStream class offers these 11 methods for writing particular Java data types:
public final void writeBoolean(boolean b) throws IOException
public final void writeByte(int b) throws IOException
public final void writeShort(int s) throws IOException
public final void writeChar(int c) throws IOException
public final void writeInt(int i) throws IOException
public final void writeLong(long l) throws IOException
public final void writeFloat(float f) throws IOException
public final void writeDouble(double d) throws IOException
public final void writeChars(String s) throws IOException
public final void writeBytes(String s) throws IOException
public final void writeUTF(String s) throws IOException
All data is written in big-endian format. Integers are written in two’s complement in the minimum number of bytes possible. Thus, a byte is written as one two’s-complement byte, a short as two two’s-complement bytes, an int as four two’s-complement bytes, and a long as eight two’s-complement bytes. Floats and doubles are written in IEEE 754 form in 4 and 8 bytes, respectively. Booleans are written as a single byte with the value 0 for false and 1 for true. Chars are written as two unsigned bytes.
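To make the format concrete, this sketch writes a single int into a byte array and prints its four big-endian bytes (the value 0x12345678 is arbitrary):

ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DataOutputStream dout = new DataOutputStream(byteStream);
dout.writeInt(0x12345678);
dout.flush();
byte[] data = byteStream.toByteArray();
for (int i = 0; i < data.length; i++) {
  System.out.println(data[i]);  // prints 18, 52, 86, 120: most significant byte first
}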
The last three methods are a little trickier. The writeChars() method simply iterates through the String argument, writing each character in turn as a 2-byte, big-endian Unicode character (a UTF-16 code unit, to be absolutely precise). The writeBytes() method iterates through the String argument but writes only the least significant byte of each character. Thus, information will be lost for any string with characters from outside the Latin-1 character set. This method may be useful on some network protocols that specify the ASCII encoding, but it should be avoided most of the time.
Neither writeChars() nor writeBytes() encodes the length of the string in the output stream. As a result, you can’t really distinguish between raw characters and characters that make up part of a string. The writeUTF() method does include the length of the string. It encodes the string itself in a variant of the UTF-8 encoding of Unicode. Since this variant is subtly incompatible with most non-Java software, it should be used only for exchanging data with other Java programs that use a DataInputStream to read strings. For exchanging UTF-8 text with all other software, you should use an InputStreamReader or OutputStreamWriter with the appropriate encoding. (There wouldn’t be any confusion if Sun had just called this method and its partner writeString() and readString() rather than writeUTF() and readUTF().)
Along with these methods for writing binary numbers and strings, DataOutputStream of course has the usual write(), flush(), and close() methods any OutputStream class has.
DataInputStream is the complementary class to DataOutputStream. Every format that DataOutputStream writes, DataInputStream can read. In addition, DataInputStream has the usual read(), available(), skip(), and close() methods, as well as methods for reading complete arrays of bytes and lines of text.
There are 9 methods to read binary data that match the 11 methods in DataOutputStream (there’s no exact complement for writeBytes() or writeChars(); these are handled by reading the bytes and chars one at a time):
public final boolean readBoolean() throws IOException
public final byte readByte() throws IOException
public final char readChar() throws IOException
public final short readShort() throws IOException
public final int readInt() throws IOException
public final long readLong() throws IOException
public final float readFloat() throws IOException
public final double readDouble() throws IOException
public final String readUTF() throws IOException
In addition, DataInputStream provides two methods to read unsigned bytes and unsigned shorts and return the equivalent int. Java doesn’t have either of these data types, but you may encounter them when reading binary data written by a C program:
public final int readUnsignedByte() throws IOException
public final int readUnsignedShort() throws IOException
DataInputStream has the usual two multibyte read() methods that read data into an array or subarray and return the number of bytes read. It also has two readFully() methods that repeatedly read data from the underlying input stream into an array until the requested number of bytes have been read. If enough data cannot be read, an IOException is thrown. These methods are especially useful when you know in advance exactly how many bytes you have to read. This might be the case when you’ve read the Content-length field out of an HTTP header and thus know how many bytes of data there are:
public final int read(byte[] input) throws IOException
public final int read(byte[] input, int offset, int length) throws IOException
public final void readFully(byte[] input) throws IOException
public final void readFully(byte[] input, int offset, int length) throws IOException
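For instance, reading an HTTP body whose size you already know might look like this sketch, where contentLength is assumed to have been parsed out of the Content-length header and in is the connection’s input stream:

DataInputStream din = new DataInputStream(in);
byte[] body = new byte[contentLength];
din.readFully(body);  // blocks until all the bytes arrive; throws an IOException if the stream ends first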
Finally, DataInputStream provides the popular readLine() method that reads a line of text as delimited by a line terminator and returns a string:
public final String readLine() throws IOException
However, this method should not be used under any circumstances, both because it is deprecated and because it is buggy. It’s deprecated because it doesn’t properly convert non-ASCII bytes to characters in most circumstances. That task is now handled by the readLine() method of the BufferedReader class. However, that method and this one share the same insidious bug: they do not always recognize a single carriage return as ending a line. Rather, readLine() recognizes only a linefeed or a carriage return/linefeed pair. When a carriage return is detected in the stream, readLine() waits to see whether the next character is a linefeed before continuing. If it is a linefeed, the carriage return and the linefeed are thrown away and the line is returned as a String. If it isn’t a linefeed, the carriage return is thrown away, the line is returned as a String, and the extra character that was read becomes part of the next line. However, if the carriage return is the last character in the stream (a very likely occurrence if the stream originates from a Macintosh or a file created on a Macintosh), then readLine() hangs, waiting for the last character, which isn’t forthcoming.
This problem isn’t obvious when reading files because there will almost certainly be a next character: -1 for end of stream, if nothing else. However, on persistent network connections such as those used for FTP and late-model HTTP, a server or client may simply stop sending data after the last character and wait for a response without actually closing the connection. If you’re lucky, the connection may eventually time out on one end or the other and you’ll get an IOException, although this will probably take at least a couple of minutes. If you’re not lucky, the program will hang indefinitely.
Note that it is not enough for your program to merely be running on Windows or Unix to avoid this bug. It must also ensure that it does not send or receive text files created on a Macintosh and that it never talks to Macintosh clients or servers. These are very strong conditions in the heterogeneous world of the Internet. It’s much simpler to avoid readLine() completely.
Compressing Streams
The java.util.zip package contains filter streams that compress and decompress streams in zip, gzip, and deflate formats. Along with its better-known uses with files, this package allows Java applications to easily exchange compressed data across the network. HTTP 1.1 includes support for compressed file transfer in which the server compresses and the browser decompresses files, in effect trading increasingly cheap CPU power for still-expensive network bandwidth. This process is completely transparent to the user. Of course, it’s not transparent to the programmer who has to write the compression and decompression code. However, the java.util.zip filter streams make it a lot more transparent than it otherwise would be.
There are six stream classes that perform compression and decompression; the input streams decompress data and the output streams compress it:
public class DeflaterOutputStream extends FilterOutputStream
public class InflaterInputStream extends FilterInputStream
public class GZIPOutputStream extends DeflaterOutputStream
public class GZIPInputStream extends InflaterInputStream
public class ZipOutputStream extends DeflaterOutputStream
public class ZipInputStream extends InflaterInputStream
All of these classes use essentially the same compression algorithm. They differ only in various constants and meta-information included with the compressed data. In addition, a zip stream may contain more than one compressed file.
Compressing and decompressing data with these classes is almost trivially easy. You simply chain the filter to the underlying stream and read or write it like normal. For example, suppose you want to read the compressed file allnames.gz. Simply open a FileInputStream to the file and chain a GZIPInputStream to it, like this:
FileInputStream fin = new FileInputStream("allnames.gz"); GZIPInputStream gzin = new GZIPInputStream(fin);
From this point forward, you can read uncompressed data from gzin using the usual read(), skip(), and available() methods. For instance, this code fragment reads and decompresses a file named allnames.gz in the current working directory:
FileInputStream fin = new FileInputStream("allnames.gz"); GZIPInputStream gzin = new GZIPInputStream(fin); FileOutputStream fout = new FileOutputStream("allnames"); int b = 0; while ((b = gzin.read( )) != -1) fout.write(b); gzin.close( ); out.flush( ); out.close( );
In fact, it isn’t even necessary to know that gzin is a GZIPInputStream for this to work. A simple InputStream type works equally well. For example:
InputStream in = new GZIPInputStream(new FileInputStream("allnames.gz"));
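Compression in the other direction is symmetrical. A sketch that gzips a hypothetical file named allnames into allnames.gz:

FileInputStream fin = new FileInputStream("allnames");
GZIPOutputStream gzout = new GZIPOutputStream(new FileOutputStream("allnames.gz"));
int b = 0;
while ((b = fin.read()) != -1) gzout.write(b);
gzout.finish();  // write the gzip trailer
gzout.close();
fin.close();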
DeflaterOutputStream and InflaterInputStream are equally straightforward. ZipInputStream and ZipOutputStream are a little more complicated because a zip file is actually an archive that may contain multiple entries, each of which must be read separately. Each file in a zip archive is represented as a ZipEntry object whose getName() method returns the original name of the file. For example, this code fragment decompresses the archive shareware.zip in the current working directory:
FileInputStream fin = new FileInputStream("shareware.zip"); ZipInputStream zin = new ZipInputStream(fin); ZipEntry ze = null; int b = 0; while ((ze = zin.getNextEntry( )) != null) { FileOutputStream fout = new FileOutputStream(ze.getName( )); while ((b = zin.read( )) != -1) fout.write(b); zin.closeEntry( ); fout.flush( ); fout.close( ); } zin.close( );
Digest Streams
The java.security package contains two filter streams that can calculate a message digest for a stream. They are DigestInputStream and DigestOutputStream. A message digest, represented in Java by the java.security.MessageDigest class, is a strong hash code for the stream; that is, it is a large integer (typically 20 bytes long in binary format) that can easily be calculated from a stream of any length in such a fashion that no information about the stream is available from the message digest. Message digests can be used for digital signatures and for detecting data that has been corrupted in transit across the network.
In practice, the use of message digests in digital signatures is more important. Mere data corruption can be detected with much simpler, less computationally expensive algorithms. However, the digest filter streams are so easy to use that at times it may be worth paying the computational price for the corresponding increase in programmer productivity. To calculate a digest for an output stream, you first construct a MessageDigest object that uses a particular algorithm, such as the Secure Hash Algorithm (SHA). Pass both the MessageDigest object and the stream you want to digest to the DigestOutputStream constructor. This chains the digest stream to the underlying output stream. Then write data onto the stream as normal, flush it, close it, and invoke the getMessageDigest() method to retrieve the MessageDigest object. Finally, invoke the digest() method on the MessageDigest object to finish calculating the actual digest. Here’s an example:
MessageDigest sha = MessageDigest.getInstance("SHA"); DigestOutputStream dout = new DigestOutputStream(out, sha); byte[] buffer = new byte[128]; while (true) { int bytesRead = in.read(buffer); if (bytesRead < 0) break; dout.write(buffer, 0, bytesRead); } dout.flush( ); dout.close( ); byte[] result = dout.getMessageDigest( ).digest( );
Calculating the digest of an input stream you read is equally simple. It still isn’t quite as transparent as some of the other filter streams because you do need to be at least marginally conversant with the methods of the MessageDigest class. Nonetheless, it’s still far easier than writing your own secure hash function and manually feeding it each byte you write.
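The input side follows the same pattern. A minimal sketch, assuming an existing InputStream named in whose contents you want to digest as you read them:

MessageDigest sha = MessageDigest.getInstance("SHA");
DigestInputStream din = new DigestInputStream(in, sha);
byte[] buffer = new byte[128];
while (din.read(buffer) != -1) {
  // reading is all it takes; the digest updates as a side effect
}
byte[] result = din.getMessageDigest().digest();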
Of course, you also need a way of associating a particular
message digest with a particular stream. In some circumstances, the
digest may be sent over the same channel used to send the digested
data. The sender calculates the digest as it sends data, while the
receiver calculates the digest as it receives the data. When the
sender is done, it sends a signal that the receiver recognizes as
indicating the end of the stream and then sends the digest. The
receiver receives the digest, checks that the digest received is the
same as the one calculated locally, and closes the connection. If the
digests don’t match, the receiver may instead ask the sender to send
the message again. Alternatively, both the digest and the files it
digests may be stored in the same zip archive. And there are many
other possibilities. Situations like this generally call for the
design of a relatively formal custom protocol. However, while the
protocol may be complicated, the calculation of the digest is
straightforward, thanks to the DigestInputStream
and DigestOutputStream
filter classes.
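Whatever the protocol, the comparison itself is trivial: the static MessageDigest.isEqual( ) method compares two digests byte by byte. For instance, assuming receivedDigest and calculatedDigest are the two byte arrays in question:

// receivedDigest arrived over the network; calculatedDigest came from the local digest stream
if (MessageDigest.isEqual(receivedDigest, calculatedDigest)) {
  // the data arrived intact
}
else {
  // the data was corrupted or tampered with; ask the sender to retransmit
}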
Encrypting Streams
The CipherInputStream
and CipherOutputStream
classes in
the javax.crypto
package provide
encryption and decryption services. They are both powered by a
Cipher
engine object that
encapsulates the algorithm used to perform encryption and decryption.
By changing the Cipher
engine
object, you change the algorithm that the streams use to encrypt and
decrypt. Most ciphers also require a key
to encrypt and decrypt the data. Symmetric or secret
key ciphers use the same key for both encryption and decryption.
Asymmetric or public key ciphers use different keys for encryption and
decryption. The encryption key can be distributed as long as the
decryption key is kept secret. Keys are specific to the algorithm and
are represented in Java by instances of the java.security.Key
interface. The Cipher
object is set in the constructor.
Like all filter stream constructors, these constructors also take
the underlying stream to be filtered as an argument:
public CipherInputStream(InputStream in, Cipher c)
public CipherOutputStream(OutputStream out, Cipher c)
Tip
For legal reasons CipherInputStream
and CipherOutputStream
are not bundled with
the core API in Java 1.3 and earlier. Instead, they are part of a
standard extension to Java called the Java Cryptography Extension, JCE for short. This is in
the javax.crypto
package. Sun
provides an implementation of this API (available from http://java.sun.com/products/jce/) and various third
parties have written independent implementations. Of particular note
is the Legion of the Bouncy Castle’s open source
implementation, which can be downloaded from http://www.bouncycastle.org/.
To get a properly initialized Cipher
object, use the static Cipher.getInstance( )
factory method. This
Cipher
object must be initialized
for either encryption or decryption with init( )
before being passed into one of the
previous constructors. For example, this code fragment prepares a
CipherInputStream
for decryption
using the password “two and not a fnord” and the Data Encryption
Standard (DES) algorithm:
byte[] desKeyData = "two and not a fnord".getBytes();
DESKeySpec desKeySpec = new DESKeySpec(desKeyData);
SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("DES");
SecretKey desKey = keyFactory.generateSecret(desKeySpec);
Cipher des = Cipher.getInstance("DES");
des.init(Cipher.DECRYPT_MODE, desKey);
CipherInputStream cin = new CipherInputStream(fin, des);
This fragment uses classes from the java.security
, java.security.spec
, javax.crypto
, and javax.crypto.spec
packages. Different
implementations of the JCE support different groups of encryption
algorithms. Common algorithms include DES, RSA, and Blowfish. The
construction of a key is generally algorithm-specific. Consult the
documentation for your JCE implementation for more details.
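Once the stream is constructed, reading from it is no different than reading from any other input stream; the cipher does its work transparently. For instance, assuming cin is the stream built above and fout is an already opened FileOutputStream for the decrypted result:

byte[] buffer = new byte[64];
int bytesRead;
while ((bytesRead = cin.read(buffer)) != -1) {
  // cin decrypts the bytes as they're read from the underlying stream
  fout.write(buffer, 0, bytesRead);
}
cin.close();
fout.flush();
fout.close();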
CipherInputStream
overrides
most of the normal InputStream
methods like read( )
and available( )
. CipherOutputStream
overrides most of the
usual OutputStream
methods like
write()
and flush( )
. These methods are all invoked much
as they would be for any other stream. However, as the data is read or
written, the stream’s Cipher
object
either decrypts or encrypts the data. (Assuming your program wants to
work with unencrypted data—as is commonly the case—a cipher input
stream will decrypt the data and a cipher output stream will encrypt
the data.) For example, this code fragment encrypts the file
secrets.txt using the password “Mary had a little
spider”:
String infile = "secrets.txt"; String outfile = "secrets.des"; String password = "Mary had a little spider"; try { FileInputStream fin = new FileInputStream(infile); FileOutputStream fout = new FileOutputStream(outfile); // register the provider that implements the algorithm Provider sunJce = new com.sun.crypto.provider.SunJCE( ); Security.addProvider(sunJce); // create a key char[] pbeKeyData = password.toCharArray( ); PBEKeySpec pbeKeySpec = new PBEKeySpec(pbeKeyData); SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("PBEWithMD5AndDES"); SecretKey pbeKey = keyFactory.generateSecret(pbeKeySpec); // use Data Encryption Standard Cipher pbe = Cipher.getInstance("PBEWithMD5AndDES"); pbe.init(Cipher.ENCRYPT_MODE, pbeKey); CipherOutputStream cout = new CipherOutputStream(fout, pbe); byte[] input = new byte[64]; while (true) { int bytesRead = fin.read(input); if (bytesRead == -1) break; cout.write(input, 0, bytesRead); } cout.flush( ); cout.close( ); fin.close( ); } catch (Exception ex) { System.err.println(ex); }
I admit that this is more complicated than it needs to be.
There’s a lot of setup work involved in creating the Cipher
object that actually performs the
encryption. Partly, that’s because key generation involves quite a bit
more than a simple password. However, a large part of the complication
is due to inane U.S. export laws that prevent Sun from fully
integrating the JCE with the JDK and JRE. To a large extent, the
complex architecture used here is driven by a need to separate the
actual encrypting and decrypting code from the cipher stream classes.
Readers and Writers
Many programmers have a bad habit of writing code as if all text were ASCII or at least in the native encoding of the platform. While some older, simpler network protocols, such as daytime, quote of the day, and chargen, do specify ASCII encoding for text, this is not true of HTTP and many other more modern protocols, which allow a wide variety of localized encodings, such as KOI8-R Cyrillic, Big-5 Chinese, and ISO 8859-2 for most Central European languages. Java’s native character set is the UTF-16 encoding of Unicode. When the encoding is no longer ASCII, the assumption that bytes and chars are essentially the same things also breaks down. Consequently, Java provides an almost complete mirror of the input and output stream class hierarchy designed for working with characters instead of bytes.
In this mirror image hierarchy, two abstract superclasses define
the basic API for reading and writing characters. The java.io.Reader
class specifies the API by
which characters are read. The java.io.Writer
class specifies the API by
which characters are written. Wherever input and output streams use
bytes, readers and writers use Unicode characters. Concrete subclasses
of Reader
and Writer
allow particular sources to be read and
targets to be written. Filter readers and writers can be attached to
other readers and writers to provide additional services or
interfaces.
The most important concrete subclasses of Reader
and Writer
are the InputStreamReader
and the OutputStreamWriter
classes. An InputStreamReader
contains an underlying input
stream from which it reads raw bytes. It translates these bytes into
Unicode characters according to a specified encoding. An OutputStreamWriter
receives Unicode characters
from a running program. It then translates those characters into bytes
using a specified encoding and writes the bytes onto an underlying
output stream.
In addition to these two classes, the java.io
package provides several raw reader
and writer classes that read characters without directly requiring an
underlying input stream, including:
FileReader
FileWriter
StringReader
StringWriter
CharArrayReader
CharArrayWriter
The first two classes in this list work with files and the last four work inside Java, so they aren’t of great use for network programming. However, aside from different constructors, these classes have pretty much the same public interface as all other reader and writer classes.
Writers
The Writer
class
mirrors the java.io.OutputStream
class. It’s abstract and has two protected constructors. Like OutputStream
, the Writer
class is never used directly;
instead, it is used polymorphically, through one of its subclasses. It
has five write()
methods as well as
a flush( )
and a close( )
method:
protected Writer()
protected Writer(Object lock)
public abstract void write(char[] text, int offset, int length) throws IOException
public void write(int c) throws IOException
public void write(char[] text) throws IOException
public void write(String s) throws IOException
public void write(String s, int offset, int length) throws IOException
public abstract void flush() throws IOException
public abstract void close() throws IOException
The write(char[]
text
, int
offset
, int
length)
method is the base method in terms
of which the other four write( )
methods are implemented. A subclass must override at least this method
as well as flush( )
and close()
, although most override some of the
other write( )
methods as well in
order to provide more efficient implementations. For example, given a
Writer
object w
, you can write the string “Network” like
this:
char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
w.write(network, 0, network.length);
The same task can be accomplished with these other methods, as well:
w.write(network);

for (int i = 0; i < network.length; i++) w.write(network[i]);

w.write("Network");

w.write("Network", 0, 7);
All of these examples are different ways of expressing the same
thing. Which you use in any given situation is mostly a matter of
convenience and taste. However, how many and which bytes are written
by these lines depends on the encoding w
uses. If it’s using big-endian UTF-16, it
will write these 14 bytes (shown here in hexadecimal) in this
order:
00 4E 00 65 00 74 00 77 00 6F 00 72 00 6B
On the other hand, if w
uses
little-endian UTF-16, this sequence of 14 bytes is written:
4E 00 65 00 74 00 77 00 6F 00 72 00 6B 00
If w
uses Latin-1, UTF-8, or
MacRoman, this sequence of seven bytes is written:
4E 65 74 77 6F 72 6B
Other encodings may write still different sequences of bytes. The exact output depends on the encoding.
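If you're curious exactly which bytes a given encoding produces, you can chain an OutputStreamWriter to a ByteArrayOutputStream and dump the result in hexadecimal. This small program is my own illustration; the encoding names it uses are standard names recognized by the JDK:

import java.io.*;

public class EncodedBytes {
  public static void main(String[] args) throws IOException {
    String[] encodings = {"UTF-16BE", "UTF-16LE", "UTF-8", "ISO-8859-1"};
    for (int i = 0; i < encodings.length; i++) {
      ByteArrayOutputStream bout = new ByteArrayOutputStream();
      Writer w = new OutputStreamWriter(bout, encodings[i]);
      w.write("Network");
      w.close(); // flushes the encoder's internal buffer
      byte[] data = bout.toByteArray();
      System.out.print(encodings[i] + ":");
      for (int j = 0; j < data.length; j++) {
        String hex = Integer.toHexString(data[j] & 0xFF).toUpperCase();
        if (hex.length() == 1) hex = "0" + hex; // pad single digits
        System.out.print(" " + hex);
      }
      System.out.println();
    }
  }
}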
Writers may be buffered, either directly by being chained to a
BufferedWriter
or indirectly
because their underlying output stream is buffered. To force a write
to be committed to the output medium, invoke the flush()
method:
w.flush( );
The close( )
method behaves
similarly to the close( )
method of
OutputStream
. close( )
flushes the writer, then closes the
underlying output stream and releases any resources associated with
it:
public abstract void close( ) throws IOException
After a writer has been closed, further writes throw IOException
s.
OutputStreamWriter
OutputStreamWriter
is the most important concrete subclass of Writer
. An OutputStreamWriter
receives characters from
a Java program. It converts these into bytes according to a specified
encoding and writes them onto an underlying output stream. Its
constructor specifies the output stream to write to and the encoding
to use:
public OutputStreamWriter(OutputStream out, String encoding) throws UnsupportedEncodingException
public OutputStreamWriter(OutputStream out)
Valid encodings are listed in the documentation for Sun’s native2ascii tool included with the JDK and available from http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html. If no encoding is specified, the default encoding for the platform is used. (In the United States, the default encoding is ISO Latin-1 on Solaris and Windows, MacRoman on the Mac.) For example, this code fragment writes a string of Greek text
in the Cp1253 Windows Greek encoding:
OutputStreamWriter w = new OutputStreamWriter(
    new FileOutputStream("OdysseyB.txt"), "Cp1253");
w.write("");
Other than the constructors, OutputStreamWriter
has only the usual
Writer
methods (which are used
exactly as they are for any Writer
class) and one method to return the encoding of the object:
public String getEncoding( )
Readers
The Reader
class
mirrors the java.io.InputStream
class. It’s abstract with two protected constructors. Like InputStream
and Writer
, the Reader
class is never used directly, only
through one of its subclasses. It has three read()
methods, as
well as skip( )
, close( )
, ready(
)
, mark( )
, reset( )
, and markSupported( )
methods:
protected Reader()
protected Reader(Object lock)
public abstract int read(char[] text, int offset, int length) throws IOException
public int read() throws IOException
public int read(char[] text) throws IOException
public long skip(long n) throws IOException
public boolean ready() throws IOException
public boolean markSupported()
public void mark(int readAheadLimit) throws IOException
public void reset() throws IOException
public abstract void close() throws IOException
The read(char[]
text
, int
offset
, int
length)
method is the fundamental method
through which the other two read( )
methods are implemented. A subclass must override at least this method
as well as close( )
, although most
will override some of the other read(
)
methods as well in order to provide more efficient
implementations.
Most of these methods are easily understood by analogy with
their InputStream
counterparts. The
read()
method returns a single
Unicode character as an int
with a
value from 0 to 65,535 or -1 on end of stream. The read(char[]
text)
method tries to fill the array
text
with characters and returns
the actual number of characters read or -1 on end of stream. The
read(char[]
text
, int
offset
, int
length)
method attempts to read length
characters into the subarray of
text
beginning at offset
and continuing for length
characters. It also returns the
actual number of characters read or -1 on end of stream. The skip(long
n)
method skips n
characters. The mark( )
and reset(
)
methods allow some readers to reset back to a marked
position in the character sequence. The markSupported( )
method tells you whether
the reader supports marking and resetting. The close( )
method closes the reader and any
underlying input stream so that further attempts to read from it throw
IOException
s.
The exception to the rule of similarity is ready()
, which has the same general purpose
as available( )
but not quite the
same semantics, even modulo the byte-to-char conversion. Whereas
available( )
returns an int
specifying a minimum number of bytes
that may be read without blocking, ready(
)
only returns a boolean
indicating whether the reader may be read without blocking. The
problem is that some character encodings, such as UTF-8, use different
numbers of bytes for different characters. Thus, it’s hard to tell how
many characters are waiting in the network or filesystem buffer
without actually reading them out of the buffer.
InputStreamReader
is the most
important concrete subclass of Reader
. An InputStreamReader
reads bytes from an
underlying input stream such as a FileInputStream
or TelnetInputStream
. It converts these into
characters according to a specified encoding and returns them. The
constructor specifies the input stream to read from and the encoding
to use:
public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding) throws UnsupportedEncodingException
If no encoding is specified, the default encoding for the
platform is used. If an unknown encoding is specified, then an
UnsupportedEncodingException
is
thrown.
For example, this method reads an input stream and converts it all to one Unicode string using the MacCyrillic encoding:
public static String getMacCyrillicString(InputStream in) throws IOException {
  InputStreamReader r = new InputStreamReader(in, "MacCyrillic");
  StringBuffer sb = new StringBuffer();
  int c;
  while ((c = r.read()) != -1) sb.append((char) c);
  r.close();
  return sb.toString();
}
Filter Readers and Writers
The InputStreamReader
and OutputStreamWriter
classes act
as decorators on top of input and output streams that change the
interface from a byte-oriented interface to a character-oriented
interface. Once this is done, additional character-oriented filters
can be layered on top of the reader or writer using the java.io.FilterReader
and java.io.FilterWriter
classes. As with filter
streams, there are a variety of subclasses that perform specific
filtering, including:
BufferedReader
BufferedWriter
LineNumberReader
PushbackReader
PrintWriter
Buffered readers and writers
The BufferedReader
and BufferedWriter
classes are the character-based equivalents of the byte-oriented
BufferedInputStream
and BufferedOutputStream
classes. Where
BufferedInputStream
and BufferedOutputStream
use an internal array
of bytes as a buffer, BufferedReader
and BufferedWriter
use an internal array of
chars.
When a program reads from a BufferedReader
, text is taken from the
buffer rather than directly from the underlying input stream or
other text source. When the buffer empties, it is filled again with
as much text as possible, even if not all of it is immediately
needed, making future reads much faster. When a program writes to a
BufferedWriter
, the text is
placed in the buffer. The text is moved to the underlying output
stream or other target only when the buffer fills up or when the
writer is explicitly flushed, which can make writes much faster than
would otherwise be the case.
BufferedReader
and BufferedWriter
have the usual methods
associated with readers and writers, like read( )
, ready(
)
, write( )
, and
close( )
. They each have two
constructors that chain the BufferedReader
or BufferedWriter
to an underlying reader or
writer and set the size of the buffer. If the size is not set, the
default size of 8,192 characters is used:
public BufferedReader(Reader in, int bufferSize)
public BufferedReader(Reader in)
public BufferedWriter(Writer out)
public BufferedWriter(Writer out, int bufferSize)
For example, the earlier getMacCyrillicString( )
example was less
than efficient because it read characters one at a time. Since
MacCyrillic is a 1-byte character set, it also read bytes one at a
time. However, it’s straightforward to make it run faster by
chaining a BufferedReader
to the
InputStreamReader
, like
this:
public static String getMacCyrillicString(InputStream in) throws IOException {
  Reader r = new InputStreamReader(in, "MacCyrillic");
  r = new BufferedReader(r, 1024);
  StringBuffer sb = new StringBuffer();
  int c;
  while ((c = r.read()) != -1) sb.append((char) c);
  r.close();
  return sb.toString();
}
All that was needed to buffer this method was one additional
line of code. None of the rest of the algorithm had to change, since
the only InputStreamReader
methods used were the read( )
and
close( )
methods declared in the
Reader
superclass and shared by
all Reader
subclasses, including
BufferedReader
.
The BufferedReader
class
also has a readLine( )
method
that reads a single line of text and returns it as a string:
public String readLine( ) throws IOException
This method is supposed to replace the deprecated readLine()
method in DataInputStream
, and it has mostly the
same behavior as that method. The big difference is that by chaining
a BufferedReader
to an InputStreamReader
, you can correctly read
lines in character sets other than the default encoding for the
platform. Unfortunately, this method shares the same bugs as the
readLine( )
method in DataInputStream
, discussed earlier in this
chapter. That is, readLine( )
tends to hang its thread when reading streams where lines end in
carriage returns, as is commonly the case when the streams derive
from a Macintosh or a Macintosh text file. Consequently, you should
scrupulously avoid this method in network programs.
It’s not all that difficult, however, to write a safe version
of this class that correctly implements the readLine( )
method. Example 4-1 is such a SafeBufferedReader
class. It has exactly the same public interface as
BufferedReader
; it just has a
slightly different private implementation. I’ll use this class in
future chapters in situations where it’s extremely convenient to
have a readLine( )
method.
package com.macfaq.io;

import java.io.*;

public class SafeBufferedReader extends BufferedReader {

  public SafeBufferedReader(Reader in) {
    this(in, 1024);
  }

  public SafeBufferedReader(Reader in, int bufferSize) {
    super(in, bufferSize);
  }

  private boolean lookingForLineFeed = false;

  public String readLine() throws IOException {
    StringBuffer sb = new StringBuffer("");
    while (true) {
      int c = this.read();
      if (c == -1) { // end of stream
        if (sb.length() == 0) return null;
        return sb.toString();
      }
      else if (c == '\n') {
        if (lookingForLineFeed) {
          lookingForLineFeed = false;
          continue;
        }
        else {
          return sb.toString();
        }
      }
      else if (c == '\r') {
        lookingForLineFeed = true;
        return sb.toString();
      }
      else {
        lookingForLineFeed = false;
        sb.append((char) c);
      }
    }
  }
}
The BufferedWriter
class
adds one new method not included in its superclass, called newLine( )
, also geared toward writing
lines:
public void newLine( ) throws IOException
This method inserts a platform-dependent line-separator string
into the output. The line.separator
system property determines
exactly what the string is: probably a linefeed on Unix and Mac OS
X, a carriage return on Mac OS 9, and a carriage return/linefeed
pair on Windows. Since network protocols generally specify the
required line-terminator, you should not use this method for network
programming. Instead, explicitly write the line-terminator the
protocol requires.
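For example, a client speaking a protocol such as HTTP, which mandates a carriage return/linefeed pair, should spell the terminator out. A minimal sketch, assuming out is a BufferedWriter chained to the socket's output stream:

// Write the terminator the protocol requires, not the platform's line.separator
out.write("GET / HTTP/1.0");
out.write("\r\n");
out.write("\r\n");
out.flush();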
LineNumberReader
LineNumberReader
is a subclass of BufferedReader
that keeps track of the
current line number. This can be retrieved at any time with the
getLineNumber( )
method:
public int getLineNumber( )
By default, the first line number is 0. However, the number of
the current line and all subsequent lines can be changed with the
setLineNumber( )
method:
public void setLineNumber(int lineNumber)
This method adjusts only the line numbers that getLineNumber( )
reports. It does not
change the point at which the stream is read.
The LineNumberReader
’s
readLine( )
method shares the
same bug as BufferedReader
and
DataInputStream
’s, and is not
suitable for network programming. However, the line numbers are also
tracked if you use only the regular read(
)
methods, and these do not share that bug. Besides these
methods and the usual Reader
methods, LineNumberReader
has
only these two constructors:
public LineNumberReader(Reader in)
public LineNumberReader(Reader in, int bufferSize)
Since LineNumberReader
is a
subclass of BufferedReader
, it
has an internal character buffer whose size can be set with the
second constructor. The default size is 8,192 characters.
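For example, this fragment counts the lines in a reader named in using only the safe read( ) method; the count is simply whatever getLineNumber( ) reports when the stream is exhausted (a final line without a terminator is not counted):

LineNumberReader lnr = new LineNumberReader(in);
// Reading characters advances the line count each time a line terminator goes by
while (lnr.read() != -1) ;
int lines = lnr.getLineNumber();
lnr.close();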
PushbackReader
The PushbackReader
class is the mirror image of the PushbackInputStream
class. As usual, the
main difference is that it pushes back chars rather than bytes. It
provides three unread( )
methods
that push characters onto the reader’s input buffer:
public void unread(int c) throws IOException
public void unread(char[] text) throws IOException
public void unread(char[] text, int offset, int length) throws IOException
The first unread( )
method
pushes a single character onto the reader. The second pushes an
array of characters. The third pushes the specified subarray of
characters, starting with text[offset]
and continuing through
text[offset+length-1]
.
By default, the size of the pushback buffer is only one character. However, the size can be adjusted in the second constructor:
public PushbackReader(Reader in)
public PushbackReader(Reader in, int bufferSize)
Trying to unread more characters than the buffer will hold
throws an IOException
.
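Pushback is most useful when a parser needs to peek at the next character before deciding how to handle it. For instance, assuming r is an open Reader and parseNumber( ) is a hypothetical method of your own:

PushbackReader pr = new PushbackReader(r);
int c = pr.read();       // peek at the next character
if (c != -1 && Character.isDigit((char) c)) {
  pr.unread(c);          // push it back so the number parser sees it too
  // parseNumber(pr);    // hypothetical method that reads the complete number
}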
PrintWriter
The PrintWriter
class is a replacement for Java 1.0’s PrintStream
class that properly handles
multibyte character sets and international text. Sun originally
planned to deprecate PrintStream
in favor of PrintWriter
but
backed off when it realized this step would invalidate too much
existing code, especially code that depended on System.out
. Nonetheless, new code should
use PrintWriter
instead of
PrintStream
.
Aside from the constructors, the PrintWriter
class has an almost identical
collection of methods to PrintStream
. These include:
public PrintWriter(Writer out)
public PrintWriter(Writer out, boolean autoFlush)
public PrintWriter(OutputStream out)
public PrintWriter(OutputStream out, boolean autoFlush)
public void flush()
public void close()
public boolean checkError()
protected void setError()
public void write(int c)
public void write(char[] text, int offset, int length)
public void write(char[] text)
public void write(String s, int offset, int length)
public void write(String s)
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println()
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)
Most of these methods behave the same for PrintWriter
as they do for PrintStream
. The exceptions are the four
write( )
methods, which write
characters rather than bytes; also, if the underlying writer
properly handles character set conversion, so do all the methods of
the PrintWriter
. This is an
improvement over the noninternationalizable PrintStream
class, but it’s still not good
enough for network programming. PrintWriter
still has the problems of
platform dependency and minimal error reporting that plague PrintStream
.
It isn’t hard to write a PrintWriter
class that does work for
network programming. You simply have to require the programmer to
specify a line separator and let the IOException
s fall where they may. Example 4-2 demonstrates. Notice
that all the constructors require an explicit line-separator string
to be provided.
/*
 * @(#)SafePrintWriter.java  1.0 04/06/28
 *
 * Placed in the public domain
 * No rights reserved.
 */

package com.macfaq.io;

import java.io.*;

/**
 * @version 1.1, 2004-06-28
 * @author  Elliotte Rusty Harold
 * @since   Java Network Programming, 2nd edition
 */
public class SafePrintWriter extends Writer {

  protected Writer out;

  private boolean autoFlush = false;
  private String lineSeparator;
  private boolean closed = false;

  public SafePrintWriter(Writer out, String lineSeparator) {
    this(out, false, lineSeparator);
  }

  public SafePrintWriter(Writer out, char lineSeparator) {
    this(out, false, String.valueOf(lineSeparator));
  }

  public SafePrintWriter(Writer out, boolean autoFlush, String lineSeparator) {
    super(out);
    this.out = out;
    this.autoFlush = autoFlush;
    if (lineSeparator == null) {
      throw new NullPointerException("Null line separator");
    }
    this.lineSeparator = lineSeparator;
  }

  public SafePrintWriter(OutputStream out, boolean autoFlush, String encoding,
   String lineSeparator) throws UnsupportedEncodingException {
    this(new OutputStreamWriter(out, encoding), autoFlush, lineSeparator);
  }

  public void flush() throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.flush();
    }
  }

  public void close() throws IOException {
    try {
      this.flush();
    }
    catch (IOException ex) {
    }
    synchronized (lock) {
      out.close();
      this.closed = true;
    }
  }

  public void write(int c) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(c);
    }
  }

  public void write(char[] text, int offset, int length) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, offset, length);
    }
  }

  public void write(char[] text) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, 0, text.length);
    }
  }

  public void write(String s, int offset, int length) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(s, offset, length);
    }
  }

  public void print(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
  }

  public void println(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(char c) throws IOException {
    this.write(String.valueOf(c));
  }

  public void println(char c) throws IOException {
    this.write(String.valueOf(c));
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(int i) throws IOException {
    this.write(String.valueOf(i));
  }

  public void println(int i) throws IOException {
    this.write(String.valueOf(i));
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(long l) throws IOException {
    this.write(String.valueOf(l));
  }

  public void println(long l) throws IOException {
    this.write(String.valueOf(l));
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(float f) throws IOException {
    this.write(String.valueOf(f));
  }

  public void println(float f) throws IOException {
    this.write(String.valueOf(f));
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(double d) throws IOException {
    this.write(String.valueOf(d));
  }

  public void println(double d) throws IOException {
    this.write(String.valueOf(d));
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(char[] text) throws IOException {
    this.write(text);
  }

  public void println(char[] text) throws IOException {
    this.write(text);
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
  }

  public void println(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void print(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString());
  }

  public void println(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString());
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

  public void println() throws IOException {
    this.write(lineSeparator);
    if (autoFlush) out.flush();
  }

}
This class actually extends Writer
rather than FilterWriter
, unlike PrintWriter
. It could extend FilterWriter
instead; however, this would
save only one field and one line of code, since this class needs to
override every single method in FilterWriter
(close( )
, flush(
)
, and all three write(
)
methods). The reason for this is twofold. First, the
PrintWriter
class has to be much
more careful about synchronization than the FilterWriter
class. Second, some of the
classes that may be used as an underlying Writer
for this class, notably CharArrayWriter
, do not implement the
proper semantics for close( )
and
allow further writes to take place even after the writer is closed.
Consequently, programmers have to handle the checks for whether the
stream is closed in this class rather than relying on the underlying
Writer
out
to do it for them.
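To give a sense of how the class is used, a client might wrap a socket's output stream like this; the host, port, and request shown here are hypothetical, and exception handling is omitted:

Socket socket = new Socket("www.example.com", 80);  // hypothetical server
SafePrintWriter out = new SafePrintWriter(
    socket.getOutputStream(), false, "ISO-8859-1", "\r\n");  // explicit CRLF terminator
out.println("GET / HTTP/1.0");
out.println();
out.flush();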