Java NIO

By Ron Hitchens


Table of Contents

Chapter 1: Introduction
Get the facts first. You can distort them later.
—Mark Twain
Let's talk about I/O. No, no, come back. It's not really all that dull. Input/output (I/O) is not a glamorous topic, but it's a very important one. Most programmers think of I/O in the same way they do about plumbing: undoubtedly essential, can't live without it, but it can be unpleasant to deal with directly and may cause a big, stinky mess when not working properly. This is not a book about plumbing, but in the pages that follow, you may learn how to make your data flow a little more smoothly.
Object-oriented program design is all about encapsulation. Encapsulation is a good thing: it partitions responsibility, hides implementation details, and promotes object reuse. This partitioning and encapsulation tends to apply to programmers as well as programs. You may be a highly skilled Java programmer, creating extremely sophisticated objects and doing extraordinary things, and yet be almost entirely ignorant of some basic concepts underpinning I/O on the Java platform. In this chapter, we'll momentarily violate your encapsulation and take a look at some low-level I/O implementation details in the hope that you can better orchestrate the multiple moving parts involved in any I/O operation.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
I/O Versus CPU Time
Most programmers fancy themselves software artists, crafting clever routines to squeeze a few bytes here, unrolling a loop there, or refactoring somewhere else to consolidate objects. While those things are undoubtedly important, and often a lot of fun, the gains made by optimizing code can be easily dwarfed by I/O inefficiencies. Performing I/O usually takes orders of magnitude longer than performing in-memory processing tasks on the data. Many coders concentrate on what their objects are doing to the data and pay little attention to the environmental issues involved in acquiring and storing that data.
Table 1-1 lists some hypothetical times for performing a task on units of data read from and written to disk. The first column lists the average time it takes to process one unit of data, the second column is the amount of time it takes to move that unit of data from and to disk, and the third column is the number of these units of data that can be processed per second. The fourth column is the throughput increase that will result from varying the values in the first two columns.
Table 1-1: Throughput rate, processing versus I/O time
Process time (ms)   I/O time (ms)   Throughput (units/sec)   Gain (%)
5                   100             9.52                     (benchmark)
2.5                 100             9.76                     2.44
1
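The arithmetic behind Table 1-1 can be checked with a short sketch. It assumes, as the figures in the table suggest, that processing and I/O happen one after the other, so throughput is 1000 ms divided by the sum of the two times. The class and method names are illustrative and do not come from the book.

```java
// Illustrative sketch: reproduces the arithmetic of Table 1-1.
// Assumption: process time and I/O time are sequential, not overlapped.
class ThroughputTable {

    /** Units handled per second when each unit costs processMs + ioMs. */
    static double unitsPerSecond(double processMs, double ioMs) {
        return 1000.0 / (processMs + ioMs);
    }

    /** Percentage gain of one throughput figure over a baseline figure. */
    static double gainPercent(double baseline, double improved) {
        return (improved - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseline = unitsPerSecond(5, 100);   // first row of the table
        double halved = unitsPerSecond(2.5, 100);   // process time cut in half
        System.out.printf("%.2f units/sec (benchmark)%n", baseline);
        System.out.printf("%.2f units/sec, gain %.2f%%%n",
                halved, gainPercent(baseline, halved));
    }
}
```

Rounded to two decimal places, the results agree with the first two rows of the table: halving the processing time buys a gain of only about 2.44 percent while the I/O time stays at 100 ms.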
No Longer CPU Bound
To some extent, Java programmers can be forgiven for their preoccupation with optimizing processing efficiency and not paying much attention to I/O considerations. In the early days of Java, the JVMs interpreted bytecodes with little or no runtime optimization. This meant that Java programs tended to poke along, running significantly slower than natively compiled code and not putting much demand on the I/O subsystems of the operating system.
But tremendous strides have been made in runtime optimization. Current JVMs run bytecode at speeds approaching that of natively compiled code, sometimes doing even better because of dynamic runtime optimizations. This means that most Java applications are no longer CPU bound (spending most of their time executing code) and are more frequently I/O bound (waiting for data transfers).
But in most cases, Java applications have not truly been I/O bound in the sense that the operating system couldn't shuttle data fast enough to keep them busy. Instead, the JVMs have not been doing I/O efficiently. There's an impedance mismatch between the operating system and the Java stream-based I/O model. The operating system wants to move data in large chunks (buffers), often with the assistance of hardware Direct Memory Access (DMA). The I/O classes of the JVM like to operate on small pieces — single bytes, or lines of text. This means that the operating system delivers buffers full of data that the stream classes of java.io spend a lot of time breaking down into little pieces, often copying each piece between several layers of objects. The operating system wants to deliver data by the truckload. The java.io classes want to process data by the shovelful. NIO makes it easier to back the truck right up to where you can make direct use of the data (a ByteBuffer object).
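The shovel-versus-truck contrast can be sketched in a few lines. This is only an illustration under stated assumptions: the class name, method names, and the 8 KB buffer size are invented for the example, and try-with-resources (a later Java syntax than JDK 1.4) is used for brevity. Both methods compute the same byte sum; the first asks the stream for one byte per call, the second lets a channel fill a ByteBuffer and then drains it in memory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch; names and buffer size are assumptions, not from the book.
class ShovelVersusTruck {

    /** java.io style: pull the file through the stream one byte per call. */
    static long sumByteAtATime(File file) throws IOException {
        long sum = 0;
        try (FileInputStream in = new FileInputStream(file)) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;                       // read() returns 0..255
            }
        }
        return sum;
    }

    /** NIO style: the channel fills a buffer, which is then drained in memory. */
    static long sumByBuffer(File file) throws IOException {
        long sum = 0;
        ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
        try (FileInputStream in = new FileInputStream(file);
             FileChannel channel = in.getChannel()) {
            while (channel.read(buffer) != -1) {
                buffer.flip();                  // switch from filling to draining
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF; // treat each byte as unsigned
                }
                buffer.clear();                 // make the buffer ready to refill
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("nio-demo", ".bin");
        file.deleteOnExit();
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
        System.out.println(sumByteAtATime(file) == sumByBuffer(file));
    }
}
```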
This is not to say that it was impossible to move large amounts of data with the traditional I/O model — it certainly was (and still is). The RandomAccessFile class in particular can be quite efficient if you stick to the array-based
Getting to the Good Stuff
Most of the development effort that goes into operating systems is targeted at improving I/O performance. Lots of very smart people toil very long hours perfecting techniques for schlepping data back and forth. Operating-system vendors expend vast amounts of time and money seeking a competitive advantage by beating the other guys in this or that published benchmark.
Today's operating systems are modern marvels of software engineering (OK, some are more marvelous than others), but how can the Java programmer take advantage of all this wizardry and still remain platform-independent? Ah, yet another example of the TANSTAAFL principle.
The JVM is a double-edged sword. It provides a uniform operating environment that shelters the Java programmer from most of the annoying differences between operating-system environments. This makes it faster and easier to write code because platform-specific idiosyncrasies are mostly hidden. But cloaking the specifics of the operating system means that the jazzy, whiz-bang stuff is invisible too.
What to do? If you're a developer, you could write some native code using the Java Native Interface (JNI) to access the operating-system features directly. Doing so ties you to a specific operating system (and maybe a specific version of that operating system) and exposes the JVM to corruption or crashes if your native code is not 100% bug free. If you're an operating-system vendor, you could write native code and ship it with your JVM implementation to provide these features as a Java API. But doing so might violate the license you signed to provide a conforming JVM. Sun took Microsoft to court about this over the JDirect package which, of course, worked only on Microsoft systems. Or, as a last resort, you could turn to another language to implement performance-critical applications.
The java.nio package provides new abstractions to address this problem. The Channel and Selector classes in particular provide generic
I/O Concepts
The Java platform provides a rich set of I/O metaphors. Some of these metaphors are more abstract than others. With all abstractions, the further you get from hard, cold reality, the tougher it becomes to connect cause and effect. The NIO packages of JDK 1.4 introduce a new set of abstractions for doing I/O. Unlike previous packages, these are focused on shortening the distance between abstraction and reality. The NIO abstractions have very real and direct interactions with real-world entities. Understanding these new abstractions and, just as importantly, the I/O services they interact with, is key to making the most of I/O-intensive Java applications.
This book assumes that you are familiar with basic I/O concepts. This section provides a whirlwind review of some basic ideas just to lay the groundwork for the discussion of how the new NIO classes operate. These classes model I/O functions, so it's necessary to grasp how things work at the operating-system level to understand the new I/O paradigms.
In the main body of this book, it's important to understand the following topics:
  • Buffer handling
  • Kernel versus user space
  • Virtual memory
  • Paging
  • File-oriented versus stream I/O
  • Multiplexed I/O (readiness selection)
Buffers, and how buffers are handled, are the basis of all I/O. The very term "input/output" means nothing more than moving data in and out of buffers.
Processes perform I/O by requesting of the operating system that data be drained from a buffer (write) or that a buffer be filled with data (read). That's really all it boils down to. All data moves in or out of a process by this mechanism. The machinery inside the operating system that performs these transfers can be incredibly complex, but conceptually, it's very straightforward.
Summary
This overview of system-level I/O is necessarily terse and incomplete. If you require more detailed information on the subject, consult a good reference — there are many available. A great place to start is the definitive operating-system textbook, Operating System Concepts, Sixth Edition, by my old boss Avi Silberschatz (John Wiley & Sons).
With the preceding overview, you should now have a pretty good idea of the subjects that will be covered in the following chapters. Armed with this knowledge, let's move on to the heart of the matter: Java New I/O (NIO). Keep these concrete ideas in mind as you acquire the new abstractions of NIO. Understanding these basic ideas should make it easy to recognize the I/O capabilities modeled by the new classes.
We're about to begin our Grand Tour of NIO. The bus is warmed up and ready to roll. Climb on board, settle in, get comfortable, and let's get this show on the road.
Chapter 2: Buffers
It's all relative.
—Big Al Einstein
We begin our sightseeing tour of the java.nio packages with the Buffer classes. These classes are the foundation upon which java.nio is built. In this chapter, we'll take a close look at buffers, discover the various types, and learn how to use them. We'll then see how the java.nio buffers relate to the channel classes of java.nio.channels.
A Buffer object is a container for a fixed amount of data. It acts as a holding tank, or staging area, where data can be stored and later retrieved. Buffers are filled and drained, as we discussed in Chapter 1. There is one buffer class for each of the nonboolean primitive data types. Although buffers act upon the primitive data types they store, buffers have a strong bias toward bytes. Nonbyte buffers can perform translation to and from bytes behind the scenes, depending on how the buffer was created. We'll examine the implications of data storage within buffers later in this chapter.
Buffers work hand in glove with channels. Channels are portals through which I/O transfers take place, and buffers are the sources or targets of those data transfers. For outgoing transfers, data you want to send is placed in a buffer, which is passed to a channel. For inbound transfers, a channel deposits data in a buffer you provide. This hand-off of buffers between cooperating objects (usually objects you write and one or more Channel objects) is key to efficient data handling. Channels will be covered in detail in Chapter 3.
Figure 2-1 is a class diagram of the Buffer class-specialization hierarchy. At the top is the generic Buffer class. Buffer defines operations common to all buffer types, regardless of the data type they contain or special behaviors they may possess. This common ground will be our jumping-off point.
Buffer Basics
Conceptually, a buffer is an array of primitive data elements wrapped inside an object. The advantage of a Buffer class over a simple array is that it encapsulates data content and information about the data into a single object. The Buffer class and its specialized subclasses define an API for processing data buffers.
There are four attributes all buffers possess that provide information about the contained data elements. These are:
Capacity
The maximum number of data elements the buffer can hold. The capacity is set when the buffer is created and can never be changed.
Limit
The first element of the buffer that should not be read or written. In other words, the count of live elements in the buffer.
Position
The index of the next element to be read or written. The position is updated automatically by relative get( ) and put( ) methods.
Mark
A remembered position. Calling mark( ) sets mark = position. Calling reset( ) sets position = mark. The mark is undefined until set.
The following relationship between these four attributes always holds:
0 <= mark <= position <= limit <= capacity
Let's look at some examples of these attributes in action. Figure 2-2 shows a logical view of a newly created ByteBuffer with a capacity of 10.
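A minimal sketch of these attributes in action follows. The class and its describe( ) helper are illustrative names, not part of the book's code. It also calls flip( ), a standard Buffer method that sets the limit to the current position and resets the position to zero.

```java
import java.nio.ByteBuffer;

// Illustrative sketch; the class and helper names are assumptions.
class BufferAttributes {

    /** Formats the three attributes that are always defined. */
    static String describe(ByteBuffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(10);
        System.out.println(describe(buffer));    // pos=0 lim=10 cap=10

        buffer.put((byte) 'H').put((byte) 'i');  // relative puts advance position
        System.out.println(describe(buffer));    // pos=2 lim=10 cap=10

        buffer.mark();                           // mark = 2
        buffer.put((byte) '!');                  // position = 3
        buffer.reset();                          // position returns to the mark
        System.out.println(describe(buffer));    // pos=2 lim=10 cap=10

        buffer.flip();                           // limit = 2, position = 0
        System.out.println(describe(buffer));    // pos=0 lim=2 cap=10
    }
}
```

At every step the invariant holds: the position never exceeds the limit, and the limit never exceeds the capacity.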
Creating Buffers
As we saw in Figure 2-1, there are seven primary buffer classes, one for each of the nonboolean primitive data types in the Java language. (An eighth is shown there, MappedByteBuffer, which is a specialization of ByteBuffer used for memory-mapped files. We'll discuss memory mapping in Chapter 3.) None of these classes can be instantiated directly. They are all abstract classes, but each contains static factory methods to create new instances of the appropriate class.
For this discussion, we'll use the CharBuffer class as an example, but the same applies to the other six primary buffer classes: IntBuffer, DoubleBuffer, ShortBuffer, LongBuffer, FloatBuffer, and ByteBuffer. Here are the key methods for creating buffers, common to all of the buffer classes (substitute class names as appropriate):
public abstract class CharBuffer
        extends Buffer implements CharSequence, Comparable
{
        // This is a partial API listing

        public static CharBuffer allocate (int capacity)

        public static CharBuffer wrap (char [] array)
        public static CharBuffer wrap (char [] array, int offset, int length)

        public final boolean hasArray(  )
        public final char [] array(  )
        public final int arrayOffset(  )
}
New buffers are created by either allocation or wrapping. Allocation creates a buffer object and allocates private space to hold capacity data elements. Wrapping creates a buffer object but does not allocate any space to hold the data elements. It uses the array you provide as backing storage to hold the data elements of the buffer.
To allocate a CharBuffer capable of holding 100 chars:
CharBuffer charBuffer = CharBuffer.allocate (100);
This implicitly allocates a char array from the heap to act as backing store for the 100 chars.
If you want to provide your own array to be used as the buffer's backing store, call the wrap( ) method:
char [] myArray = new char [100];
CharBuffer charbuffer = CharBuffer.wrap (myArray);
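The practical difference between the two approaches is who owns the storage. The following sketch, with an illustrative class name and helper method, shows that a wrapped buffer and the array you supplied are two windows onto the same chars.

```java
import java.nio.CharBuffer;

// Illustrative sketch; the class and helper names are assumptions.
class AllocateVersusWrap {

    /** Writes through a wrapping buffer and returns what the array now holds. */
    static char writeThroughWrap(char[] backing, char value) {
        CharBuffer wrapped = CharBuffer.wrap(backing);
        wrapped.put(0, value);   // absolute put: the position is unchanged
        return backing[0];       // the caller's array sees the change
    }

    public static void main(String[] args) {
        CharBuffer allocated = CharBuffer.allocate(100);
        System.out.println(allocated.capacity());        // 100
        System.out.println(allocated.hasArray());        // true

        char[] myArray = new char[100];
        System.out.println(writeThroughWrap(myArray, 'x'));  // x

        // Changes made directly to the array are visible through the buffer too.
        CharBuffer wrapped = CharBuffer.wrap(myArray);
        myArray[1] = 'y';
        System.out.println(wrapped.get(1));              // y
        System.out.println(wrapped.array() == myArray);  // true
    }
}
```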
Duplicating Buffers
As we just discussed, buffer objects can be created that describe data elements stored externally in an array. But buffers are not limited to managing external data in arrays. They can also manage data externally in other buffers. When a buffer that manages data elements contained in another buffer is created, it's known as a view buffer. Most view buffers are views of ByteBuffers (see Section 2.4.3). Before moving on to the specifics of byte buffers, we'll concentrate on the views that are common to all buffer types.
View buffers are always created by calling methods on an existing buffer instance. Using a factory method on an existing buffer instance means that the view object will be privy to internal implementation details of the original buffer. It will be able to access the data elements directly, whether they are stored in an array or by some other means, rather than going through the get( )/put( ) API of the original buffer object. If the original buffer is direct, views of that buffer will have the same efficiency advantages. Likewise for mapped buffers (discussed in Chapter 3).
In this section, we'll again use CharBuffer as an example, but the same operations can be done on any of the primary buffer types (see Figure 2-1).
public abstract class CharBuffer
        extends Buffer implements CharSequence, Comparable
{
        // This is a partial API listing

        public abstract CharBuffer duplicate(  );
        public abstract CharBuffer asReadOnlyBuffer(  );
        public abstract CharBuffer slice(  );
}
The duplicate( ) method creates a new buffer that is just like the original. Both buffers share the data elements and have the same capacity, but each buffer will have its own position, limit, and mark. Changes made to data elements in one buffer will be reflected in the other. The duplicate buffer has the same view of the data as the original buffer. If the original buffer is read-only, or direct, the new buffer will inherit those attributes. Direct buffers are discussed in Section 2.4.2.
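A short sketch makes the sharing rules concrete. The class name is illustrative; the calls are standard CharBuffer methods. The duplicate starts with the same position and limit as the original, but from then on the two sets of attributes move independently while the data stays shared.

```java
import java.nio.CharBuffer;

// Illustrative sketch; the class name is an assumption.
class DuplicateDemo {

    public static void main(String[] args) {
        CharBuffer original = CharBuffer.allocate(8);
        original.put("abcd");                        // position = 4

        CharBuffer copy = original.duplicate();
        System.out.println(copy.position());         // 4 (copied at creation)

        copy.position(0);                            // moves only the duplicate
        System.out.println(original.position());     // 4

        copy.put(0, 'Z');                            // data elements are shared
        System.out.println(original.get(0));         // Z

        // A duplicate of a read-only buffer is itself read-only.
        CharBuffer readOnly = original.asReadOnlyBuffer();
        System.out.println(readOnly.duplicate().isReadOnly());  // true
    }
}
```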
Byte Buffers
In this section, we'll take a closer look at byte buffers. There are buffer classes for all the primitive data types (except boolean), but byte buffers have characteristics not shared by the others. Bytes are the fundamental data unit used by the operating system and its I/O facilities. When moving data between the JVM and the operating system, it's necessary to break down the other data types into their constituent bytes. As we'll see in the following sections, the byte-oriented nature of system-level I/O can be felt throughout the design of buffers and the services with which they interact.
For reference, here is the complete API of ByteBuffer. Some of these methods have been discussed in previous sections and are simply type-specific versions. The new methods will be covered in this and following sections.
package java.nio;

public abstract class ByteBuffer extends Buffer
        implements Comparable
{
        public static ByteBuffer allocate (int capacity)
        public static ByteBuffer allocateDirect (int capacity)
        public abstract boolean isDirect(  );
        public static ByteBuffer wrap (byte[] array, int offset, int length)
        public static ByteBuffer wrap (byte[] array)

        public abstract ByteBuffer duplicate(  );
        public abstract ByteBuffer asReadOnlyBuffer(  );
        public abstract ByteBuffer slice(  );
        public final boolean hasArray(  )
        public final byte [] array(  )
        public final int arrayOffset(  )

        public abstract byte get(  );
        public abstract byte get (int index);
        public ByteBuffer get (byte[] dst, int offset, int length)
        public ByteBuffer get (byte[] dst)

        public abstract ByteBuffer put (byte b);
        public abstract ByteBuffer put (int index, byte b);
        public ByteBuffer put (ByteBuffer src)
        public ByteBuffer put (byte[] src, int offset, int length)
        public final ByteBuffer put (byte[] src)

        public final ByteOrder order(  )
        public final ByteBuffer order (ByteOrder bo)

        public abstract CharBuffer asCharBuffer(  );
        public abstract ShortBuffer asShortBuffer(  );
        public abstract IntBuffer asIntBuffer(  );
        public abstract LongBuffer asLongBuffer(  );
        public abstract FloatBuffer asFloatBuffer(  );
        public abstract DoubleBuffer asDoubleBuffer(  );

        public abstract char getChar(  );
        public abstract char getChar (int index);
        public abstract ByteBuffer putChar (char value);
        public abstract ByteBuffer putChar (int index, char value);

        public abstract short getShort(  );
        public abstract short getShort (int index);
        public abstract ByteBuffer putShort (short value);
        public abstract ByteBuffer putShort (int index, short value);

        public abstract int getInt(  );
        public abstract int getInt (int index);
        public abstract ByteBuffer putInt (int value);
        public abstract ByteBuffer putInt (int index, int value);

        public abstract long getLong(  );
        public abstract long getLong (int index);
        public abstract ByteBuffer putLong (long value);
        public abstract ByteBuffer putLong (int index, long value);

        public abstract float getFloat(  );
        public abstract float getFloat (int index);
        public abstract ByteBuffer putFloat (float value);
        public abstract ByteBuffer putFloat (int index, float value);

        public abstract double getDouble(  );
        public abstract double getDouble (int index);
        public abstract ByteBuffer putDouble (double value);
        public abstract ByteBuffer putDouble (int index, double value);

        public abstract ByteBuffer compact(  );
        public boolean equals (Object ob)
        public int compareTo (Object ob)
        public String toString(  )
        public int hashCode(  )
}  
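Two of the byte-buffer specialties in this listing, byte ordering and typed views, can be sketched briefly. The class name and the encode( ) helper are illustrative; everything else is standard java.nio API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Illustrative sketch; the class and helper names are assumptions.
class ByteBufferViews {

    /** Returns the four bytes putInt() produces for a value in the given order. */
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(4).order(order);
        buffer.putInt(value);
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] big = encode(0x01020304, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(0x01020304, ByteOrder.LITTLE_ENDIAN);
        System.out.println(big[0] + " " + little[0]);         // 1 4

        ByteBuffer bytes = ByteBuffer.allocate(8);            // big-endian by default
        bytes.putInt(7).putInt(9);
        bytes.flip();

        IntBuffer ints = bytes.asIntBuffer();                 // view of the same bytes
        System.out.println(ints.remaining());                 // 2
        System.out.println(ints.get(0) + " " + ints.get(1));  // 7 9
    }
}
```

The view buffer does not copy anything: it reinterprets the eight bytes between the byte buffer's position and limit as two ints.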
Summary
This chapter covered buffers, which live in the java.nio package. Buffer objects enable the advanced I/O capabilities covered in the remaining chapters. These key buffer topics were covered in this chapter:
Buffer attributes
Attributes that all buffers possess were covered in Section 2.1.1. These attributes describe the current state of a buffer and affect how it behaves. In this section, we also learned how to manipulate the state of buffers and how to add and remove data elements.
Buffer creation
We learned how buffers are created in Section 2.2 and how to duplicate them in Section 2.3. There are many types of buffers. The way a buffer is created determines how and where it should be used.
Byte buffers
While buffers can be created for any primitive data type other than boolean, byte buffers have special features not shared by the other buffer types. Only byte buffers can be used with channels (discussed in Chapter 3), and byte buffers offer views of their content in terms of other data types. We also examined the issues related to byte ordering. ByteBuffers were discussed in Section 2.4.
This concludes our visit with the menagerie of buffers in java.nio. The next stop on the tour is java.nio.channels where you will encounter, not surprisingly, channels. Channels interact with byte buffers and unlock the door to high-performance I/O. Hop back on the bus, it's a short trip to our next stop.
Chapter 3: Channels
Brilliance! Sheer, unadulterated brilliance!
—Wile E. Coyote, Super Genius
Channels are the second major innovation of java.nio. They are not an extension or enhancement, but a new, first-class Java I/O paradigm. They provide direct connections to I/O services. A Channel is a conduit that transports data efficiently between byte buffers and the entity on the other end of the channel (usually a file or socket).
A good metaphor for a channel is a pneumatic tube, the type used at drive-up bank-teller windows. Your paycheck would be the information you're sending. The carrier would be like a buffer. You fill the buffer (place your paycheck in the carrier), "write" the buffer to the channel (drop the carrier into the tube), and the payload is carried to the I/O service (bank teller) on the other end of the channel.
The response would be the teller filling the buffer (placing your receipt in the carrier) and starting a channel transfer in the opposite direction (dropping the carrier back into the tube). The carrier arrives on your end of the channel (a filled buffer is ready for you to examine). You then flip the buffer (open the lid) and drain it (remove your receipt). You drive away and the next object (bank customer) is ready to repeat the process using the same carrier (Buffer) and tube (Channel) objects.
In most cases, channels have a one-to-one relationship with operating-system file descriptors, or file handles. Although channels are more generalized than file descriptors, most channels you will use on a regular basis are connected to open file descriptors. The channel classes provide the abstraction needed to maintain platform independence but still model the native I/O capabilities of modern operating systems.
Channels are gateways through which the native I/O services of the operating system can be accessed with a minimum of overhead, and buffers are the internal endpoints used by channels to send and receive data. (See Figure 3-1.)
Channel Basics
First, let's take a closer look at the basic Channel interface. This is the full source of the Channel interface:
package java.nio.channels;

public interface Channel
{
        public boolean isOpen(  );
        public void close(  ) throws IOException;
}
Unlike buffers, the channel APIs are primarily specified by interfaces. Channel implementations vary radically between operating systems, so the channel APIs simply describe what can be done. Channel implementations often use native code, so this is only natural. The channel interfaces allow you to gain access to low-level I/O services in a controlled and portable way.
As you can see by the top-level Channel interface, there are only two operations common to all channels: checking to see if a channel is open (isOpen( )) and closing an open channel (close( )). Figure 3-2 shows that all the interesting stuff is in the classes that implement Channel and its subinterfaces.
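Because every channel supports these two operations, code that only needs to manage a channel's lifetime can be written against the Channel interface alone. The sketch below assumes an illustrative helper, closeAndReport( ), and obtains a channel from Channels.newChannel( ) simply because it needs no file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channel;
import java.nio.channels.Channels;

// Illustrative sketch; the class and helper names are assumptions.
class ChannelBasics {

    /** Closes any channel and reports whether it was open beforehand. */
    static boolean closeAndReport(Channel channel) throws IOException {
        boolean wasOpen = channel.isOpen();
        channel.close();   // closing an already-closed channel has no effect
        return wasOpen;
    }

    public static void main(String[] args) throws IOException {
        Channel channel = Channels.newChannel(new ByteArrayInputStream(new byte[0]));
        System.out.println(closeAndReport(channel));  // true
        System.out.println(channel.isOpen());         // false
        System.out.println(closeAndReport(channel));  // false
    }
}
```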
The InterruptibleChannel interface is a marker that, when implemented by a channel, indicates that the channel is interruptible. Interruptible channels behave in specific ways when a thread accessing them is interrupted, which we will discuss in Section 3.1.3. Most, but not all, channels are interruptible.
The other interfaces extending Channel are the byte-oriented subinterfaces WritableByteChannel and ReadableByteChannel. This supports what we learned earlier: channels operate only on byte buffers. The structure of the hierarchy implies that channels for other data types could also extend from Channel. This is good class design, but nonbyte implementations are unlikely because operating systems do low-level I/O in terms of bytes.
You can see in Figure 3-2 that two of the classes in this family tree live in a different package, java.nio.channels.spi. These classes, AbstractInterruptibleChannel and
Scatter/Gather
Channels provide an important new capability known as scatter/gather (referred to in some circles as vectored I/O). Scatter/gather is a simple yet powerful concept (see Section 1.4.1.1). It refers to performing a single I/O operation across multiple buffers. For a write operation, data is gathered (drained) from several buffers in turn and sent along the channel. The buffers do not need to have the same capacity (and they usually don't). The effect is the same as if the content of all the buffers was concatenated into one large buffer before being sent. For reads, the data read from the channel is scattered to multiple buffers in sequence, filling each to its limit, until the data from the channel or the total buffer space is exhausted.
Most modern operating systems support native vectored I/O. When you request a scatter/gather operation on a channel, the request will be translated into appropriate native calls to fill or drain the buffers directly. This is a big win, because buffer copies and system calls are reduced or eliminated. Scatter/gather should be used with direct ByteBuffers to gain the greatest advantage from native I/O, especially if the buffers are long-lived.
Adding the scatter/gather interfaces to the UML class diagram of Figure 3-3 produces Figure 3-4. The following code illustrates how scatter is an extension of reading and gather is built on writing:
public interface ScatteringByteChannel
        extends ReadableByteChannel
{
        public long read (ByteBuffer [] dsts)
                throws IOException;

        public long read (ByteBuffer [] dsts, int offset, int length)
                throws IOException;
}


public interface GatheringByteChannel
        extends WritableByteChannel
{
        public long write(ByteBuffer[] srcs)
                throws IOException;

        public long write(ByteBuffer[] srcs, int offset, int length)
                throws IOException;
}
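As a rough illustration of these interfaces, the sketch below gathers a header buffer and a body buffer into a file with one write( ) call per loop pass, then scatters the file's content back into two separately sized buffers. The class and method names are invented for the example, heap buffers are used for simplicity rather than the direct buffers recommended above, and the code relies on later Java conveniences (try-with-resources, StandardCharsets).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are assumptions.
class ScatterGatherDemo {

    /** Gathers header and body into the file, then scatters them back out. */
    static String[] roundTrip(File file, String header, String body) throws IOException {
        ByteBuffer[] out = {
            ByteBuffer.wrap(header.getBytes(StandardCharsets.US_ASCII)),
            ByteBuffer.wrap(body.getBytes(StandardCharsets.US_ASCII))
        };
        ByteBuffer[] in = {
            ByteBuffer.allocate(out[0].remaining()),
            ByteBuffer.allocate(out[1].remaining())
        };
        long expected = out[0].remaining() + out[1].remaining();

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            channel.truncate(0);
            long written = 0;
            while (written < expected) {
                written += channel.write(out);   // gathering write
            }
            channel.position(0);
            long read = 0;
            while (read < expected) {
                long n = channel.read(in);       // scattering read
                if (n < 0) {
                    break;                       // unexpected end of file
                }
                read += n;
            }
        }

        String[] result = new String[in.length];
        for (int i = 0; i < in.length; i++) {
            in[i].flip();
            result[i] = StandardCharsets.US_ASCII.decode(in[i]).toString();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("scatter-gather", ".txt");
        file.deleteOnExit();
        String[] parts = roundTrip(file, "HEAD", "body text");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

The first receiving buffer is filled to its limit before any data lands in the second, which is what lets a fixed-size header be separated from a variable-size body without extra copying.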
File Channels
Up to this point, we've been discussing the channels generically, i.e., discussing those things common to all channel types. It's time to get specific. In this section, we discuss file channels (socket channels are covered in an upcoming section). As you can see in Figure 3-7, the FileChannel class can do normal read and write as well as scatter/gather. It also provides lots of new methods specific to files. Many of these methods are familiar file operations; others may be new to you. We'll discuss them all, right here, right now.
Figure 3-7: FileChannel family tree
File channels are always blocking and cannot be placed into nonblocking mode. Modern operating systems have sophisticated caching and prefetch algorithms that usually give local disk I/O very low latency. Network filesystems generally have higher latencies but often benefit from the same optimizations. The nonblocking paradigm of stream-oriented I/O doesn't make as much sense for file-oriented operations because of the fundamentally different nature of file I/O. For file I/O, the true winner is asynchronous I/O, which lets a process request one or more I/O operations from the operating system but does not wait for them to complete. The process is notified at a later time that the requested I/O has completed. Asynchronous I/O is an advanced capability not available on many operating systems. It is under consideration as a future NIO enhancement.
As mentioned in Section 3.1.1, FileChannel objects cannot be created directly. A FileChannel instance can be obtained only by calling getChannel( ) on an open file object (RandomAccessFile, FileInputStream, or FileOutputStream). Calling the getChannel( ) method returns a FileChannel object connected to the same file, with the same access permissions as the file object. You can then use the channel object to make use of the powerful
Memory-Mapped Files
The new FileChannel class provides a method, map( ), that establishes a virtual memory mapping between an open file and a special type of ByteBuffer. (Memory-mapped files and how they interact with virtual memory were summarized in Chapter 1.) Calling map( ) on a FileChannel creates a virtual memory mapping backed by a disk file and wraps a MappedByteBuffer object around that virtual memory space. (See Figure 1-6.)
The MappedByteBuffer object returned from map( ) behaves like a memory-based buffer in most respects, but its data elements are stored in a file on disk. Calling get( ) will fetch data from the disk file, and this data reflects the current content of the file, even if the file has been modified by an external process since the mapping was established. The data visible through a file mapping is exactly the same as you would see by reading the file conventionally. Likewise, doing a put( ) to the mapped buffer will update the file on disk (assuming you have write permission), and your changes will be visible to other readers of the file.
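To make the behavior of put( ) on a mapped buffer concrete, here is a small sketch under stated assumptions: the class MappedFileSketch and its writeThroughMapping( ) helper are hypothetical names, the file is a throwaway temporary file, and Files.readAllBytes( ) assumes Java 7 or later. It writes through a READ_WRITE mapping and then reads the file back by conventional means to show that both views agree.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration class; not part of the NIO API.
final class MappedFileSketch {

    // Writes text into a temp file through a memory mapping, then
    // returns what a conventional read of the same file sees.
    static String writeThroughMapping(String text) {
        try {
            File tmp = File.createTempFile("mapped", ".dat");
            tmp.deleteOnExit();
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
                 FileChannel channel = raf.getChannel()) {
                // Size the file explicitly so the mapped region lies inside it.
                raf.setLength(bytes.length);

                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                mapped.put(bytes); // updates the mapped pages, and so the file
                mapped.force();    // asks the OS to flush dirty pages to disk
            }

            // Conventional read of the same file.
            return new String(Files.readAllBytes(tmp.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughMapping("hello, mapping"));
    }
}
```

Note that the mapping stays valid after the channel is closed; the sketch closes the channel early only to keep the example short.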
Accessing a file through the memory-mapping mechanism can be far more efficient than reading or writing data by conventional means, even when using channels. There is no need to make explicit system calls, which can be time-consuming. More importantly, the virtual memory system of the operating system automatically caches memory pages. These pages will be cached using system memory and will not consume space from the JVM's memory heap.
Once a memory page has been made valid (brought in from disk), it can be accessed again at full hardware speed without the need to make another system call to get the data. Large, structured files that contain indexes or other sections that are referenced or updated frequently can benefit tremendously from memory mapping. When combined with file locking to protect critical sections and control transactional atomicity, you begin to see how memory-mapped buffers can be put to good use.
Socket Channels
Let's move on to the channel classes that model network sockets. Socket channels have different characteristics than file channels.
The new socket channels can operate in nonblocking mode and are selectable. These two capabilities enable tremendous scalability and flexibility in large applications, such as web servers and middleware components. As we'll see in this section, it's no longer necessary to dedicate a thread to each socket connection (and suffer the context-switching overhead of managing large numbers of threads). Using the new NIO classes, one or a few threads can manage hundreds or even thousands of active socket connections with little or no performance loss.
You can see in Figure 3-9 that all three of the socket channel classes (DatagramChannel, SocketChannel, and ServerSocketChannel) extend from AbstractSelectableChannel, which lives in the java.nio.channels.spi package. This means that it's possible to perform readiness selection of socket channels using a Selector object. Selection and multiplexed I/O are discussed in Chapter 4.
Figure 3-9: The socket channel family tree
Notice that DatagramChannel and SocketChannel implement the interfaces that define read and write capabilities, but ServerSocketChannel does not. ServerSocketChannel listens for incoming connections and creates new SocketChannel objects. It never transfers any data itself.
Before discussing the individual types of socket channels, you should understand the relationship between sockets and socket channels. As described earlier, a channel is a conduit to an I/O service and provides methods for interacting with that service. In the case of sockets, the decision was made not to reimplement the socket protocol APIs in the corresponding channel classes. The preexisting socket classes in java.net are reused for most protocol operations.
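The relationship between a socket channel and its java.net peer can be seen in a few lines. This is an illustrative sketch only: the class SocketPeerSketch and its demo( ) helper are hypothetical, and binding to port 0 on the loopback address is simply a convenient way to get a free port (InetAddress.getLoopbackAddress( ) assumes Java 7 or later). It binds through the peer ServerSocket, checks that the peer points back to its channel, and shows that a nonblocking accept( ) returns null when no connection is pending.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

// Hypothetical illustration class; not part of the NIO API.
final class SocketPeerSketch {

    // Returns true if the peer socket is linked to its channel and a
    // nonblocking accept( ) with no pending connection returns null.
    static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Protocol operations such as bind go through the java.net peer.
            ServerSocket peer = server.socket();
            peer.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // The peer knows which channel it belongs to.
            boolean linked = peer.getChannel() == server;

            // In nonblocking mode, accept( ) returns immediately.
            server.configureBlocking(false);
            boolean nothingPending = server.accept() == null;

            return linked && nothingPending;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```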
Pipes
The java.nio.channels package includes a class named Pipe. A pipe, in the general sense, is a conduit through which data can be passed in a single direction between two entities. The notion of a pipe has long been familiar to users of Unix (and Unix-like) operating systems. Pipes are used on Unix systems to connect the output of one process to the input of another. The Pipe class implements a pipe paradigm, but the pipes it creates are intraprocess (within the JVM process) rather than interprocess (between processes). See Figure 3-10.
Figure 3-10: The Pipe family tree
The Pipe class creates a pair of Channel objects that provide a loopback mechanism. The two channels' far ends are connected so that whatever is written down the SinkChannel appears on the SourceChannel. Figure 3-10 shows the class hierarchy for Pipe.
package java.nio.channels;

public abstract class Pipe
{
        public static Pipe open(  ) throws IOException
        public abstract SourceChannel source(  );
        public abstract SinkChannel sink(  );

        public static abstract class SourceChannel
                 extends AbstractSelectableChannel
                 implements ReadableByteChannel, ScatteringByteChannel

        public static abstract class SinkChannel
                 extends AbstractSelectableChannel
                 implements WritableByteChannel, GatheringByteChannel
}
Figure 3-11: A pipe is a pair of looped channels
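The loopback behavior can be sketched in a single thread, as long as the message is small enough to fit in the pipe's internal buffer. The class PipeSketch and its loopback( ) helper are hypothetical names used for illustration, and the try-with-resources syntax assumes Java 7 or later. For larger payloads, the writer and the reader would normally run in separate threads so that a full pipe cannot stall the program.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the NIO API.
final class PipeSketch {

    // Writes a short message into the sink end of a pipe and
    // returns whatever arrives at the source end.
    static String loopback(String message) {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {

                // Write end: drain the buffer into the sink channel.
                ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (out.hasRemaining()) {
                    sink.write(out);
                }

                // Read end: fill a buffer of the same size from the source channel.
                ByteBuffer in = ByteBuffer.allocate(out.capacity());
                while (in.hasRemaining()) {
                    if (source.read(in) < 0) {
                        break; // sink closed before all bytes arrived
                    }
                }
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loopback("through the pipe"));
    }
}
```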
An instance of Pipe is created by invoking the Pipe.open( ) factory method with no arguments. The Pipe class defines two nested channel classes to implement the pipeline. These classes are Pipe.SourceChannel (the read end of the pipe) and
The Channels Utility Class
NIO channels provide a new, stream-like I/O metaphor, but the familiar byte stream and character reader/writer classes are still around and widely used. Channels may eventually be retrofitted into the java.io classes (an implementation detail), but the APIs presented by java.io streams and reader/writers will not be going away anytime soon (nor should they).
A utility class, with the slightly repetitive name of java.nio.channels.Channels, defines several static factory methods to make it easier for channels to interconnect with streams and readers/writers. Table 3-2 summarizes these methods.
Table 3-2: Summary of java.nio.channels.Channels utility methods
Method | Returns | Description
newChannel (InputStream in) | ReadableByteChannel | Returns a channel that will read bytes from the provided input stream.
newChannel (OutputStream out) | WritableByteChannel | Returns a channel that will write bytes to the provided output stream.
newInputStream (ReadableByteChannel ch) | InputStream |
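As a rough illustration of the two newChannel( ) factory methods, the following sketch copies bytes from a conventional input stream to a conventional output stream using only channel operations. The class ChannelsBridgeSketch, the copy( ) helper, and the 16-byte buffer size are arbitrary choices made for this example; in-memory streams stand in for real files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the NIO API.
final class ChannelsBridgeSketch {

    // Wraps two java.io streams in channels and copies between them.
    static String copy(String text) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        try (ReadableByteChannel src = Channels.newChannel(in);
             WritableByteChannel dst = Channels.newChannel(out)) {
            ByteBuffer buffer = ByteBuffer.allocate(16); // small on purpose
            while (src.read(buffer) != -1) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    dst.write(buffer);
                }
                buffer.clear();                // ready to fill again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(copy("streams and channels can interoperate"));
    }
}
```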
Summary
We covered a lot of ground in this chapter. Channels make up the infrastructure, or the plumbing, which carries data between ByteBuffers and I/O services of the operating system (or whatever the channel is connected to). The key concepts discussed in this chapter were:
Basic channel operations
In Section 3.1, we learned the basic operations of channels. These included how to open a channel using the API calls common to all channels and how to close a channel when finished.
Scatter/gather channels
The topic of scatter/gather I/O using channels was introduced in Section 3.2. Vectored I/O enables you to perform one I/O operation across multiple buffers automatically.
File channels
The multifaceted FileChannel class was discussed in Section 3.3. This powerful new channel provides access to advanced file operations not previously available to Java programs. Among these new capabilities are file locking, memory-mapped files, and channel-to-channel transfers.
Socket channels
The several types of socket channels were covered in Section 3.5. Also discussed was nonblocking mode, an important new feature supported by socket channels.
Pipes
In Section 3.6, we looked at the Pipe class, a useful new loopback mechanism using specialized channel implementations.
Channels utility class
The Channels class contains utility methods that provide for cross-connecting channels with conventional byte streams and character reader/writer objects. See Section 3.7.
There are many channels on your NIO
Chapter 4: Selectors
Life is a series of rude awakenings.
—R. Van Winkle
In this chapter, we'll explore selectors. Selectors provide the ability to do readiness selection, which enables multiplexed I/O. As described in Chapter 1, readiness selection and multiplexing make it possible for a single thread to efficiently manage many I/O channels simultaneously. C/C++ coders have had the POSIX select( ) and/or poll( ) system calls in their toolbox for many years. Most other operating systems provide similar functionality. But readiness selection was never available to Java programmers until JDK 1.4. Programmers whose primary body of experience is in the Java environment may not have encountered this I/O model before.
For an illustration of readiness selection, let's return to the drive-through bank example of Chapter 3. Imagine a bank with three drive-through lanes. In the traditional (nonselector) scenario, imagine that each drive-through lane has a pneumatic tube that runs to its own teller station inside the bank, and each station is walled off from the others. This means that each tube (channel) requires a dedicated teller (worker thread). This approach doesn't scale well and is wasteful. For each new tube (channel) added, a new teller is required, along with associated overhead such as tables, chairs, paper clips (memory, CPU cycles, context switching), etc. And when things are slow, these resources (which have associated costs) tend to sit idle.
Now imagine a different scenario in which each pneumatic tube (channel) is connected to a single teller station inside the bank. The station has three slots where the carriers (data buffers) arrive, each with an indicator (selection key) that lights up when the carrier is in the slot. Also imagine that the teller (worker thread) has a sick cat and spends as much time as possible reading Do It Yourself Taxidermy. At the end of each paragraph, the teller glances up at the indicator lights (invokes select( )) to determine if any of the channels are ready (readiness selection). The teller (worker thread) can perform another task while the drive-through lanes (channels) are idle yet still respond to them in a timely manner when they require attention.
Selector Basics
Getting a handle on the topics discussed in this chapter will be somewhat tougher than understanding the relatively straightforward buffer and channel classes. It's trickier, because there are three main classes, all of which come into play at the same time. If you find yourself confused, back up and take another run at it. Once you see how the pieces fit together and their individual roles, it should all make sense.
We'll begin with the executive summary, then break down the details. You register one or more previously created selectable channels with a selector object. A key that represents the relationship between one channel and one selector is returned. Selection keys remember what you are interested in for each channel. They also track the operations of interest that their channel is currently ready to perform. When you invoke select( ) on a selector object, the associated keys are updated by checking all the channels registered with that selector. You can obtain a set of the keys whose channels were found to be ready at that point. By iterating over these keys, you can service each channel that has become ready since the last time you invoked select( ).
That's the 30,000-foot view. Now let's swoop in low and see what happens at ground level (or below).
At this point, you may want to skip ahead to Example 4-1 and take a quick look at the code. Between here and there, you'll learn the specifics of how these new classes work, but armed with just the high-level information in the preceding paragraph, you should be able to see how the selection model works in practice.
At the most fundamental level, selectors provide the capability to ask a channel if it's ready to perform an I/O operation of interest to you. For example, a SocketChannel object could be asked if it has any bytes ready to read, or we may want to know if a ServerSocketChannel has any incoming connections ready to accept.
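The register, select, and iterate cycle described above can be sketched end to end on the loopback interface. This is not Example 4-1 from the book; the class SelectorSketch, the acceptOne( ) helper, and the five-second timeout are hypothetical choices for illustration, and the code assumes Java 7 or later. A ServerSocketChannel is registered for OP_ACCEPT, a client connects, and a single select( ) call reports the channel as ready.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical illustration class; not part of the NIO API.
final class SelectorSketch {

    // Returns the number of connections accepted after one select( ) call.
    static int acceptOne() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {

            server.socket().bind(new InetSocketAddress(loopback, 0));
            server.configureBlocking(false); // required before registering
            server.register(selector, SelectionKey.OP_ACCEPT);

            int port = server.socket().getLocalPort();
            int accepted = 0;

            // A blocking client connect; the connection waits in the
            // listen backlog until the server accepts it.
            try (SocketChannel client =
                         SocketChannel.open(new InetSocketAddress(loopback, port))) {

                if (selector.select(5000) > 0) {
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove(); // the selector never removes keys itself
                        if (key.isAcceptable()) {
                            ServerSocketChannel ready = (ServerSocketChannel) key.channel();
                            SocketChannel peer = ready.accept();
                            if (peer != null) {
                                accepted++;
                                peer.close();
                            }
                        }
                    }
                }
            }
            return accepted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + acceptOne());
    }
}
```

A real server would wrap the select( ) call in a loop and register each accepted SocketChannel for OP_READ with the same selector.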
Selectors provide this service when used in conjunction with SelectableChannel objects, but there's more to the story than that. The real power of readiness selection is that a potentially large number of channels can be checked for readiness simultaneously. The caller can easily determine which of several channels are ready to go. Optionally, the invoking thread can ask to be put to sleep until one or more of the channels registered with the
Using Selection Keys
Let's look again at the API of the SelectionKey class:
package java.nio.channels;

public abstract class SelectionKey
{
        public static final int OP_READ
        public static final int OP_WRITE
        public static final int OP_CONNECT
        public static final int OP_ACCEPT

        public abstract SelectableChannel channel(  );
        public abstract Selector selector(  );

        public abstract void cancel(  );
        public abstract boolean isValid(  );

        public abstract int interestOps(  );
        public abstract void interestOps (int ops);
        public abstract int readyOps(  );

        public final boolean isReadable(  )
        public final boolean isWritable(  )
        public final boolean isConnectable(  )
        public final boolean isAcceptable(  )

        public final Object attach (Object ob)
        public final Object attachment(  )
}
As mentioned earlier, a key represents the registration of a particular channel object with a particular selector object. You can see that relationship reflected in the first two methods above. The channel( ) method returns the SelectableChannel object associated with the key, and selector( ) returns the associated Selector object. Nothing surprising there.
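Several of the methods in this listing can be tried without any network traffic by registering the read end of a Pipe, which is a selectable channel. The sketch below is illustrative only: the class SelectionKeySketch, the describe( ) helper, and the attachment string are hypothetical. It checks channel( ), selector( ), interestOps( ), and attachment( ), then cancels the key and asks whether it is still valid.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration class; not part of the NIO API.
final class SelectionKeySketch {

    // Registers a pipe's source channel and reports what the key says
    // about the registration before and after cancellation.
    static String describe() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {

                source.configureBlocking(false); // required before registering
                SelectionKey key =
                        source.register(selector, SelectionKey.OP_READ, "my-attachment");

                boolean sameChannel = key.channel() == source;
                boolean sameSelector = key.selector() == selector;
                // interestOps( ) is a bit mask; test it with a bitwise AND.
                boolean wantsRead = (key.interestOps() & SelectionKey.OP_READ) != 0;
                Object attached = key.attachment();

                key.cancel(); // the key is invalidated immediately
                boolean validAfterCancel = key.isValid();

                return sameChannel + "," + sameSelector + "," + wantsRead
                        + "," + attached + "," + validAfterCancel;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints true,true,true,my-attachment,false
    }
}
```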
Key objects represent a specific registration relationship. When it's time to terminate that relationship, call the cancel( ) method on the SelectionKey object. A key can be checked to see if it still represents a valid registration by calling its isValid( ) method. When a key is cancelled, it's placed in the cancelled set of the associated selector. The registration is not immediately terminated, but the key is immediately invalidated (see Section 4.3). Upon the next invocation of select( ) (or upon completion of an in-progress select( ) invocation), any cancelled keys will be cleared from the cancelled key set, and the corresponding deregistrations will be completed. The channel can be reregistered, and a new