Chapter 1. Introduction
Generics and collections work well with a number of other new features introduced in the latest versions of Java, including boxing and unboxing, a new form of loop, and functions that accept a variable number of arguments. We begin with an example that illustrates all of these. As we shall see, combining them is synergistic: the whole is greater than the sum of its parts.
Taking that as our motto, let’s do something simple with sums: put three numbers into a list and add them together. Here is how to do it in Java with generics:
List<Integer> ints = Arrays.asList(1,2,3); int s = 0; for (int n : ints) { s += n; } assert s == 6;
You can probably read this code without much explanation, but let’s
touch on the key features. The interface List
and the class Arrays
are part of the Collections Framework (both
are found in the package java.util
). The
type List
is now
generic; you write List<E>
to indicate a list with elements of
type E
. Here we write List<Integer>
to indicate that the elements
of the list belong to the class Integer
,
the wrapper class that corresponds to the primitive type int
. Boxing and unboxing operations, used to
convert from the primitive type to the wrapper class, are automatically
inserted. The static method asList
takes
any number of arguments, places them into an array, and returns a new list
backed by the array. The new loop form, foreach, is
used to bind a variable successively to each element of the list, and the
loop body adds these into the sum. The assertion statement (introduced in
Java 1.4), is used to check that the sum is correct; when
assertions are enabled, it throws an error if the condition does not
evaluate to true
.
Here is how the same code looks in Java before generics:
List ints = Arrays.asList( new Integer[] { new Integer(1), new Integer(2), new Integer(3) } ); int s = 0; for (Iterator it = ints.iterator(); it.hasNext(); ) { int n = ((Integer)it.next()).intValue(); s += n; } assert s == 6;
Reading this code is not quite so easy. Without generics, there is no way to indicate in the type declaration what kind of elements you intend to store in
the list, so instead of writing List<Integer>
, you write List
. Now it is the coder rather than the compiler
who is responsible for remembering the type of the list elements, so you
must write the cast to (Integer
) when
extracting elements from the list. Without boxing and unboxing, you must explicitly allocate each object belonging
to the wrapper class Integer
and use the
intValue
method to extract the
corresponding primitive int
. Without
functions that accept a variable number of arguments, you must explicitly
allocate an array to pass to the asList
method. Without the new form of loop, you must explicitly declare an
iterator and advance it through the list.
By the way, here is how to do the same thing with an array in Java before generics:
int[] ints = new int[] { 1,2,3 }; int s = 0; for (int i = 0; i < ints.length; i++) { s += ints[i]; } assert s == 6;
This is slightly longer than the corresponding code that uses generics and collections, is arguably a bit less readable, and is certainly less flexible. Collections let you easily grow or shrink the size of the collection, or switch to a different representation when appropriate, such as a linked list or hash table or ordered tree. The introduction of generics, boxing and unboxing, foreach loops, and varargs in Java marks the first time that using collections is just as simple, perhaps even simpler, than using arrays.
Now let’s look at each of these features in a little more detail.
Generics
An interface or class may be declared to take one or more type parameters, which are written in angle brackets and should be supplied when you declare a variable belonging to the interface or class or when you create a new instance of a class.
We saw one example in the previous section. Here is another:
List<String>
words = new ArrayList<String>
(); words.add("Hello "); words.add("world!"); String s = words.get(0)+words.get(1); assert s.equals("Hello world!");
In the Collections Framework, class ArrayList<E>
implements interface List<E>
. This trivial code fragment
declares the variable words
to contain
a list of strings, creates an instance of an ArrayList
, adds two strings to the list, and
gets them out again.
In Java before generics, the same code would be written as follows:
List words = new ArrayList(); words.add("Hello "); words.add("world!"); String s = ((String)
words.get(0))+((String)
words.get(1)) assert s.equals("Hello world!");
Without generics, the type parameters are omitted, but you must explicitly cast whenever an element is extracted from the list.
In fact, the bytecode compiled from the two sources above will be
identical. We say that generics are implemented by
erasure because the types List<Integer>,
List<String>
, and List<List<String>>
are all
represented at run-time by the same type, List
. We also use erasure
to describe the process that converts the first program to the second. The
term erasure is a slight misnomer, since the process
erases type parameters but adds casts.
Generics implicitly perform the same cast that is explicitly performed without generics. If such casts could fail, it might be hard to debug code written with generics. This is why it is reassuring that generics come with the following guarantee:
Cast-iron guarantee: the implicit casts added by the compilation of generics never fail. |
There is also some fine print on this guarantee: it applies only when no unchecked warnings have been issued by the compiler. Later, we will discuss at some length what causes unchecked warnings to be issued and how to minimize their effect.
Implementing generics by erasure has a number of important effects.
It keeps things simple, in that generics do not add anything fundamentally
new. It keeps things small, in that there is exactly one implementation of
List
, not one version for each type.
And it eases evolution, since the same library can be accessed in both nongeneric and generic
forms.
This last point is worth some elaboration. It means that you don’t get nasty problems due to maintaining two versions of the libraries: a nongeneric legacy version that works with Java 1.4 or earlier, and a generic version that works with Java 5 and 6. At the bytecode level, code that doesn’t use generics looks just like code that does. There is no need to switch to generics all at once—you can evolve your code by updating just one package, class, or method at a time to start using generics. We even explain how you may declare generic types for legacy code. (Of course, the cast-iron guarantee mentioned above holds only if you add generic types that match the legacy code.)
Another consequence of implementing generics by erasure is that array types differ in key ways from parameterized types. Executing
new String[size]
allocates an array, and stores in that array an indication that its
components are of type String
. In
contrast, executing:
new ArrayList<String>()
allocates a list, but does not store in the list any indication of the type of its elements. In the jargon, we say that Java reifies array component types but does not reify list element types (or other generic types). Later, we will see how this design eases evolution (see Chapter 5) but complicates casts, instance tests, and array creation (see Chapter 6).
Generics Versus Templates Generics in Java resemble templates in C++. There are just two important things to bear in mind about the relationship between Java generics and C++ templates: syntax and semantics. The syntax is deliberately similar and the semantics are deliberately different.
Syntactically, angle brackets were chosen because they are familiar to C++ users, and because square brackets would be hard to parse. However, there is one difference in syntax. In C++, nested parameters require extra spaces, so you see things like this:
List< List<String> >
In Java, no spaces are required, and it’s fine to write this:
List<List<String>>
You may use extra spaces if you prefer, but they’re not required. (In C++, a problem arises because >> without the space denotes the right-shift operator. Java fixes the problem by a trick in the grammar.)
Semantically, Java generics are defined by erasure, whereas C++ templates are defined by expansion. In C++ templates, each instance of a template at a new type is compiled separately. If you use a list of integers, a list of strings, and a list of lists of string, there will be three versions of the code. If you use lists of a hundred different types, there will be a hundred versions of the code—a problem known as code bloat. In Java, no matter how many types of lists you use, there is always one version of the code, so bloat does not occur.
Expansion may lead to more efficient implementation than erasure,
since it offers more opportunities for optimization, particularly for
primitive types such as int
. For code
that is manipulating large amounts of data—for instance, large arrays in
scientific computing—this difference may be significant. However, in
practice, for most purposes the difference in efficiency is not important,
whereas the problems caused by code bloat can be crucial.
In C++, you also may instantiate a template with a constant value rather than a type, making it possible to use templates as a sort of “macroprocessor on steroids” that can perform arbitrarily complex computations at compile time. Java generics are deliberately restricted to types, to keep them simple and easy to understand.
Boxing and Unboxing
Recall that every type in Java is either a
reference type or a primitive
type. A reference type is any class, interface, or array type. All
reference types are subtypes of class Object
, and any variable of reference type may
be set to the value null
. As shown in
the following table, there are eight primitive types, and each of these has a corresponding
library class of reference type. The library classes are located in the
package java.lang
.
Primitive | Reference |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Conversion of a primitive type to the corresponding reference type is called boxing and conversion of the reference type to the corresponding primitive type is called unboxing.
Java with generics automatically inserts boxing and unboxing
coercions where appropriate. If an expression e
of type int
appears where a value of type Integer
is expected, boxing converts it to new
Integer(e)
(however, it may cache frequently occurring values).
If an expression e
of type Integer
appears where a value of type int
is expected, unboxing converts it to the
expression e.intValue()
. For example,
the sequence:
List<Integer> ints = new ArrayList<Integer>(); ints.add(1); int n = ints.get(0);
is equivalent to the sequence:
List<Integer> ints = new ArrayList<Integer>(); ints.add(Integer.valueOf(1)
); int n = ints.get(0).intValue();
The call Integer.valueOf(1)
is similar in effect
to the expression new Integer(1)
, but may cache some
values for improved performance, as we explain shortly.
Here, again, is the code to find the sum of a list of integers, conveniently packaged as a static method:
public static int sum (List<Integer> ints) { int s = 0; for (int n : ints) { s += n; } return s; }
Why does the argument have type List<Integer>
and not List<int>
? Because type parameters must always be bound to reference types, not primitive types. Why does the result have type int
and not Integer
? Because result types may be either
primitive or reference types, and it is more efficient to use the former
than the latter. Unboxing occurs when each Integer
in the list ints
is bound to the variable n
of type int
.
We could rewrite the method, replacing each occurrence of int
with Integer
:
public static Integer sumInteger(List<Integer> ints) { Integer s = 0; for (Integer n : ints) { s += n; } return s; }
This code compiles but performs a lot of needless work. Each
iteration of the loop unboxes the values in s
and n
,
performs the addition, and boxes up the result again. With Sun’s current
compiler, measurements show that this version is about 60 percent slower
than the original.
Look Out for This! One subtlety of
boxing and unboxing is that == is defined differently on primitive and on reference
types. On type int
, it is defined by
equality of values, and on type Integer
, it is defined by object identity. So
both of the following assertions succeed using Sun’s JVM:
List<Integer> bigs = Arrays.asList(100,200,300);
assert sumInteger(bigs) == sum(bigs);
assert sumInteger(bigs) != sumInteger(bigs); // not recommended
In the first assertion, unboxing causes values to be compared, so
the results are equal. In the second assertion, there is no unboxing, and
the two method calls return distinct Integer
objects, so the results are unequal even
though both Integer
objects represent
the same value, 600.We recommend that you never use == to compare values
of type Integer
. Either unbox first, so
== compares values of type int
, or else
use equals
to compare values of type
Integer
.
A further subtlety is that boxed values may be cached. Caching is required when boxing an int
or short
value between–128 and 127, a char
value
between '\u0000'
and '\u007f'
, a byte
, or a boolean
; and caching is permitted when boxing
other values. Hence, in contrast to our earlier example, we have the
following:
List<Integer> smalls = Arrays.asList(1,2,3);
assert sumInteger(smalls) == sum(smalls);
assert sumInteger(smalls) == sumInteger(smalls); // not recommended
This is because 6 is smaller than 128, so boxing the value 6 always
returns exactly the same object. In general, it is not specified whether
boxing the same value twice should return identical or distinct objects,
so the inequality assertion shown earlier may either fail or
succeed depending on the implementation. Even for small values, for which
== will compare values of type Integer
correctly, we recommend against its use. It is clearer and cleaner to use
equals
rather than == to compare values
of reference type, such as Integer
or
String
.
Foreach
Here, again, is our code that computes the sum of a list of integers.
List<Integer> ints = Arrays.asList(1,2,3);
int s = 0;
for (int n : ints) { s += n; }
assert s == 6;
The loop in the third line is called a foreach
loop even though it is written with the keyword for
. It is equivalent to the following:
for
(Iterator<Integer> it =ints.
iterator(); it.hasNext(); ) {int n
= it.next();s += n;
}
The emphasized code corresponds to what was written by the user, and
the unemphasized code is added in a systematic way by the compiler. It
introduces the variable it
of type
Iterator<Integer>
to iterate over
the list ints
of type List<Integer>
. In general, the compiler
invents a new name that is guaranteed not to clash with any name already
in the code. Note that unboxing occurs when the expression it.next()
of type Integer
is assigned to the variable n
of type int
.
The foreach loop can be applied to any object
that implements the interface Iterable<E>
(in package java.lang
), which in turn refers to the
interface Iterator<E>
(in package
java.util
). These define the methods
iterator, hasNext
, and next
, which are used by the translation of the
foreach loop (iterators also have a method remove
, which is not used by the
translation):
interface Iterable<E> { public Iterator<E> iterator(); } interface Iterator<E> { public boolean hasNext(); public E next(); public void remove(); }
All collections, sets, and lists in the Collections Framework
implement the Iterable<E>
interface; and classes defined by other vendors or users may implement it
as well.
The foreach loop may also be applied to an array:
public static int sumArray(int[] a) { int s = 0; for (int n : a) { s += n; } return s; }
The foreach loop was deliberately kept simple and catches only the most
common case. You need to explicitly introduce an iterator if you wish to use the remove
method or to iterate over more than one list in parallel.
Here is a method that removes negative elements from a list of
doubles:
public static void removeNegative(List<Double> v) { for (Iterator<Double> it = v.iterator(); it.hasNext();) { if (it.next() < 0) it.remove(); } }
Here is a method to compute the dot product of two vectors, represented as lists of doubles, both of the same length. Given two vectors, u1, … , un and v1, … , vn, it computes u1 * v1> + … + un * vn:
public static double dot(List<Double> u, List<Double> v) { if (u.size() != v.size()) throw new IllegalArgumentException("different sizes"); double d = 0; Iterator<Double> uIt = u.iterator(); Iterator<Double> vIt = v.iterator(); while (uIt.hasNext()) { assert uIt.hasNext() && vIt.hasNext(); d += uIt.next() * vIt.next(); } assert !uIt.hasNext() && !vIt.hasNext(); return d; }
Two iterators, uIt
and vIt
, advance across the lists u
and v
in
lock step. The loop condition checks only the first iterator, but the
assertions confirm that we could have used the second iterator instead,
since we previously tested both lists to confirm that they have the same
length.
Generic Methods and Varargs
Here is a method that accepts an array of any type and converts it to a list:
class Lists {
public static <T>
List<T> toList(T[] arr) {
List<T> list = new ArrayList<T>();
for (T elt : arr) list.add(elt);
return list;
}
}
The static method toList
accepts an
array of type T[]
and returns a list of
type List<T>
, and does so for
any type T
. This
is indicated by writing <T>
at
the beginning of the method signature, which declares T
as a new type variable. A method which
declares a type variable in this way is called a generic
method. The scope of the type variable T
is local to the method itself; it may appear
in the method signature and the method body, but not outside the method.
The method may be invoked as follows:
List<Integer> ints = Lists.toList(new Integer[] { 1, 2, 3 }); List<String> words = Lists.toList(new String[] { "hello", "world" });
In the first line, boxing converts 1, 2, 3
from int
to Integer
.
Packing the arguments into an array is cumbersome. The
vararg feature permits a special, more convenient
syntax for the case in which the last argument of a method is an array. To
use this feature, we replace T[]
with
T…
in the method declaration:
class Lists {
public static <T> List<T> toList(T...
arr) {
List<T> list = new ArrayList<T>();
for (T elt : arr) list.add(elt);
return list;
}
}
Now the method may be invoked as follows:
List<Integer> ints = Lists.toList(1, 2, 3); List<String> words = Lists.toList("hello", "world");
This is just shorthand for what we wrote above. At run time, the arguments are packed into an array which is passed to the method, just as previously.
Any number of arguments may precede a last vararg argument. Here is a method that accepts a list and adds all the additional arguments to the end of the list:
public static <T> void addAll(List<T> list, T... arr) { for (T elt : arr) list.add(elt); }
Whenever a vararg is declared, one may either pass a list of arguments to be implicitly packed into an array, or explicitly pass the array directly. Thus, the preceding method may be invoked as follows:
List<Integer> ints = new ArrayList<Integer>(); Lists.addAll(ints, 1, 2); Lists.addAll(ints, new Integer[] { 3, 4 }); assert ints.toString().equals("[1, 2, 3, 4]");
We will see later that when we attempt to create an array containing a generic type, we will always receive an unchecked warning. Since varargs always create an array, they should be used only when the argument does not have a generic type (see Array Creation and Varargs).
In the preceding examples, the type parameter to the generic method is inferred, but it may also be given explicitly, as in the following examples:
List<Integer> ints = Lists.<Integer>
toList(); List<Object> objs = Lists.<Object>
toList(1, "two");
Explicit parameters are usually not required, but they are helpful
in the examples given here. In the first example, without the type
parameter there is too little information for the type inference algorithm
used by Sun’s compiler to infer the correct type. It infers that the
argument to toList
is an empty array of
an arbitrary generic type rather than an empty array of integers, and this
triggers the unchecked warning described earlier. (The Eclipse compiler
uses a different inference algorithm, and compiles the same line correctly
without the explicit parameter.) In the second example, without the type
parameter there is too much information for the type inference algorithm
to infer the correct type. You might think that Object
is the only type that an integer and a
string have in common, but in fact they also both implement the interfaces
Serializable
and Comparable
. The type inference algorithm cannot
choose which of these three is the correct type.
In general, the following rule of thumb suffices: in a call to a generic method, if there are one or more arguments that correspond to a type parameter and they all have the same type then the type parameter may be inferred; if there are no arguments that correspond to the type parameter or the arguments belong to different subtypes of the intended type then the type parameter must be given explicitly.
When a type parameter is passed to a generic method invocation, it appears in angle brackets to the left, just as in the method
declaration. The Java grammar requires that type parameters may appear
only in method invocations that use a dotted form. Even if the method
toList
is defined in the same class
that invokes the code, we cannot shorten it as follows:
List<Integer> ints = <Integer>toList(); // compile-time error
This is illegal because it will confuse the parser.
Methods Arrays.asList
and
Collections.addAll
in the Collections Framework are similar to toList
and addAll
shown earlier. (Both classes are in
package java.util
.) The Collections Framework version of asList
does not return an ArrayList
, but instead returns a specialized
list class that is backed by a given array. Also, its version of addAll
acts on general collections, not just
lists.
Assertions
We clarify our code by liberal use of the assert
statement. Each occurrence of assert
is followed by a boolean expression that
is expected to evaluate to true
. If
assertions are enabled and the expression evaluates to false
, an AssertionError
is thrown, including an
indication of where the error occurred. Assertions are enabled by invoking
the JVM with the -ea
or -enableassertions
flag.
We only write assertions that we expect to evaluate to true
. Since assertions may not be enabled, an
assertion should never have side effects upon which any nonassertion code
depends. When checking for a condition that might not hold (such as
confirming that the arguments to a method call are valid), we use a
conditional and throw an exception explicitly.
To sum up, we have seen how generics, boxing and unboxing, foreach loops, and varargs work together to make Java code easier to write, having illustrated this through the use of the Collections Framework.
Get Java Generics and Collections now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.