Chapter 1. Introduction

The purpose of generics is to allow the same code to be reused for creating or handling objects of different types. For example, List<String> and List<Integer> are different types, but they are implemented with the same code. Two kinds of Java code can be generic: types, such as the collection classes and interfaces; and methods, such as the static methods in the utility class java.util.Collections. Let’s look at each of these in turn.

Generic Types

An interface or class may be declared to take one or more type parameters, which are written in angle brackets and must be supplied when you declare a variable belonging to the interface or class or when you create a new instance of a class.

Here is an example:

List<String> words = new ArrayList<String>();
words.add("Hello ");
words.add("world!");
String s = words.get(0)+words.get(1);
assert s.equals("Hello world!");

In the Collections Framework, the class ArrayList<E> implements the interface List<E>. This trivial code fragment declares the variable words to contain a list of strings, creates an instance of an ArrayList, adds two strings to the list, and gets them out again.
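The fragment above uses a library class. The sketch below shows the declaring side with a user-defined generic class. The class names Pair and PairDemo and their methods are hypothetical and exist only for illustration. The type parameters A and B are written in angle brackets after the class name and can then be used as ordinary types inside the class body.

```java
// Hypothetical generic class with two type parameters, A and B.
class Pair<A, B> {
  private final A first;   // A stands for whatever type the client supplies
  private final B second;  // B likewise

  Pair(A first, B second) {
    this.first = first;
    this.second = second;
  }

  A getFirst()  { return first; }
  B getSecond() { return second; }
}

// Hypothetical client code.
class PairDemo {
  static String describe() {
    // Type arguments are supplied both in the variable declaration
    // and when the instance is created.
    Pair<String, Integer> p = new Pair<String, Integer>("answer", 42);
    String s = p.getFirst();  // no cast needed: the result type is String
    int n = p.getSecond();    // Integer unboxed to int
    return s + "=" + n;
  }
}
```

Here PairDemo.describe() returns "answer=42".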

In Java before generics, the same code would be written as follows:

List words = new ArrayList();
words.add("Hello ");
words.add("world!");
String s = ((String)words.get(0))+((String)words.get(1));
assert s.equals("Hello world!");

Without generics, the type parameters are omitted, but you must explicitly cast whenever an element is extracted from the list.

In fact, the bytecode compiled from the two sources above will be identical. We say that generics are implemented by erasure because the types List<Integer>, List<String>, and List<List<String>> are all represented at run-time by the same type, List. We also use erasure to describe the process that converts the first program to the second. The term erasure is a slight misnomer, since the process erases type parameters but adds casts.
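One way to observe erasure is to ask two differently parameterized lists for their run-time class. This is a minimal sketch; the class name ErasureDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
  // Returns true if a List<String> and a List<Integer> share one run-time class.
  static boolean sameRuntimeClass() {
    List<String> strings = new ArrayList<String>();
    List<Integer> ints = new ArrayList<Integer>();
    // Both objects are instances of the single class ArrayList;
    // the type arguments String and Integer leave no trace at run time.
    return strings.getClass() == ints.getClass();
  }

  // The run-time class name carries no type argument either.
  static String runtimeClassName() {
    return new ArrayList<String>().getClass().getName();  // "java.util.ArrayList"
  }
}
```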

Generics implicitly perform the same cast that is explicitly performed without generics. If such casts could fail, it might be hard to debug code written with generics. This is why it is reassuring that generics come with the following guarantee:

  • Cast-iron guarantee: the implicit casts added by the compilation of generics never fail.

There is also some fine print on this guarantee: it applies only when no unchecked warnings have been issued by the compiler. Later, we will discuss at some length what causes unchecked warnings to be issued and how to minimize their effect.
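The sketch below shows how the fine print matters. Assigning a list to a raw List variable lets a non-String element in, and javac reports an unchecked operation (as a summary note unless -Xlint:unchecked is given). The implicit cast inserted by erasure then fails at run time. The class name UncheckedDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class UncheckedDemo {
  // Returns true if the compiler-inserted cast to String fails at run time.
  static boolean implicitCastFails() {
    List<String> strings = new ArrayList<String>();
    List raw = strings;             // raw type: the compiler stops checking element types
    raw.add(Integer.valueOf(42));   // unchecked warning: an Integer enters a List<String>
    try {
      String s = strings.get(0);    // erasure inserts a (String) cast here
      return false;
    } catch (ClassCastException e) {
      return true;                  // the cast-iron guarantee no longer holds
    }
  }
}
```

Had the compiler issued no unchecked warning, the cast could not have failed.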

Implementing generics by erasure had a number of important effects. It kept things simple, in that generics did not add anything fundamentally new. It kept things small, in that there is exactly one implementation of List, not one version for each type. And it eased evolution, since the same library can be accessed in both nongeneric and generic forms.

This last point is worth some elaboration. It meant that you never faced the problems of maintaining two versions of the libraries: a legacy version that worked with pre-generic Java, and a generic version that worked with generic Java. At the bytecode level, code that doesn’t use generics looks just like code that does. So there was never any need to switch to generics all at once; you could evolve your code by updating just one package, class, or method at a time to start using generics. In [Link to Come] we even explain how you could declare generic types for legacy code. (Of course, the cast-iron guarantee mentioned above holds only if the generic types you add match the legacy code.)

Another consequence of implementing generics by erasure is that array types differ in important ways from parameterized types. Executing

new String[size]

allocates an array and stores in that array an indication that its components are of type String. In contrast, executing

new ArrayList<String>()

allocates a list, but does not store in the list any indication of the type of its elements. We say that Java reifies array component types but does not reify list element types (or other generic types). During the introduction of generics, this design was crucial in easing evolution (see [Link to Come]) and therefore in the continued popularity of Java. Years later, on the other hand, it continues to complicate casts, instance tests, and array creation (see Chapter 5).
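The difference can be observed directly: an array remembers its component type and checks each store against it, whereas a list's run-time class says nothing about its elements. A minimal sketch follows; the class name ReificationDemo is hypothetical.

```java
import java.util.ArrayList;

class ReificationDemo {
  // The array's reified component type (String) is checked on every store.
  static boolean arrayRejectsWrongType() {
    Object[] objs = new String[1];   // compiles: a String[] is an Object[]
    try {
      objs[0] = Integer.valueOf(1);  // run-time check against String
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
  }

  // The array records its component type at run time...
  static Class<?> arrayComponentType() {
    return new String[0].getClass().getComponentType();  // String.class
  }

  // ...but the list records only its class, ArrayList.
  static Class<?> listClass() {
    return new ArrayList<String>().getClass();  // ArrayList.class
  }
}
```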

Generics Versus Templates

Generics in Java resemble templates in C++. There are just two important things to bear in mind about the relationship between Java generics and C++ templates: syntax and semantics. The syntax is deliberately similar and the semantics are deliberately different.

Syntactically, angle brackets were chosen because they are familiar to C++ users, and because square brackets would be hard to parse. Semantically, Java generics are defined by erasure, whereas C++ templates are defined by expansion. In C++ templates, each instance of a template at a new type is compiled separately. If you use a list of integers, a list of strings, and a list of lists of strings, there will be three versions of the code. If you use lists of a hundred different types, there will be a hundred versions of the code, a problem known as code bloat. In Java, no matter how many types of lists you use, there is always one version of the code, so bloat does not occur.
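Reflection gives a glimpse of this single version. In the sketch below, the hypothetical generic method first can be called on a list of any element type, yet the class contains exactly one compiled method, whose type variable T has been erased to Object. The names OneVersionDemo, first, and erasedReturnType are illustrative only.

```java
import java.lang.reflect.Method;
import java.util.List;

class OneVersionDemo {
  // One generic method, usable with List<String>, List<Integer>, and so on.
  static <T> T first(List<T> list) {
    return list.get(0);
  }

  // Looks up the one compiled method and reports its erased return type.
  static Class<?> erasedReturnType() {
    try {
      Method m = OneVersionDemo.class.getDeclaredMethod("first", List.class);
      return m.getReturnType();  // Object.class: T erased to Object
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```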

Expansion may lead to more efficient implementation than erasure, since it offers more opportunities for optimization, particularly for primitive types such as int. For code that is manipulating large amounts of data—for instance, large arrays in scientific computing—this difference may be significant. However, in practice, for most purposes the difference in efficiency is not important, whereas the problems caused by code bloat can be crucial.

In C++ you also may instantiate a template with a constant value rather than a type, making it possible to use templates as a sort of “macroprocessor on steroids” that can perform arbitrarily complex computations at compile time. Java generics are deliberately restricted to types, to keep them simple and easy to understand.

Generic Methods and Varargs

The preceding section described how interfaces and classes can be declared with type parameters. Individual methods can also be generic. Here is a method that accepts an array of any type and converts it to a list:

class Lists {
  public static <T> List<T> toList(T[] arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }
}

The static method toList accepts an array of type T[] and returns a list of type List<T>, and does so for any type T. This is indicated by writing <T> at the beginning of the method declaration, which declares T as a new type variable. A method which declares a type variable in this way is called a generic method. The scope of the type variable T is local to the method itself; it may appear in the method declaration, but not outside the method.

The method may be invoked as follows:

List<Integer> ints = Lists.toList(new Integer[] {1, 2, 3});
List<String> words = Lists.toList(new String[] { "hello", "world" });

In the first line, boxing converts the int values 1, 2, 3 to Integers.

Packing the arguments into an array is cumbersome. Variable arity parameters, usually called varargs, permit a special, more convenient syntax for the case in which the last parameter of a method has an array type. To use this feature, we replace T[] with T... in the method declaration:

class Lists {
  public static <T> List<T> toList(T... arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }
}

(The declaration of this method differs only in its name from java.util.List::of.) Now the method may be invoked as follows:

List<Integer> ints = Lists.toList(1, 2, 3);
List<String> words = Lists.toList("hello", "world");

This is just shorthand for what we wrote above. At run time, the arguments are packed into an array which is passed to the method, just as previously.
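The packing can be observed from inside the method. In this minimal sketch (the names VarargsDemo, count, and arrayClass are hypothetical), an invocation with no arguments receives an empty array rather than null, and the array built for two strings has the run-time class String[], because T is inferred as String.

```java
class VarargsDemo {
  // Reports how many arguments were packed into the varargs array.
  static <T> int count(T... arr) {
    return arr.length;
  }

  // Reports the run-time class of the array the compiler created.
  static <T> Class<?> arrayClass(T... arr) {
    return arr.getClass();
  }
}
```

Passing an explicit array, as in VarargsDemo.count(new String[] { "a", "b" }), skips the packing and hands the array over unchanged.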

Any number of arguments may precede the final varargs argument. Here is a method that accepts a list and adds all the additional arguments to the end of the list:

public static <T> void addAll(List<T> list, T... arr) {
  for (T elt : arr) list.add(elt);
}

(The declaration of this method differs only slightly from the declaration of java.util.Collections::addAll.) In calling a method with a varargs parameter, you can either pass a list of arguments to be implicitly packed into an array, or explicitly pass the array directly. Thus, addAll may be invoked as follows:

List<Integer> ints = new ArrayList<Integer>();
Lists.addAll(ints, 1, 2);
Lists.addAll(ints, new Integer[] { 3, 4 });
assert ints.equals(List.of(1, 2, 3, 4));
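For comparison, the sketch below makes the same calls with the library method java.util.Collections.addAll, declared as public static <T> boolean addAll(Collection<? super T> c, T... elements). Unlike the version above, it returns true if the collection changed. The class name CollectionsAddAllDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CollectionsAddAllDemo {
  static List<Integer> build() {
    List<Integer> ints = new ArrayList<Integer>();
    boolean changed = Collections.addAll(ints, 1, 2);  // arguments packed implicitly
    if (!changed) throw new IllegalStateException("list should have changed");
    Collections.addAll(ints, new Integer[] { 3, 4 });  // array passed explicitly
    return ints;
  }
}
```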

We will see later that when we attempt to create an array containing a generic type, we will always receive an unchecked warning. Since varargs always create an array, they should be used with care when the argument has a generic type (see “Array Creation and Varargs”).

The type parameter to the generic method is inferred in these examples, which correspond to the usual situation in which one or more arguments corresponding to a type parameter all have the same type. When there are no arguments, or the arguments are of different subtypes of the intended type, the type parameter may need to be supplied explicitly. For example:

var ints = Lists.<Integer>toList();
var objs = Lists.<Object>toList(1, "two");

In the first example, without the explicit type argument the type inferred would be Object. In the second example, the inferred type would be an intersection type: not only a subtype of Object but also a subtype of all the interfaces that both Integer and String implement, including (but not only) Serializable and Comparable.
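The explicit type argument matters most with var, which gives inference no target type. In the sketch below, ExplicitTypeArgs is a hypothetical copy of the Lists class, so the block is self-contained. With <Object> supplied, objs is a List<Object> and accepts a further Double. When the result is instead assigned to a declared List<Integer>, the compiler infers T as Integer from the assignment context, so no explicit argument is needed.

```java
import java.util.ArrayList;
import java.util.List;

class ExplicitTypeArgs {
  static <T> List<T> toList(T... arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }

  static List<Object> mixed() {
    // With var there is no target type, so <Object> fixes T.
    var objs = ExplicitTypeArgs.<Object>toList(1, "two");
    objs.add(3.0);  // allowed: objs is a List<Object>
    return objs;
  }

  static List<Integer> emptyInts() {
    // A declared target type lets the compiler infer T = Integer.
    List<Integer> ints = toList();
    return ints;
  }
}
```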

When an explicit type argument is supplied to a generic method invocation, it appears in angle brackets to the left of the method name, just as the type parameter does in the method declaration. The Java grammar allows explicit type arguments only in method invocations that use a dotted form. Even if toList is invoked from within the class that defines it, we cannot shorten the call as follows:

List<Integer> ints = <Integer>toList(); // compile-time error
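Inside the defining class, the dotted form is obtained by qualifying a static method with the class name and an instance method with this. A minimal sketch, where DottedFormDemo and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class DottedFormDemo {
  static <T> List<T> toList(T... arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }

  // A generic instance method.
  <T> List<T> emptyList() {
    return new ArrayList<T>();
  }

  static List<Integer> fromStatic() {
    // <Integer>toList() alone is rejected; qualify with the class name.
    return DottedFormDemo.<Integer>toList();
  }

  List<String> fromInstance() {
    // For an instance method, qualify with this.
    return this.<String>emptyList();
  }
}
```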

Primitive and Reference Types

The last topic to consider in this introductory chapter is that of primitive versus reference types. A reference type is any class, interface, or array type, whereas a primitive type is one of the eight listed below. The distinction is fundamental to Java’s implementation of generics, in which only reference types can be used as type arguments: primitive types are disallowed. So instead of writing List<int>, you must write List<Integer>.¹ All reference types are subtypes of class Object, and any variable of reference type may be set to the value null. As shown in the following table, each of the eight primitive types has a corresponding library class of reference type, located in the package java.lang.

Primitive   Reference
byte        Byte
short       Short
int         Integer
long        Long
float       Float
double      Double
boolean     Boolean
char        Character

Conversion of a primitive type to the corresponding reference type is called boxing and conversion of the reference type to the corresponding primitive type is called unboxing.

Boxing and unboxing conversions are applied automatically where appropriate. If an expression e of type int appears where a value of type Integer is expected, boxing converts it to Integer.valueOf(e). If an expression e of type Integer appears where a value of type int is expected, unboxing converts it to the expression e.intValue(). For example, the sequence:

List<Integer> ints = new ArrayList<Integer>();
ints.add(1);
int n = ints.get(0);

produces bytecode equivalent to the sequence produced by:

List<Integer> ints = new ArrayList<Integer>();
ints.add(Integer.valueOf(1));
int n = ints.get(0).intValue();

The call Integer.valueOf(1) returns an Integer instance representing the int value 1. The factory method Integer.valueOf is preferred to the constructor Integer::new (deprecated in Java 9 and marked for removal in Java 16) because it allows cached Integer objects to be reused.² In fact, the Java Language Specification requires that any two boxing conversions of the same int or short value between -128 and 127 inclusive (or a char value between '\u0000' and '\u007f', or a byte, or a boolean) must return the same reference (§5.1.7), and the documentation of Integer.valueOf states that it always caches values in that range. So this assertion will always succeed:

assert Integer.valueOf(5) == Integer.valueOf(5);

whereas this one will usually succeed, but might not, depending on the caching policy of the JVM:

assert Integer.valueOf(500) != Integer.valueOf(500);
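Since == on boxed values compares references and so depends on caching, boxed values are better compared with equals. And because unboxing e means calling e.intValue(), unboxing a null reference throws NullPointerException. A minimal sketch; the class and method names are hypothetical.

```java
class BoxingDemo {
  // equals compares the int values, regardless of caching.
  static boolean sameValue(Integer a, Integer b) {
    return a.equals(b);
  }

  // Unboxing calls intValue(), so unboxing null fails.
  static boolean unboxingNullThrows() {
    Integer boxed = null;
    try {
      int n = boxed;  // compiled as boxed.intValue()
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```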

Here is code to find the sum of a list of integers, conveniently packaged as a static method:

public static int sum(List<Integer> ints) {
  int s = 0;
  for (int n : ints) { s += n; }
  return s;
}

Why does the argument have type List<Integer> and not List<int>? Because type parameters must always be bound to reference types, not primitive types. Why has the method been defined to return a value of type int and not Integer? Because result types may be either primitive or reference types, and it is more efficient to use the former than the latter. Unboxing occurs when each Integer in the list ints is bound to the variable n of type int. We could rewrite the method, replacing each occurrence of int with Integer:

public static Integer sumInteger(List<Integer> ints) {
  Integer s = 0;
  for (Integer n : ints) { s += n; }
  return s;
}

This code compiles and runs correctly but performs a lot of needless work. Each iteration of the loop unboxes the values in s and n, performs the addition, and boxes the result again.
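To make the extra work visible, the sketch below spells out both loops with the boxing, unboxing, and for-each iteration written by hand. It is an approximation of what the compiler generates, not actual javac output, and the names SumDesugared, sumIntegerExpanded, and sumExpanded are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

class SumDesugared {
  // sumInteger with conversions made explicit: every iteration unboxes s and n,
  // adds, then boxes a new Integer.
  static Integer sumIntegerExpanded(List<Integer> ints) {
    Integer s = Integer.valueOf(0);
    for (Iterator<Integer> it = ints.iterator(); it.hasNext(); ) {
      Integer n = it.next();
      s = Integer.valueOf(s.intValue() + n.intValue());
    }
    return s;
  }

  // sum with conversions made explicit: one unboxing per element, no boxing.
  static int sumExpanded(List<Integer> ints) {
    int s = 0;
    for (Iterator<Integer> it = ints.iterator(); it.hasNext(); ) {
      int n = it.next().intValue();
      s += n;
    }
    return s;
  }
}
```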

1 This difference may disappear in a future version of Java as part of Project Valhalla ([Valhalla]), but much preparatory work is needed before this can happen.

2 Confusingly, perhaps, this deprecation may eventually be reversed, also as part of Project Valhalla.
