As
we saw in the last section, objects are expensive to create. Where it
is reasonable to reuse the same object, you should do so. You need to
be aware of when not to call new
. One fairly
obvious situation is when you have already used an object and can
discard it before you are about to create another object of the same
class. You should look at the object and consider whether it is
possible to reset the fields and then reuse the object, rather than
throw it away and create another. This can be particularly important
for objects that are constantly used and discarded: for example, in
graphics processing, objects such as
Rectangle
s, Point
s,
Color
s, and Font
s are used and
discarded all the time. Recycling these types of objects can
certainly improve performance.
Recycling can also apply to the internal elements of structures. For example, a linked list has nodes added to it as it grows, and as it shrinks, the nodes are discarded. Holding on to the discarded nodes is an obvious way to recycle these objects and reduce the cost of object creation.
Most container objects (e.g.,
Vector
s, Hashtable
s) can be
reused rather than created and thrown away. Of course, while you are
not using the retained objects, you are holding on to more memory
than if you simply discarded those objects, and this reduces the
memory available to create other objects. You need to balance the
need to have some free memory available against the need to improve
performance by reusing objects. But generally, the space taken by
retaining objects for later reuse is significant only for very large
collections, and you should certainly know which ones these are in
your application.
Note that when recycling container objects, you need to dereference all the elements previously in the container so that you don’t prevent them from being garbage collected. Because there is this extra overhead in recycling, it may not always be worth recycling containers. As usual for tuning, this technique is best applied to ameliorate an object-creation bottleneck that has already been identified.
A good strategy for reusing container objects is to use your own container classes, possibly wrapping other containers. This gives you a high degree of control over each collection object, and you can design these specifically for reuse. You can still use a pool manager to manage your requirements, even without reuse-designed classes. Reusing classes requires extra work when you’ve finished with a collection object, but the effort is worth it when reuse is possible. The code fragment here shows how you could use a vector pool manager:
//An instance of the vector pool manager. public static VectorPoolManager vectorPoolManager = new VectorPoolManager(25); ... public void someMethod( ) { //Get a new Vector. We only use the vector to do some stuff //within this method, and then we dump the vector (i.e. it //is not returned or assigned to a state variable) //so this is a perfect candidate for reusing Vectors. //Use a factory method instead of 'new Vector( )'Vector v = vectorPoolManager.getVector( );
... //do vector manipulation stuff //and the extra work is that we have to explicitly tell the //pool manager that we have finished with the vectorvectorPoolManager.returnVector(v);
}
Note that nothing stops the application from retaining a handle on a
vector after it has been returned to the pool, and obviously that
could lead to a classic “inadvertent reuse of memory”
bug.
You need to ensure that handles to vectors are not held anywhere:
these Vector
s should be used only internally
within an application, not externally in third-party classes where a
handle may be retained. The following class manages a pool of
Vector
s:
package tuning.reuse; import java.util.Vector; public class VectorPoolManager { Vector[] pool; boolean[] inUse; public VectorPoolManager(int initialPoolSize) { pool = new Vector[initialPoolSize]; inUse = new boolean[initialPoolSize]; for (int i = pool.length-1; i>=0; i--) { pool[i] = new Vector( ); inUse[i] = false; } } public synchronized Vector getVector( ) { for (int i = inUse.length-1; i >= 0; i--) if (!inUse[i]) { inUse[i] = true; return pool[i]; } //If we got here, then all the Vectors are in use. We will //increase the number in our pool by 10 (arbitrary value for //illustration purposes). boolean[] old_inUse = inUse; inUse = new boolean[old_inUse.length+10]; System.arraycopy(old_inUse, 0, inUse, 0, old_inUse.length); Vector[] old_pool = pool; pool = new Vector[old_pool.length+10]; System.arraycopy(old_pool, 0, pool, 0, old_pool.length); for (int i = old_pool.length; i < pool.length; i++) { pool[i] = new Vector( ); inUse[i] = false; } //and allocate the last Vector inUse[pool.length-1] = true; return pool[pool.length-1]; } public synchronized void returnVector(Vector v) { for (int i = inUse.length-1; i >= 0; i--) if (pool[i] == v) { inUse[i] = false; //Can use clear( ) for java.util.Collection objects //Note that setSize( ) nulls out all elements v.setSize(0); return; } throw new RuntimeException("Vector was not obtained from the pool: " + v); } }
Because you reset the Vector
size to
0
when it is returned to the pool, all objects
previously referenced from the vector are no longer referenced (the
Vector.setSize( )
method null
s out
all internal index entries beyond the new size to ensure no reference
is retained). However, at the same time, you do not return any memory
allocated to the Vector
itself, because the
Vector
’s current capacity is retained. A
lazily initialized version of this class simply starts with zero
items in the pool and sets the pool to grow by one or more each time.
(Many JDK collection classes, including
java.util.Vector
, have both a size and a capacity.
The capacity is the number of elements the collection can hold before
that collection needs to resize its internal memory to be larger. The
size is the number of externally accessible elements the collection
is actually holding. The capacity is always greater than or equal to
the size. By holding spare capacity, elements can be added to
collections without having to continually resize the underlying
memory. This makes element addition faster and more
efficient.)
The previous example of a pool manager can
be used by multiple threads in a multithreaded application, although
the getVector( )
and returnVector()
methods first need to be defined as
synchronized
.
This may be all you need to ensure that you reuse a set of objects in
a multithreaded application. Sometimes though, there are objects you
need to use in a more complicated way. It may be that the objects are
used in such a way that you can guarantee you need only one object
per thread, but any one thread must consistently use the same object.
Singletons (see Section 4.2.4) that maintain some
state information are a prime example of this sort of object.
In this case, you might want to use a ThreadLocal
object. ThreadLocal
s have accessors that return an
object local to the current thread. ThreadLocal
use is best illustrated using an example; this one produces:
[This is thread 0, This is thread 0, This is thread 0] [This is thread 1, This is thread 1, This is thread 1] [This is thread 2, This is thread 2, This is thread 2] [This is thread 3, This is thread 3, This is thread 3] [This is thread 4, This is thread 4, This is thread 4]
Each thread uses the same access method to obtain a vector to add
some elements. The vector obtained by each thread is always the same
vector for that thread: the ThreadLocal
object
always returns the thread-specific vector. As the following code
shows, each vector has the same string added to it repeatedly,
showing that it is always obtaining the same thread-specific vector
from the vector access method. (Note that
ThreadLocal
s are only available from Java 2, but
it is easy to create the equivalent functionality using a
Hashtable
: see the getVectorPriorToJDK12()
method.)
package tuning.reuse; import java.util.*; public class ThreadedAccess implements Runnable { static int ThreadCount = 0; public void run( ) { //simple test just accesses the thread local vector, adds the //thread specific string to it, and sleeps for two seconds before //again accessing the thread local and printing out the value. String s = "This is thread " + ThreadCount++; Vector v = getVector( ); v.addElement(s); v = getVector( ); v.addElement(s); try{Thread.sleep(2000);}catch(Exception e){} v = getVector( ); v.addElement(s); System.out.println(v); } public static void main(String[] args) { try { //Four threads to see the multithreaded nature at work for (int i = 0; i < 5; i++) { (new Thread(new ThreadedAccess( ))).start( ); try{Thread.sleep(200);}catch(Exception e){} } } catch(Exception e){e.printStackTrace( );} } private static ThreadLocal vectors = new ThreadLocal( ); public static Vector getVector( ) { //Lazily initialized version. Get the thread local object Vector v = (Vector) vectors.get( ); if (v == null) { //First time. So create a vector and set the ThreadLocal v = new Vector( ); vectors.set(v); } return v; } private static Hashtable hvectors = new Hashtable( ); /* This method is equivalent to the getVector( ) method, * but works prior to JDK 1.2 (as well as after). */ public static Vector getVectorPriorToJDK12( ) { //Lazily initialized version. Get the thread local object Vector v = (Vector) hvectors.get(Thread.currentThread( )); if (v == null) { //First time. So create a vector and set the thread local v = new Vector( ); hvectors.put(Thread.currentThread( ), v); } return v; } }
Reuse also
applies
when a constant object is returned
for information. For example, the preferredSize( )
of a customized widget returns a Dimension
object
that is normally one particular dimension. But to ensure that the
stored unchanging Dimension
value does not get
altered, you need to return a copy of the stored
Dimension
. Otherwise, the calling method accesses
the original Dimension
object and can change the
Dimension
values, thus affecting the original
Dimension
object itself.
Java provides a final
modifier to fields that
allows you to provide fixed values for the
Dimension
fields. Unfortunately, you cannot redefine an already existing class,
so Dimension
cannot be redefined to have
final
fields. The best solution in this case is
that a separate class, FixedDimension
, be defined
with final
fields (this cannot be a subclass of
Dimension
, as the fields can’t be redefined
in the subclass). This extra class allows methods to return the same
FixedDimension
object if applicable, or a new
FixedDimension
is returned (as happens with
Dimension
) if the method requires different values
to be returned for different states. Of course, it is too late now
for java.awt
to be changed in this way, but the
principle
remains.
Note that making a field final
does not make an
object unchangeable. It only disallows changes to the field:
public class FixedDimension { final int height; final int width; ... } //Both the following fields are defined as final public static final Dimension dim = new Dimension(3,4); public static final FixedDimension fixedDim = new FixedDimension(3,4); dim.width = 5; //reassignment allowed dim = new Dimension(3,5);//reassignment disallowed fixedDim.width = 5; //reassignment disallowed fixedDim = new FixedDimension(3,5); //reassignment disallowed
An
alternative to defining preferredSize()
to return a fixed object is to
provide a method that accepts an object whose values will be set,
e.g., preferredSize(Dimension)
. The caller can
then pass in a Dimension
object, which would have
its values filled in by the
preferredSize(Dimension)
method. The calling method
can then access the values in the Dimension
object. This same Dimension
object can be reused
for multiple components. This design pattern is beginning to be used
extensively within the JDK. Many methods developed with JDK 1.2 and
onward accept a parameter that is filled in, rather than returning a
copy of the master value of some object. If necessary, backward
compatibility can be retained by adding this method as extra, rather
than replacing an existing method:
public static final Dimension someSize = new Dimension(10,5); //original definition returns a new Dimension. public Dimension someSize( ) { Dimension dim = new Dimension(0,0); someSize(dim); return dim; } //New method which fills in the Dimension details in a passed parameter. public void someSize(Dimension dim) { dim.width = someSize.width; dim.width = someSize.height; }
Wherever
possible, you should replace multiple objects with a single object
(or just a few). For example, if you need only one
VectorPoolManager
object, it makes sense to
provide a static variable somewhere that holds this. You can even
enforce this by making the
constructor private and holding the singleton
in the class itself; e.g., change the definition of
VectorPoolManager
to:
public class VectorPoolManager { public static final VectorPoolManager SINGLETON = new VectorPoolManager(10); Vector[] pool; boolean[] inUse; //Make the constructor private to enforce that //no other objects can be created. private VectorPoolManager(int initialPoolSize) { ... }
An alternative implementation is to make everything
static (all methods and both the
instance variables in the VectorPoolManager
class). This also ensures that only one pool manager can be used. My
preference is to have a
SINGLETON
object for design reasons.[29]
This activity of replacing multiple copies of an object with just a
few objects is often referred to as
canonicalizing objects. The
Boolean
s provide an existing example of objects
that should have been canonicalized in the JDK. They were not, and no
longer can be without breaking backward compatibility. For
Boolean
s, only two objects need to exist, but by
allowing a new Boolean
object to be created (by
providing public constructors), you lose canonicalization. The JDK
should have enforced the existence of only two objects by keeping the
constructors private. Note that canonical objects have another
advantage in addition to reducing the number of objects created: they
also allow comparison by
identity
. For example:
Boolean t1 = new Boolean(true); System.out.println(t1==Boolean.TRUE); System.out.println(t1.equals(Boolean.TRUE));
produces the output:
false true
If Boolean
s had been canonicalized, all
Boolean
comparisons could be done by identity:
comparison by identity is always faster than comparison by equality,
because identity comparisons are simply pointer
comparisons.[30]
You are probably better off not canonicalizing all objects that could
be canonicalized. For example, the
Integer
class can (theoretically) have its
instances canonicalized, but you need a map of some sort, and it is
more efficient to allow multiple instances, rather than to manage a
potential pool of four billion objects. However, the situation is
different for particular applications. If you use just a few
Integer
objects in some defined way, you may find
you are repeatedly creating the Integer
objects
with values 1, 2, 3, etc., and also have to access the
integerValue( )
to compare them. In this case, you
can canonicalize a few integer objects, improving performance in
several ways: eliminating the extra Integer
creations and the garbage collections of these objects when they are
discarded, and allowing comparison by identity. For example:
public class IntegerManager { public static final Integer ZERO = new Integer(0); public static final Integer ONE = new Integer(1); public static final Integer TWO = new Integer(2); public static final Integer THREE = new Integer(3); public static final Integer FOUR = new Integer(4); public static final Integer FIVE = new Integer(5); public static final Integer SIX = new Integer(6); public static final Integer SEVEN = new Integer(7); public static final Integer EIGHT = new Integer(8); public static final Integer NINE = new Integer(9); public static final Integer TEN = new Integer(10); } public class SomeClass { public void doSomething(Integer i) { //Assume that we are passed a canonicalized Integer if (i == IntegerManager.ONE) xxx( ); else if(i == IntegerManager.FIVE) yyy( ); else ... } ... }
There are various other frequently used objects throughout an application that should be canonicalized. A few that spring to mind are the empty string, empty arrays of various types, and some dates.
There
can be some confusion about whether String
s are
already canonicalized. There is no guarantee that they are, although
the
compiler can canonicalize
String
s that are equal and are compiled in the
same pass. The String.intern()
method canonicalizes strings in an
internal table. This is supposed to be, and usually is, the same
table used by strings canonicalized at compile time, but in some
earlier JDK versions (e.g., 1.0), it was not the same table. In any
case, there is no particular reason to use the internal string table
to canonicalize your strings unless you want to compare
String
s by
identity (see Section 5.5). Using your own table gives you more
control and allows you to inspect the table when necessary. To see
the difference between identity and equality comparisons for
String
s, including the difference that
String.intern( )
makes, you can run the following
class:
public class Test { public static void main(String[] args) { System.out.println(args[0]); //see that we have the empty string //should be true System.out.println(args[0].equals("")); //should be false since they are not identical objects System.out.println(args[0] == ""); //should be true unless there are two internal string tables System.out.println(args[0].intern( ) == ""); } }
This Test
class, when run with the command line:
java Test ""
gives the output:
true false true
Canonicalizing objects is best for read-only
objects
and can be troublesome for objects that change. If you canonicalize a
changeable object and then change its state, then all objects that
have a reference to the canonicalized object are still pointing to
that object, but with the object’s new state. For example,
suppose you canonicalize a special Date
value. If
that object has its date value changed, all objects pointing to that
Date
object now see a different date value. This
result may be desired, but more often it is a bug.
If you want to canonicalize changeable objects, one technique to make it slightly safer is to wrap the object with another one, or use your own (sub)class.[31] Then all accesses and updates are controlled by you. If the object is not supposed to be changed, you can throw an exception on any update method. Alternatively, if you want some objects to be canonicalized but with copy-on-write behavior, you can allow the updater to return a noncanonicalized copy of the canonical object.
Note that it makes no sense to build a table of millions or even thousands of strings (or other objects) if the time taken to test for, access, and update objects in the table is longer than the time you are saving canonicalizing them.
One technique for maintaining collections of objects that can grow
too large is the use of
WeakReference
s (from the
java.lang.ref
package in Java 2). If you need to
maintain one or more pools of objects with a large number of objects
being held, you may start coming up against memory limits of the VM.
In this case, you should consider using
WeakReference
objects to hold on to your pool
elements. Objects referred to by WeakReference
s
can be automatically garbage-collected if memory gets low enough (see
Reference Objects).
A WeakReference
normally maintains references to
elements in a table of canonicalized objects. If memory gets low, any
of the objects referred to by the table and not referred to anywhere
else in the application (except by other weak references) are
garbage-collected
.
This does not affect the canonicalization because only those objects
not referenced anywhere else are removed. The canonical object can be
re-created when required, and this new instance is now the new
canonical object: remember that no other references to the object
exist, or the original could not have been garbage-collected.
For example, a table of canonical Integer
objects
can be maintained using WeakReference
s. This
example is not particularly useful: unlike the earlier example, in
which Integer
objects from 1 to 10 can be
referenced directly with no overhead, thus providing a definite
speedup for tests, the next example has overheads that would probably
swamp any benefits of having canonical Integer
s. I
present it only as a clear and simple example to illustrate the use
of WeakReference
s.
The example has two iterations: one sets an array of canonical
Integer
objects up to a value set by the
command-line argument; a second loops through to access the first 10
canonical Integer
s. If the first loop is large
enough (or the VM memory is constrained low enough), the garbage
collector kicks in and starts reclaiming some of the
Integer
objects that are all being held by
WeakReference
s. The second loop then reaccesses
the first 10 Integer
objects. Earlier, I
explicitly held on to five of these Integer
objects (integers 3 to 7 inclusive) in variables so that they could
not be garbage-collected, and so that the second loop would reset
only the five reclaimed Integer
s. When running
this test with the VM constrained to 4 MB:
java -Xmx4M tuning.reuse.Test 100000
you get the following output:
Resetting integer 0 Resetting integer 1 Resetting integer 2 Resetting integer 8 Resetting integer 9
The example is defined here. Note the overheads. Even if the reference has not been garbage-collected, you have to access the underlying object and cast it to the desired type:
package tuning.reuse; import java.util.*; import java.lang.ref.*; public class Test { public static void main(String[] args) { try { Integer ic = null; int REPEAT = args.length > 0 ? Integer.parseInt(args[0]) : 10000000; //Hang on to the Integer objects from 3 to 7 //so that they cannot be garbage collected Integer i3 = getCanonicalInteger(3); Integer i4 = getCanonicalInteger(4); Integer i5 = getCanonicalInteger(5); Integer i6 = getCanonicalInteger(6); Integer i7 = getCanonicalInteger(7); //Loop through getting canonical integers until there is not //enough space, and the garbage collector reclaims some. for (int i = 0; i < REPEAT; i++) ic = getCanonicalInteger(i); //Now just re-access the first 10 integers (0 to 9) and //the 0, 1, 2, 8, and 9 integers will need to be reset in //the access method since they will have been reclaimed for (int i = 0; i < 10; i++) ic = getCanonicalInteger(i); System.out.println(ic); } catch(Exception e){e.printStackTrace( );} } private static Vector canonicalIntegers = new Vector( ); public static Integer getCanonicalInteger(int i) { //First make sure our collection is big enough if (i >= canonicalIntegers.size( )) canonicalIntegers.setSize(i+1); //Now access the canonical value. //This element contains null if the the value has never been set //or a weak reference that may have been garbage collected WeakReference ref = (WeakReference) canonicalIntegers.elementAt(i); Integer canonical_i; if (ref == null) { //never been set, so create and set it now canonical_i = new Integer(i); canonicalIntegers.setElementAt(new WeakReference(canonical_i), i); } else if( (canonical_i = (Integer) ref.get( )) == null) { //has been set, but was garbage collected, so recreate and set it now //Include a print to see that we are resetting the Integer System.out.println("Resetting integer " + i); canonical_i = new Integer(i); canonicalIntegers.setElementAt(new WeakReference(canonical_i), i); } //else clause not needed, since the alternative is that the weak ref was //present and not garbage collected, so we now have our canonical integer return canonical_i; } }
Another canonicalization technique often used is replacing constant objects with integers. For example, rather than use the strings “female” and “male”, you should use a constant defined in an interface:
public interface GENDER { public static final int FEMALE=1; public static final int MALE=2; }
Used consistently, this enumeration can provide both speed and memory advantages. The enumeration requires less memory than the equivalent strings and makes network transfers faster. Comparisons are faster too, as the identity comparison can be used instead of the equality comparison. For example, you can use:
this.gender == FEMALE;
instead of:
this.gender.equals("female");
[29] The
VectorPoolManager
is really an object with
behavior and state. It is not just a related group of functions
(which is what class static methods are equivalent to). My colleague
Kirk Pepperdine insists that this choice is more than just a
preference. He states that holding on to an object as opposed to
using statics provides more flexibility should you need to alter the
use of the VectorPoolManager
or provide multiple
pools. I agree.
[30] Deserializing
Boolean
s would have required special handling to
return the canonical Boolean
. All canonicalized
objects similarly require special handling to manage serialization.
Java serialization supports the ability, when deserializing, to
return specific objects in place of the object that is normally
created by the default deserialization mechanism.
[31] Beware that using a subclass may break the superclass semantics.
Get Java Performance Tuning now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.