I recently came across a nice example of Oracle's HotSpot JVM using escape analysis to perform stack allocation rather than heap allocation.
I'm sure that in many codebases across the planet, the following code fragments are familiar territory:
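A minimal sketch of the kind of code in question. Only `dispatchEvent`, `Listener` and the list of listeners are named in the text; the `EventDispatcher` class, the `onEvent` method, its `int` parameter and `addListener` are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; the method name and signature are assumptions.
interface Listener {
    void onEvent(int event);
}

// Hypothetical holder class for the listener list.
final class EventDispatcher {
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Each call obtains a fresh Iterator from the list via the enhanced for-loop.
    void dispatchEvent(int event) {
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }
}
```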
When trying to write garbage-free or low-garbage code (as we do at LMAX), it is necessary to think about any code that may allocate unnecessary objects. Every invocation of the dispatchEvent method in the code above will cause the creation of an Iterator object, since the enhanced for-loop over a List is just syntactic sugar for calling List.iterator() and then Iterator.hasNext()/next().
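To make the desugaring concrete, here is a sketch showing the enhanced for-loop next to the explicit Iterator form the compiler effectively generates. The class name `DesugaredLoop`, both method names and the `onEvent` signature are hypothetical:

```java
import java.util.Iterator;
import java.util.List;

class DesugaredLoop {
    // Hypothetical callback interface, assumed for illustration.
    interface Listener {
        void onEvent(int event);
    }

    // What is written in source code.
    static void dispatchWithForEach(List<Listener> listeners, int event) {
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }

    // Roughly what the compiler emits: one Iterator obtained per call.
    static void dispatchWithIterator(List<Listener> listeners, int event) {
        Iterator<Listener> iterator = listeners.iterator();
        while (iterator.hasNext()) {
            Listener listener = iterator.next();
            listener.onEvent(event);
        }
    }
}
```

Both methods behave identically; the second simply makes the Iterator visible.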
Using a byte-code viewer (I use ASM Bytecode Outline) to inspect the dispatchEvent method shows the creation and use of an Iterator object (via List.iterator()):
L5
LINENUMBER 32 L5
ALOAD 1
INVOKEINTERFACE java/util/List.iterator ()Ljava/util/Iterator;
ASTORE 3
L6
FRAME APPEND [java/util/Iterator]
ALOAD 3
INVOKEINTERFACE java/util/Iterator.hasNext ()Z
IFEQ L7
ALOAD 3
INVOKEINTERFACE java/util/Iterator.next ()Ljava/lang/Object;
CHECKCAST epickrram/example/Listener
ASTORE 4
This would seem like an excellent candidate for escape analysis: the Iterator is never used outside dispatchEvent, so if the JIT compiler is smart enough, it should allocate the Iterator object on the stack and save having to touch the heap.
This can be tested by running a simple test while printing out the GC activity:
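A minimal sketch of such a test, under stated assumptions: the names `EscapeAnalysisTest`, `CountingListener`, `EventDispatcher`, `onEvent` and `run` are hypothetical, and the only detail taken from the text is that ten million events are dispatched and the final count is printed.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAnalysisTest {
    // Hypothetical callback interface, assumed for illustration.
    interface Listener {
        void onEvent(int event);
    }

    // Counts events so the loop has an observable result.
    static final class CountingListener implements Listener {
        long count;

        @Override
        public void onEvent(int event) {
            count++;
        }
    }

    static final class EventDispatcher {
        private final List<Listener> listeners = new ArrayList<>();

        void addListener(Listener listener) {
            listeners.add(listener);
        }

        // The enhanced for-loop requests a new Iterator on every call.
        void dispatchEvent(int event) {
            for (Listener listener : listeners) {
                listener.onEvent(event);
            }
        }
    }

    // Dispatches the given number of events and returns how many were received.
    static long run(int iterations) {
        CountingListener listener = new CountingListener();
        EventDispatcher dispatcher = new EventDispatcher();
        dispatcher.addListener(listener);
        for (int i = 0; i < iterations; i++) {
            dispatcher.dispatchEvent(i);
        }
        return listener.count;
    }

    public static void main(String[] args) {
        System.out.println("Event Count: " + run(10_000_000));
    }
}
```

The format of the log lines suggests JDK 6 to 8 style flags such as `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime`, though the exact command line used is an assumption. Toggling `-XX:+DoEscapeAnalysis` and `-XX:-DoEscapeAnalysis` between runs gives the two cases being compared.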
The output with -XX:+DoEscapeAnalysis is:
Event Count: 10000000
The output with -XX:-DoEscapeAnalysis is:
0.113: [GC [PSYoungGen: 23552K->416K(27456K)] 23552K->416K(90176K), 0.0010690 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0012140 seconds
...
0.451: [GC [PSYoungGen: 94576K->0K(94656K)] 94592K->336K(157376K), 0.0011190 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0012220 seconds
Event Count: 10000000
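So we can see that the JVM is very kindly keeping the Iterator objects off the heap when escape analysis is enabled (the default since JDK 1.6). Strictly speaking, HotSpot achieves this through scalar replacement, holding the object's fields in registers or stack slots, rather than by placing a whole object on the stack, but the effect is the same: no garbage is created.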
In the particular example I was looking at, I noticed that we only ever added a single Listener instance, so the length of the listeners list was always one. In this case it is wasteful to construct the new Iterator instance, even if it is allocated on the stack. The list of Listener objects can be replaced by a single reference to a Listener:
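A sketch of the single-reference variant, assuming the same hypothetical `Listener` interface with an `onEvent(int)` method; the class name `SingleListenerDispatcher` is also an assumption:

```java
class SingleListenerDispatcher {
    // Hypothetical callback interface, assumed for illustration.
    interface Listener {
        void onEvent(int event);
    }

    // A single reference replaces the list of size one.
    private final Listener listener;

    SingleListenerDispatcher(Listener listener) {
        this.listener = listener;
    }

    // No list, no Iterator: just a direct interface call.
    void dispatchEvent(int event) {
        listener.onEvent(event);
    }
}
```

This only applies while there is never more than one listener; supporting several would mean going back to a collection or an array.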
This small optimisation makes an order-of-magnitude difference when compared using a Caliper benchmark. The benchmark has two tests: Access, which calls a single listener instance, and Iteration, which iterates over a list of size one.