Friday, 14 September 2018

Advanced Java Profiling Training Course

I'm running a public training course on Advanced Java Profiling on 17th October 2018 in London, UK.

The agenda for the day is:

Profilers - the good, the bad, the ugly?
A look at the profiler landscape, comparing the pros and cons of each type.

Sampling vs recording
Trade-offs that need to be considered when choosing a profiling method.

Sampling bias
How to avoid common pitfalls in measurements.

Flamegraphs and other visualisations

Allocation profiling

Lunch break

Custom tracepoints 
How to instrument programs to capture events.

Profiling in production
Techniques for "always-on" profiling.

OS/Kernel/Hardware profiling
Looking for performance issues lower down the stack.

Tickets can be bought here.

Monday, 18 September 2017

Heap Allocation Flamegraphs

The most recent addition to the grav collection of performance visualisation tools is a utility for tracking heap allocations on the JVM.

This is another Flamegraph-based visualisation that can be used to determine hotspots of garbage creation in a running program.

Usage and mode of operation

Detailed instructions on installation and prerequisites can be found in the grav repository on GitHub.

Heap allocation flamegraphs use the built-in user statically-defined tracepoints (USDTs), which have been added to recent versions of OpenJDK and Oracle JDK.

To enable the probes, the following command-line flags are required:

-XX:+DTraceAllocProbes -XX:+ExtendedDTraceProbes

Once the JVM is running, the heap-alloc-flames script can be used to generate a heap allocation flamegraph:

$ ./bin/heap-alloc-flames -p $PID -e "java/lang/String" -d 10
Wrote allocation-flamegraph-$PID.svg

BE WARNED: this is a fairly heavyweight profiling method - on each allocation, the entire stack is walked and hashed in order to increment a counter. The JVM will also use a slow-path for allocations when extended DTrace probes are enabled.

It is possible to limit the profiling so that only every Nth sample is recorded, using the '-s' parameter (see the documentation for more info).
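As a rough illustration of what recording every Nth event means, here is a minimal Python sketch. It is not taken from grav: the class name, the counter-based approach and the scaling of the result are assumptions made for the example.

```python
class EveryNthSampler:
    """Record one event out of every `interval` events (hypothetical helper)."""

    def __init__(self, interval):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self.seen = 0

    def should_record(self):
        # Count every event, but only report each interval-th one.
        self.seen += 1
        return self.seen % self.interval == 0


sampler = EveryNthSampler(interval=4)
recorded = sum(1 for _ in range(10) if sampler.should_record())
# Scaling the recorded count by the interval estimates the true total.
estimated_total = recorded * sampler.interval
print(recorded, estimated_total)
```

The trade-off is the usual one for sampling: a larger interval lowers the overhead, but rare allocation sites may not show up at all.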

For a more lightweight method of heap-profiling, see the excellent async-profiler, which uses a callback on the first TLAB allocation to perform its sampling.

When developing low-garbage or garbage-free applications, it is useful to be able to instrument every allocation, at least within a performance-test environment. This tool could even be used to regression-test allocation rates for low-latency applications, to ensure that changes to the codebase are not increasing allocations.
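A regression check of this kind could look something like the following Python sketch. It assumes that per-class allocation counts have already been collected for a baseline build and a candidate build; the function name, the data layout and the 10% tolerance are all hypothetical and not part of grav.

```python
def allocation_regressions(baseline, current, tolerance=0.10):
    """Return classes whose allocation count grew beyond the tolerance.

    baseline, current: dicts mapping class name -> allocation count.
    The result maps class name -> (baseline count, current count).
    """
    regressions = {}
    for class_name, count in current.items():
        before = baseline.get(class_name, 0)
        # A class that is new in the current run has a baseline of zero,
        # so any allocation of it counts as a regression.
        if count > before * (1 + tolerance):
            regressions[class_name] = (before, count)
    return regressions


baseline = {"java/lang/String": 1000, "byte[]": 500}
current = {"java/lang/String": 1050, "byte[]": 700, "java/util/ArrayList": 3}
for name, (before, after) in sorted(allocation_regressions(baseline, current).items()):
    print(f"{name}: {before} -> {after}")
```

In a performance-test environment the returned dictionary could be used to fail the build whenever it is non-empty.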


The allocation profiler works by attaching a uprobe to the dtrace_object_alloc function provided by the JVM.

When the profiler is running, we can confirm that the tracepoint is in place by looking at /sys/kernel/debug/tracing/uprobe_events:

$ cat /sys/kernel/debug/tracing/uprobe_events

Given that we know the type signature of the dtrace_object_alloc function, it is a simple matter to extract the class-name of the object that has just been allocated.

While the profiler is running, it records a count against a compound key of Java class-name and stack-trace id. At the end of the sampling period, the count is used to 'inflate' the occurrences of a given stack-trace, and these stack-traces are then piped through the usual flamegraph machinery.
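The counting and 'inflation' step can be sketched in Python as below. This is an illustration only, not the code from grav: the function name, the in-memory dictionaries and the choice to append the class name as the top frame are assumptions. The output is the 'folded' format (semicolon-separated frames followed by a count) that flamegraph scripts commonly consume.

```python
from collections import Counter


def fold_allocations(counts, stacks):
    """Turn (class_name, stack_id) counts into folded flamegraph lines.

    counts: maps (class_name, stack_id) -> number of allocations seen
    stacks: maps stack_id -> list of frame names, outermost frame first
    Both structures are hypothetical stand-ins for the profiler's state.
    """
    folded = Counter()
    for (class_name, stack_id), count in counts.items():
        frames = stacks.get(stack_id, ["[unknown]"])
        # Assumption: show the allocated class as the leaf of the stack.
        folded[";".join(frames + [class_name])] += count
    # Folded format: semicolon-separated frames, a space, then the count.
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


counts = {
    ("java/lang/String", 1): 3,
    ("java/util/ArrayList", 1): 1,
    ("java/lang/String", 2): 2,
}
stacks = {
    1: ["Main.main", "Parser.parse"],
    2: ["Main.main", "Logger.log"],
}
for line in fold_allocations(counts, stacks):
    print(line)
```

Carrying the count on each line means a stack seen three times does not have to be written out three times; the count gives the rendering step the weight it needs for that stack.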

Controlling output

Allocation flamegraph

Stack frames can be included in or excluded from the generated Flamegraph by using regular expressions that are passed to the Python program.

For example, to exclude all allocations of java.lang.String and java.lang.Object[], add the following parameters:

-e "java/lang/String" "java.lang.Object\[\]"

To include only allocations of java.util.ArrayList, add the following:

-i "java/util/ArrayList"
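The include and exclude behaviour described above can be approximated with a short Python sketch. The function name and the exact precedence rules (exclude wins over include, patterns are matched anywhere in the name) are assumptions for illustration and have not been checked against the grav implementation.

```python
import re


def keep_allocation(class_name, include=(), exclude=()):
    """Decide whether an allocation of class_name should be reported.

    Assumed semantics: a name is dropped if any exclude pattern matches;
    otherwise it is kept if there are no include patterns, or if at
    least one include pattern matches.
    """
    if any(re.search(pattern, class_name) for pattern in exclude):
        return False
    if not include:
        return True
    return any(re.search(pattern, class_name) for pattern in include)


names = ["java/lang/String", "java/util/ArrayList", "java/util/HashMap"]
print([n for n in names if keep_allocation(n, exclude=["java/lang/String"])])
print([n for n in names if keep_allocation(n, include=["java/util/ArrayList"])])
```

Because the patterns are regular expressions, characters such as '[' and ']' need escaping, as in the exclude example above.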

Inspiration & Thanks

Thanks to Amir Langer for collaborating on this profiler.

For more information on USDT probes in the JVM, see Sasha's blog posts.