This is an update to my last post exploring behaviour in the Oracle/OpenJDK JVM that forces a periodic safepoint under normal conditions.
After that post was published, Gil Tene from Azul Systems asked how their Zing JVM matched up against Oracle/OpenJDK in terms of runtime jitter.
At LMAX Exchange, we have been using the Zing JVM for several years, after we found that it removed a large proportion of our latency outliers, namely those caused by garbage collection pauses.
Measuring runtime jitter
These measurements were initially made to try to demonstrate jitter introduced to an application by the Linux kernel's scheduler. Since the kernel is responsible for allocating CPU time to runnable processes, these decisions can show up as sources of latency in well-tuned applications.
In this application, a 'producer' thread makes a call to System.nanoTime() (which under the hood uses the monotonic clock on Linux) and passes the result into an instance of the Disruptor. On the consuming side, an 'accumulator' thread also calls System.nanoTime(), then records the delta (in nanoseconds) into an HdrHistogram.
Using this method, we can explore the time taken to pass a message between two threads. In the majority of cases, this will be very quick, but there will be outliers introduced by the runtime (i.e. JVM) and the operating system (i.e. Linux scheduler).
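As a rough illustration of the measurement described above, here is a minimal sketch that uses only JDK classes. It is not the original harness: an ArrayBlockingQueue stands in for the Disruptor, a plain long[] stands in for the HdrHistogram, and the class name, method name, and 10-microsecond pacing interval are all assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.LockSupport;

class TransitLatencySketch {

    /** Passes {@code messages} timestamps between two threads and returns the deltas in nanoseconds. */
    static long[] measure(int messages) {
        // Assumption: a bounded JDK queue stands in for the Disruptor used in the post.
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        // Assumption: a plain array stands in for the HdrHistogram.
        long[] deltas = new long[messages];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    queue.put(System.nanoTime()); // timestamp taken on the producing side
                    // Assumed pacing, so that messages do not pile up in the queue.
                    LockSupport.parkNanos(10_000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");

        Thread accumulator = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    long sent = queue.take();
                    deltas[i] = System.nanoTime() - sent; // transit time in nanoseconds
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "accumulator");

        accumulator.start();
        producer.start();
        try {
            producer.join();
            accumulator.join(); // join() makes the accumulator's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        return deltas;
    }

    public static void main(String[] args) {
        long max = 0;
        for (long delta : measure(10_000)) {
            max = Math.max(max, delta);
        }
        System.out.println("max transit latency (ns): " + max);
    }
}
```

Because the stand-in queue takes locks and boxes each timestamp, its absolute numbers will be far worse than those in this post; the sketch only shows the shape of the measurement.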
While developing this application, I came across the 100-microsecond jitter introduced by Oracle/OpenJDK's forced safepoint behaviour. This is discussed in more detail in my previous post.
So how do these two JVMs fare against each other?
Comparing JVMs
In the results below, the effect of the forced safepoints is clear, giving a maximum jitter of around 100 microseconds:
Oracle/OpenJDK with forced safepoints enabled (default):
== Accumulator Message Transit Latency (ns) ==
mean 269
min 168
50.00% 216
90.00% 464
99.00% 608
99.90% 736
99.99% 960
99.999% 4352
99.9999% 15872
max 106496
count 3595101
Disabling the forced safepoints removes the 100-microsecond outliers, bringing the maximum down to around 20 microseconds:
Oracle/OpenJDK with forced safepoints disabled:
== Accumulator Message Transit Latency (ns) ==
mean 385
min 152
50.00% 352
90.00% 464
99.00% 640
99.90% 768
99.99% 864
99.999% 3072
99.9999% 17408
max 20480
count 3595101
Comparing this to Zing's default behaviour:
Zing:
== Accumulator Message Transit Latency (ns) ==
mean 263
min 136
50.00% 256
90.00% 288
99.00% 448
99.90% 512
99.99% 608
99.999% 3200
99.9999% 10240
max 13312
count 3595101
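For readers less familiar with percentile reports, the sketch below shows one way to read a line such as "99.99% 960": 99.99% of recorded transits took 960 ns or less. It uses a simple nearest-rank calculation over a sorted array, which is an assumption for illustration and not how HdrHistogram works internally (HdrHistogram stores counts in buckets at a configured precision, so its reported values are quantised). The class and method names are hypothetical.

```java
import java.util.Arrays;

class PercentileSketch {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] sortedSamples, double p) {
        if (sortedSamples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p * sortedSamples.length / 100.0);
        return sortedSamples[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Illustrative values only, not measurements from the post.
        long[] samples = {220, 180, 250, 300, 210, 230, 190, 260, 240, 100_000};
        Arrays.sort(samples);
        for (double p : new double[] {50.0, 90.0, 99.0, 100.0}) {
            System.out.println(p + "% " + percentile(samples, p));
        }
    }
}
```

With this reading, a single slow transit barely moves the median but dominates the highest percentiles and the max, which is why the three reports above differ mostly in their last few lines.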
For those who like a visual representation, the original post includes a chart of this comparison, plus a log-scale version.
Conclusion
Zing doesn't need to force periodic safepoints during normal operation, so assuming that you don't have any other sources of jitter in your program, you'll get a flatter latency profile with out-of-the-box behaviour.
It is possible to restrict the forced safepoint behaviour of Oracle/OpenJDK, but the consequences of doing so are unclear. Dragons may well be involved.
At LMAX Exchange, we currently run our microbenchmarks on Oracle JDK, and we do suppress the periodic forced safepoint behaviour in order to reduce jitter in the results. So far, there have been no adverse effects, but our microbenchmarks only run for short periods of time.
About the tests
OracleJDK version:
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
Zing version:
java version "1.8.0-zing_15.05.0.0"
Zing Runtime Environment for Java Applications (build 1.8.0-zing_15.05.0.0-b8)
Zing 64-Bit Tiered VM (build 1.8.0-zing_15.05.0.0-b16-product-azlinuxM-X86_64, mixed mode)
OracleJDK flags:
Baseline run:
-XX:+DisableExplicitGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:-UseBiasedLocking -Xmx4g -Xms4g
Disabled forced safepoints run:
-XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=600000 -XX:+DisableExplicitGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:-UseBiasedLocking -Xmx4g -Xms4g
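Note that GuaranteedSafepointInterval is specified in milliseconds, so the value 600000 above pushes the forced safepoint out to once every ten minutes instead of removing it entirely. When running a benchmark it can be useful to confirm that the flag actually reached the JVM. The following is a minimal sketch of such a check, reading the JVM's input arguments through the standard RuntimeMXBean; the class and method names are hypothetical, and the check only sees flags passed on the command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.OptionalLong;

class SafepointIntervalCheck {

    private static final String PREFIX = "-XX:GuaranteedSafepointInterval=";

    /** Returns the interval (in ms) found in the given JVM arguments, or empty if the flag is absent. */
    static OptionalLong configuredIntervalMillis(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                return OptionalLong.of(Long.parseLong(arg.substring(PREFIX.length())));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // getInputArguments() returns the flags passed to this JVM, excluding the main class arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        OptionalLong interval = configuredIntervalMillis(jvmArgs);
        if (interval.isPresent()) {
            System.out.println("GuaranteedSafepointInterval set to " + interval.getAsLong() + " ms");
        } else {
            System.out.println("GuaranteedSafepointInterval not set on the command line; the JVM default applies");
        }
    }
}
```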
Zing JVM flags:
-XX:-UseMetaTicks -XX:-UseTickProfiler -XX:GenPauselessNewThreads=2 -XX:GenPauselessOldThreads=2 -XX:+ConcurrentDeflation -XX:+UseRdtsc
These tests were run on a highly-tuned Linux system, utilising such marvellous techniques as CPU isolation, thread affinity, cache-friendly location, and other magic fairy dust.
An upcoming blog post will go into more detail on how to get OS scheduler jitter down to the low tens of microseconds.