Wednesday, 22 January 2014

The cost of Exception construction

In my job at LMAX Exchange, I often come across performance problems that have non-obvious causes, or that do not readily show up in an execution profiler. This post describes a one-line change that had a significant effect on one of our key business metrics.

As a Multilateral Trading Facility, latency is very important to us, and our users. Our market-makers need to be able to price our markets with high frequency and react to market events quickly; this helps them to keep their spreads tight, providing our customers with a better experience. We also care about our customers' latency - they need to be able to place (or cancel) an order as soon as they see the market move into a favourable position. In this post, I am focussing on the latency experienced by our market-makers.

First, a bit of background about our system. This particular issue had been in the codebase unnoticed for several years, since its overall effect on performance was so low. The component in question is a gateway that is responsible for receiving market-maker order flow in the form of FIX messages via TCP and translating these messages into our internal protocol for transmission to the matching engine.

The problem in the code meant that we were throwing an exception for almost every received message. This might sound like a huge amount of overhead that would easily show up in a profiler, but we did not see the issue in our last round of profiling due to there being other low-hanging fruit to optimise. While working with another tech company, which makes a JVM, we stumbled across this problem during the process of tracking down an unrelated performance issue.

The problem

The FIX protocol allows a sender to specify whether a message is a possible duplicate of a previously sent message, functionality that is used when negotiating session catch-up after a lost connection. As a Multilateral Trading Facility, we explicitly ignore any received messages that are marked as possible duplicates (we provide an alternative method for reconciliation).

In order to ignore these messages, we were inspecting every incoming order placement message and bailing out early if the possible-duplicate flag was set. We are currently using the QuickFix/J library for FIX message processing, which provides a map-based abstraction of the tag-based FIX message protocol.

The QuickFix/J message API attempts to deal with the prospect of optional fields by throwing FieldNotFound (an Exception subclass) if the specified field is not set. Our code was using this feature in order to do flow-control:

While functionally correct, this code will cause an exception to be thrown on every received order placement message that is not marked as a duplicate. In other words, the vast majority of cases will be needlessly generating an exception.

Luckily (or perhaps unluckily) for us, Oracle's Hotspot JVM was doing its best to compensate and was probably making the throw very fast by not filling in the stack-trace and re-using Exception instances, so that the effect wasn't obvious when profiling the application. Since the cost of this extra piece of work didn't show up in a profiler, surely it can't have had that much effect on latency?

The solution

A small change to the method allows us to avoid the cost of throwing the exception by checking for the field's existence before querying its value:

In our performance environment, we continuously measure the latency of our exchange with a production-like load, as part of our continuous delivery process. Below are some measurements of end-to-end latency in microseconds before the change above was made:

| fastest | mean | two_nines | four_nines |
|     244 |  616 |      3168 |       9008 |
|     255 |  627 |      3007 |       8753 |
|     245 |  517 |      2453 |       6645 |
|     245 |  606 |      2871 |       7901 |

The improvement from removing the exception-generating code is clear:

| fastest | mean | two_nines | four_nines |
|     218 |  417 |      1934 |       6940 |
|     217 |  450 |      1863 |       7059 |
|     216 |  449 |      2093 |       7095 |
|     228 |  430 |      1858 |       7007 |

From these results we can see that we've shaved about 25us off our fastest execution time, and about 120us off our mean execution time (15-20%).

Checking through code for this kind of behaviour revealed a couple of other places where we were using the same pattern. Though none of these instances were on a latency-sensitive path, they have still been replaced with the query-then-get pattern in order to reduce Exception construction costs.


Just because something doesn't show up in the profiler, doesn't mean that it's not having a significant effect on your system. In this case, Oracle's Hotspot JVM was doing so well at optimising that we didn't notice the overhead. We've been working on improving the speed of our systems for many years now (there's more in the pipeline), and your code needs to be fairly optimal before this sort of problem shows up. It just goes to show that it's worth taking a cruise through your code - there may be a one-line change lurking that will gain you a 20% improvement.