Introduction to Java Thread Profiling
In this tutorial, we’ll have a quick look at Java thread profiling.
A significant enhancement in Java 5, which has received surprisingly little attention in the Java programming literature, was the introduction of various profiling facilities directly in the Java API. Since Java 5 it has been possible to perform a range of profiling tasks from pure Java code, without having to resort to native code. An upshot of this is that it is much easier and safer to embed profiling facilities in a live environment. This article is geared slightly towards profiling in a Servlet environment, although much of what we discuss applies to server (and indeed client) profiling generally.
To start with, it’s important to note that the profiling techniques we’ll be discussing here aren’t a simple matter of plugging in one line of code and going. Java 5 provides you with the core routines to get profiling data out of the JVM, but you’ll generally need to wrap some code around them for this information to be useful. But in a sense that’s the whole point: you can tailor your profiling code to do exactly what you need for your particular application. And it turns out that the code required isn’t generally too long or complicated.
Introduction to thread profiling
We’ll be looking specifically at thread profiling: that is, measuring what proportion of time (and/or CPU) threads are spending in different parts of your code. As we develop our profiling code, some questions we’ll need to consider include:
- What tradeoff do we want to make between the accuracy of the profile and the impact that profiling has on the server or application?
- Do we specifically care about measuring which portions of code are using CPU, or do we simply want a measure of where threads are spending time (even if they are actually doing non-CPU-intensive work such as performing IO)?
- What granularity or grouping of measurements will be most useful to us? Do we want to go down to the level of the individual line, or do we only care which methods occupy more thread time? When method A calls method B, how do we decide whether to count this as time spent inside method A, B or both?
- Are there cases that we want to exclude from our profiling data? (For example, if we know that such-and-such a class or thread is uninteresting, perhaps we don’t want to count it.)
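To illustrate the granularity question above, here is a sketch of two ways of counting a stack trace sample. Counting only the top frame attributes the sample to the method actually executing (so when A calls B, only B is counted: "self" time), while counting every frame on the stack attributes it to all callers too (both A and B: "inclusive" time). The class and method names here are illustrative, not part of any standard API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: "self" vs. "inclusive" counting of one stack sample.
public class FrameCounting {
    static String key(StackTraceElement frame) {
        return frame.getClassName() + "." + frame.getMethodName();
    }

    static void increment(Map<String, Integer> counts, String key) {
        Integer prev = counts.get(key);
        counts.put(key, prev == null ? 1 : prev + 1);
    }

    // "Self" time: only the currently executing method (the top frame) gets the sample.
    static void countSelf(Map<String, Integer> counts, StackTraceElement[] trace) {
        if (trace.length > 0) increment(counts, key(trace[0]));
    }

    // "Inclusive" time: every method on the stack gets the sample.
    static void countInclusive(Map<String, Integer> counts, StackTraceElement[] trace) {
        for (StackTraceElement frame : trace) {
            increment(counts, key(frame));
        }
    }
}
```

Which of the two you choose (or whether you record both) depends on whether you want to know where time is literally spent, or which high-level operations are ultimately responsible for it.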
Basic thread profiling methodology
The underlying thread profiling technique we’ll be looking at here involves periodically “polling” the JVM, asking it for the current point of execution of all currently alive threads. This is a kind of “passive” profiling technique in which we don’t need to modify the code that we are profiling. (A technique that we won’t be considering in this article is to “actively” inject code at key points for the purpose of profiling.) The basic thread profiling procedure is thus as follows:
- Make a call asking the JVM to give us the current stack traces of all threads;
- If we desire, ask the JVM which of these threads have actually consumed CPU (and how much);
- From these stack traces, increment our counts/profiling data;
- Wait for some (generally short) period of time, then repeat the procedure.
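The steps above can be sketched as follows, using `Thread.getAllStackTraces()` (available since Java 5). For brevity this sketch omits the optional CPU-time step and counts samples by the top stack frame only; the class name, sample period, and iteration count are illustrative choices, not prescribed by anything above.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the basic polling procedure.
public class StackSampler {
    static final Map<String, Integer> sampleCounts = new HashMap<String, Integer>();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 40; i++) {                    // sample for ~1 second
            // Step 1: ask the JVM for the stack traces of all live threads.
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            // Step 3: increment our counts, here keyed by the top stack frame.
            for (StackTraceElement[] trace : traces.values()) {
                if (trace.length == 0) continue;
                String key = trace[0].getClassName() + "." + trace[0].getMethodName();
                Integer prev = sampleCounts.get(key);
                sampleCounts.put(key, prev == null ? 1 : prev + 1);
            }
            // Step 4: wait for a short period, then repeat.
            Thread.sleep(25);                             // i.e. ~40 samples/second
        }
        for (Map.Entry<String, Integer> e : sampleCounts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " samples");
        }
    }
}
```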
It’s clear from this description that profiling isn’t “free”. Asking the JVM to take the stack trace snapshots, in addition to any processing we then do, has an impact.
It’s also clear that this is essentially a sampling approach. We won’t be able to tell with complete accuracy how long a thread spends in a particular place. For example, say we repeat the above process 20 times a second. When we are told that a thread is in a particular place and that place was different from the last time we sampled, that means it has been there for some period between 0 and 1/20 of a second. (And even if it’s the same place, we don’t know that the thread actually stayed there for the whole time between the two instances when we sampled.) So we have to make some kind of best guess. Generally, the least-effort option (and one that is good enough) is to simply assume that the method was running for the full 1/20 of a second (or whatever the sample period is). The more frequently we sample, the more accurate our profile will be, but the more impact our profiling routine will have on the server or application that we’re trying to profile.
Many systems can also provide us with information on how much CPU time (in user and kernel mode) a given thread has consumed. But even if we request this information, we are usually left with the same problem: if a thread has used X nanoseconds of CPU time since the last sample, we still don’t know whether all of that consumption occurred in the line or method now being executed.
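In Java, this per-thread CPU information is exposed through `ThreadMXBean` in the `java.lang.management` package, introduced in Java 5. The following sketch simply dumps the total and user-mode CPU time of every live thread; whether the figures are available at all depends on the JVM, which is why the support check comes first.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: querying per-thread CPU consumption via ThreadMXBean.
public class CpuSampler {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time not supported on this JVM");
            return;
        }
        mx.setThreadCpuTimeEnabled(true);
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos  = mx.getThreadCpuTime(id);    // user + kernel time
            long userNanos = mx.getThreadUserTime(id);   // user-mode time only
            if (cpuNanos >= 0) {                         // -1 means the thread has died
                System.out.println("Thread " + id + ": total=" + cpuNanos
                        + "ns, user=" + userNanos + "ns");
            }
        }
    }
}
```

In a real profiler you would record the value per thread at each sample and attribute the *delta* since the previous sample to wherever the thread currently is, accepting the inaccuracy described above.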
What this generally boils down to is:
- We need to make sure that, over the time period of the function or application we’re profiling, we take a “reasonable number” of samples;
- The shorter the task we’re profiling, the more frequently we need to sample (or the less accurate our results will be).
For profiling a long-running server that executes similar functions over and over, this is actually quite good news. We can generally sample fairly infrequently (say, between 1 and 5 times a second) and simply leave things running for a few hours to accumulate enough samples. This will give us a good picture of where the server is generally spending its time.
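For embedding such low-frequency sampling in a live server, a daemon thread driven by a `ScheduledExecutorService` (from `java.util.concurrent`, also new in Java 5) is one reasonable shape. The sketch below is illustrative: the class and method names are my own, and the counting is again by top stack frame only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Sketch: a background sampler suitable for a long-running server.
public class BackgroundProfiler {
    private final Map<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
                public Thread newThread(Runnable r) {
                    Thread t = new Thread(r, "profiler");
                    t.setDaemon(true);   // don't keep the server alive just to profile
                    return t;
                }
            });

    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() { sample(); }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void sample() {
        for (StackTraceElement[] trace : Thread.getAllStackTraces().values()) {
            if (trace.length == 0) continue;
            String key = trace[0].getClassName() + "." + trace[0].getMethodName();
            // Only the sampler thread writes, so a plain get/put is sufficient here.
            Integer prev = counts.get(key);
            counts.put(key, prev == null ? 1 : prev + 1);
        }
    }

    public Map<String, Integer> snapshot() { return counts; }
}
```

In a Servlet environment, `start()` could be called from a `ServletContextListener` so that profiling begins with the web application and, being a daemon thread, never blocks shutdown.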