How many threads should I use in my Java program?
This is a quick tutorial on choosing the number of threads in a Java program.
Normally you reach for multiple threads so that processing can continue while you wait on slow or high-latency I/O operations.
Cache contention is a very important consideration when using multiple CPUs to run a highly parallelized algorithm:
Make sure that you take your memory utilization into account. If you can construct your data objects so each thread has its own memory to work on, you can greatly reduce cache contention between the CPUs. For example, it may be easier to have one big array of ints with different threads working on different parts of it, but in Java every array access is bounds-checked against the array’s length, so all the threads keep reading the same address in memory. When one thread writes to elements that share a cache line with that length field, or with elements another thread is using, the line can be invalidated on the other CPUs, which then have to reload the data from L2 or L3 cache.
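Consider splitting the data into separate data structures, one per thread, and arranging things so that each structure is only ever touched by its own thread. ThreadLocal is one way to keep per-thread state apart, but note that in Java it is implemented as a per-thread map inside the JVM rather than as an OS-level construct, so it does not by itself give the CPU any extra guarantees about caching. Whether it helps is something to measure.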
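As a rough illustration of giving each thread its own working memory, here is a minimal sketch in which every worker reads only its own slice of the input and accumulates into a local variable, so no two workers write to the same location while they run. The class name PartitionedSum and the method sum are made up for this example, and the input array is still shared for reading, so treat it as a starting point to measure rather than a guaranteed win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedSum {

    // Sums `data` with `threads` workers. Each worker owns one contiguous
    // slice and keeps its running total in a local variable, so workers
    // never write to memory that another worker is using.
    static long sum(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0; // private to this worker
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // combine results once, at the end
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        System.out.println(sum(data, Runtime.getRuntime().availableProcessors()));
    }
}
```

The point of the design is that the only cross-thread traffic is one result per worker, collected after the work is done.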
The best piece of advice I can give you is test, test, test. Don’t make assumptions about how CPUs will perform - there is a huge amount of magic going on in CPUs these days, often with counterintuitive results. Note also that the JIT compiler’s runtime optimizations add another layer of complexity here (maybe good, maybe not).
On the one hand, you’d like to think Threads == CPU/Cores makes perfect sense. Why have a thread if there’s nothing to run it?
On the other hand, the detail boils down to “what are the threads doing”. A thread that’s idle waiting for a network packet or a disk block isn’t using its CPU, so with exactly one thread per core that core’s time is wasted.
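One widely quoted rule of thumb (not something the text above prescribes) turns this into arithmetic: size the pool as cores × (1 + wait time / compute time). The sketch below assumes you have already estimated how long a typical task waits versus computes; the class and method names are hypothetical, and the result is only a first guess to be checked by measurement.

```java
public class PoolSizing {

    // Rule-of-thumb pool size: cores * (1 + wait / compute).
    // waitTime and computeTime may be in any unit, as long as it is the same unit.
    static int suggestedThreads(int cores, double waitTime, double computeTime) {
        if (cores < 1 || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("cores >= 1, waitTime >= 0, computeTime > 0 required");
        }
        return Math.max(1, (int) Math.round(cores * (1 + waitTime / computeTime)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Purely CPU-bound tasks: no waiting, so one thread per core.
        System.out.println("CPU-bound: " + suggestedThreads(cores, 0, 1));
        // Tasks that wait 90 ms for every 10 ms of computation (assumed figures).
        System.out.println("I/O-heavy: " + suggestedThreads(cores, 90, 10));
    }
}
```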
If your threads are CPU heavy, then a 1:1 correlation makes some sense. If you have a single “read the DB” thread that feeds the other threads, and a single “dump the data” thread that pulls data from the CPU threads and creates output, those two could most likely share a CPU easily while the CPU-heavy threads keep churning away.
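The layout described above, one feeding thread, a set of CPU-heavy workers, and one output thread, can be sketched with two blocking queues. Everything here is an assumption made for illustration: the Pipeline class, the crunch method standing in for real work, the queue capacity of 1024, and the sentinel values used to signal the end of the stream. The "reader" just generates integers rather than reading a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Pipeline {
    // Hypothetical end-of-stream markers; real inputs and results never take these values here.
    private static final int END_OF_INPUT = Integer.MIN_VALUE;
    private static final long WORKER_DONE = -1L;

    // Stand-in for CPU-heavy work: sum of the squares 1..n.
    static long crunch(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i;
        }
        return acc;
    }

    // Feeds the integers 1..items through cpuThreads workers and returns the sum of their results.
    static long run(int items, int cpuThreads) {
        BlockingQueue<Integer> in = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Long> out = new ArrayBlockingQueue<>(1024);
        long[] total = new long[1]; // written only by the writer thread, read after join()

        // "Read the DB" thread: mostly blocked on put() when the workers are busy.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    in.put(i);
                }
                for (int w = 0; w < cpuThreads; w++) {
                    in.put(END_OF_INPUT); // one marker per worker
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader");

        // CPU-heavy threads: these are the ones that should roughly match the core count.
        Thread[] workers = new Thread[cpuThreads];
        for (int w = 0; w < cpuThreads; w++) {
            workers[w] = new Thread(() -> {
                try {
                    for (int n = in.take(); n != END_OF_INPUT; n = in.take()) {
                        out.put(crunch(n));
                    }
                    out.put(WORKER_DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "cpu-" + w);
        }

        // "Dump the data" thread: mostly blocked on take() waiting for results.
        Thread writer = new Thread(() -> {
            try {
                int finished = 0;
                while (finished < cpuThreads) {
                    long v = out.take();
                    if (v == WORKER_DONE) {
                        finished++;
                    } else {
                        total[0] += v;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "writer");

        reader.start();
        writer.start();
        for (Thread t : workers) {
            t.start();
        }
        try {
            reader.join();
            for (Thread t : workers) {
                t.join();
            }
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total[0];
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(run(1_000, cores));
    }
}
```

Because the reader and writer spend most of their time blocked on a queue, running them alongside one worker per core usually costs little, though that is worth confirming on your own workload.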
The real answer, as with all sorts of things, is to measure it. Since the number is configurable (apparently), configure it! Run it with 1:1 threads to CPUs, 2:1, 1.5:1, whatever, and time the results. The fastest one wins.
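A crude way to run that experiment is sketched below: it times the same batch of CPU-bound tasks on fixed pools sized at 1:1, 1.5:1, and 2:1 threads to cores. The workload, task count, and class name are placeholders chosen for illustration, so substitute your real work. A single wall-clock timing like this is noisy and sensitive to JIT warm-up, so repeat the runs, or use a proper benchmarking harness such as JMH, before trusting the numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadCountTrial {

    // Placeholder CPU-bound task; replace with the real work being tuned.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    // Converts a threads-to-cores ratio into a thread count (at least 1).
    static int threadsFor(int cores, double ratio) {
        return Math.max(1, (int) Math.round(cores * ratio));
    }

    // Runs `tasks` copies of the workload on a fixed pool and returns elapsed nanoseconds.
    static long timeRun(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Long>> jobs = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            jobs.add(() -> work(200_000));
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(jobs); // blocks until every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        double[] ratios = {1.0, 1.5, 2.0};
        timeRun(cores, 200); // warm-up run so the JIT has compiled work()
        for (double ratio : ratios) {
            int threads = threadsFor(cores, ratio);
            long nanos = timeRun(threads, 2_000);
            System.out.printf("ratio %.1f:1 (%d threads): %.1f ms%n", ratio, threads, nanos / 1e6);
        }
    }
}
```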