Many people have probably seen this rule of thumb for choosing the number of threads:
- CPU-intensive programs: number of cores + 1
- I/O-intensive programs: number of cores * 2
Really? Does anyone actually plan their thread counts by this rule?
A small test of thread count and CPU utilization
Setting aside the details of operating systems and computer architecture, let’s start from one basic idea (don’t worry about whether it is rigorous; it is only meant to be easy to understand): a CPU core can execute the instructions of only one thread per unit of time. So in theory, a single thread only needs to keep executing instructions to run one core at full utilization.
Let’s write an example of an endless empty loop to check this.
Test environment: AMD Ryzen 5 3600, 6 cores, 12 threads
public class CPUUtilizationTest {
    public static void main(String[] args) {
        // Endless loop, do nothing
        while (true) {
        }
    }
}
After running this example, let’s take a look at the current CPU utilization:
As you can see from the figure, core No. 3 has been run to full utilization.
Based on the theory above, what happens if I start a few more threads?
public class CPUUtilizationTest {
    public static void main(String[] args) {
        // Start 6 threads, each spinning in an endless empty loop
        for (int j = 0; j < 6; j++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                    }
                }
            }).start();
        }
    }
}
Look at the CPU utilization now: cores 1/2/5/7/9/11 are all fully utilized:
What if I start 12 threads? Will that run every core to full utilization? The answer is yes:
What happens if I keep increasing the number of threads in the example above, to 24?
As you can see from the figure above, the CPU utilization is the same as in the previous step, still 100% on all cores, but the load has increased from 11.x to 22.x (for an explanation of load average, see scoutapm.com/blog/unders… ). This indicates that the CPU is busier and that the threads’ tasks can no longer be executed in time.
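To watch the same load figure from inside a Java process, a minimal sketch could look like the following. The class name `LoadAverageProbe` and the helper `loadPerCore` are hypothetical names for illustration; `getSystemLoadAverage()` is the standard JDK call, and it returns a negative value on platforms that do not expose a load average.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class LoadAverageProbe {
    /** Load divided by logical cores; values above 1.0 suggest runnable threads are queuing. */
    static double loadPerCore(double loadAverage, int logicalCores) {
        if (logicalCores <= 0) {
            throw new IllegalArgumentException("logicalCores must be positive");
        }
        return loadAverage / logicalCores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        // One-minute system load average; negative when the platform does not provide it.
        double load = os.getSystemLoadAverage();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("load=%.2f cores=%d load/core=%.2f%n", load, cores, loadPerCore(load, cores));
    }
}
```

With the numbers from the test above, a load of 22.x on 12 logical cores gives a ratio of roughly 1.8, that is, nearly two runnable threads competing for each core.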
Modern CPUs are basically all multi-core. The AMD 3600 I tested here has 6 cores and 12 threads (hyper-threading), so we can simply think of it as a 12-core CPU. My CPU can then do 12 things at the same time without them interfering with each other.
If the number of threads to execute is larger than the number of cores, the operating system has to schedule them. It allocates CPU time slices to each thread and switches between them non-stop to achieve the effect of “parallel” execution.
But is this really faster? As the example above shows, one thread can run a core to full utilization. If every thread is this “overbearing”, constantly executing instructions and never leaving the CPU idle, and the number of threads running at the same time is greater than the number of CPU cores, the operating system has to switch threads more frequently to make sure every thread gets to run.
However, switching comes at a cost: each switch involves operations such as saving and restoring registers and, when the switch crosses processes, updating memory page tables. Although the cost of a single switch is insignificant compared to an I/O operation, if there are too many threads and switching is too frequent, possibly to the point where more time is spent switching than running the program, too much CPU is wasted on context switching instead of executing the program, and the loss outweighs the gain.
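One way to observe this is to run the same amount of pure CPU work on a pool sized to the core count and on a heavily oversubscribed pool, then compare the elapsed time. The sketch below is only an illustration: the class name, the task count, and the iteration count are arbitrary assumptions, and the timings will vary by machine and JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OversubscriptionSketch {
    /** Pure CPU work: sums 0..n-1 so the loop has a result the caller can use. */
    static long spin(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Runs `tasks` CPU-bound tasks on a pool of `threads` threads and returns elapsed milliseconds. */
    static long runMillis(int threads, int tasks, long iterationsPerTask) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> work = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                work.add(() -> spin(iterationsPerTask));
            }
            long start = System.nanoTime();
            for (Future<Long> f : pool.invokeAll(work)) {
                f.get(); // surface any task failure
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = cores * 8; // same total work in both runs
        System.out.println("threads = cores:     " + runMillis(cores, tasks, 200_000_000L) + " ms");
        System.out.println("threads = cores * 8: " + runMillis(cores * 8, tasks, 200_000_000L) + " ms");
    }
}
```

For CPU-bound work like this, the oversubscribed run is not expected to finish meaningfully faster, since the cores were already saturated in the first run.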
The endless empty loop above is a bit too extreme; a program like that is unlikely under normal circumstances.
Most programs perform some I/O while they run, such as reading and writing files or sending and receiving messages over the network, and these I/O operations have to wait for a response. For example, when reading or writing on the network, the thread has to wait for the message to be sent or received. During that wait the thread is in a waiting state and does not use the CPU. The operating system then schedules the CPU to execute the instructions of other threads, which makes good use of the CPU’s idle periods and improves CPU utilization.
In the example above, the program keeps looping and doing nothing, so the CPU keeps executing instructions with almost no idle time. What happens to CPU utilization if we insert an I/O operation during which the CPU is idle? Let’s first look at the result with a single thread:
public class CPUUtilizationTest {
    public static void main(String[] args) throws InterruptedException {
        for (int n = 0; n < 1; n++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                        // After every 100 million empty loop iterations, sleep 50 ms to simulate I/O waiting and switching
                        for (int i = 0; i < 100_000_000L; i++) { }
                        try {
                            Thread.sleep(50);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }).start();
        }
    }
}

Wow, the only core showing any utilization, core No. 9, is at just 50%, half of the previous 100% without sleep. Now adjust the number of threads to 12 and look again:
The utilization of each core is about 60%, not much different from the single-threaded result just now, and the CPU is still not fully utilized. Now increase the number of threads to 18:
At this point, single-core utilization is close to 100%. This shows that when threads perform I/O operations that do not occupy CPU resources, the operating system can schedule the CPU to execute more threads at the same time.
Now raise the frequency of I/O events by halving the number of loop iterations to 50_000_000, with the same 18 threads:
At this point, utilization per core is only about 70%.
A small summary of thread count and CPU utilization
The examples above are only an aid to better understand the relationship between thread count, program behavior, and CPU state. To summarize briefly:
- A multi-core CPU can run at most as many “extreme” threads at the same time as it has cores
- If every thread is this “extreme”, and the number of threads executing at the same time exceeds the number of cores, it leads to unnecessary switching and excessive load, which only makes execution slower
- During I/O and other pause operations the CPU is idle, and the operating system schedules the CPU to execute other threads, which improves CPU utilization and lets more threads execute at the same time
- The higher the frequency of I/O events, or the longer the waiting/pause time, the longer the CPU is idle and the lower the utilization, so the operating system can schedule the CPU to execute more threads
The formula for thread count planning
All the groundwork above is just to help understanding. Now let’s look at the definition in the book.
Java Concurrency in Practice introduces a formula for calculating the number of threads:
If you want the program to run at a target CPU utilization, the number of threads required is given by the formula:
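For reference, the sizing formula from that book is usually written as below. Here N_cpu is the number of CPU cores, U_cpu is the target CPU utilization (between 0 and 1), and W/C is the ratio of wait time to compute time. The symbol names are the commonly quoted ones and are stated here as an assumption, not copied from this text.

```latex
N_{threads} = N_{cpu} \times U_{cpu} \times \left(1 + \frac{W}{C}\right)
```

The term 1 + W/C says how many threads it takes to keep one core busy: a thread that waits as long as it computes occupies a core only half of the time, so two such threads are needed per core.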
The formula is clear. Now let’s plug in the example above:
If I expect a target utilization of 90% (90% across all cores), then the number of threads needed is:
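As a worked sketch of that calculation, assume the commonly cited form threads = cores * utilization * (1 + wait / compute). The wait time is the 50 ms sleep from the test; the compute time of about 50 ms is an inferred assumption, based on the single-threaded run with 100 million loop iterations sitting at roughly 50% utilization. The class and method names are hypothetical.

```java
final class ThreadCountFormula {
    /** threads = cores * targetUtilization * (1 + wait / compute), rounded to the nearest integer. */
    static int threadsFor(int cores, double targetUtilization, double waitMillis, double computeMillis) {
        if (cores <= 0 || computeMillis <= 0 || waitMillis < 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return (int) Math.round(cores * targetUtilization * (1 + waitMillis / computeMillis));
    }

    public static void main(String[] args) {
        // 12 logical cores, 90% target, 50 ms wait, ~50 ms compute (assumed): 12 * 0.9 * 2 = 21.6
        System.out.println(threadsFor(12, 0.9, 50, 50)); // prints 22
    }
}
```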
Now adjust the number of threads to 22 and look at the result:
The CPU utilization is now a little over 80%, which is fairly close to the expectation. Because of the larger number of threads, some context-switching overhead, and the lack of rigor in the test case, it is normal for the actual utilization to come out lower.
Rearranging the formula, you can also calculate the CPU utilization from the number of threads:
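Written out, the rearrangement might look like this, using N_threads for the thread count, N_cpu for the number of logical cores, and W/C for the ratio of wait time to compute time. The 50 ms compute time is an assumption inferred from the roughly 50% utilization of the single-threaded run, not a measured value.

```latex
U_{cpu} = \frac{N_{threads}}{N_{cpu} \times \left(1 + \frac{W}{C}\right)}
        = \frac{22}{12 \times \left(1 + \frac{50}{50}\right)}
        \approx 0.92
```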
Although the formula is nice, in a real program it is generally hard to obtain accurate wait times and compute times, because the program is complex and does not just “compute”. A single piece of code mixes memory reads and writes, computation, I/O, and other operations, so these two figures are difficult to measure precisely, and relying on the formula alone to calculate the number of threads is too idealistic.
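A rough estimate is still possible for a single thread by comparing its wall-clock time with its CPU time, as in the sketch below. The class and method names are hypothetical, the workload simply mirrors the earlier loop-plus-sleep test, and the result is only an approximation, since anything that blocks the thread counts as wait time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class WaitComputeRatio {
    /** W/C estimated as (wall time - CPU time) / CPU time for the same piece of work. */
    static double ratio(long wallNanos, long cpuNanos) {
        if (cpuNanos <= 0) {
            throw new IllegalArgumentException("cpuNanos must be positive");
        }
        long waitNanos = Math.max(0, wallNanos - cpuNanos); // clamp timer jitter
        return (double) waitNanos / cpuNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time is not supported on this JVM");
            return;
        }
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();

        long sink = 0;
        for (int i = 0; i < 100_000_000; i++) {
            sink += i; // compute part
        }
        Thread.sleep(50); // wait part, standing in for I/O

        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        System.out.println("sink=" + sink + " W/C ~ " + ratio(wall, cpu));
    }
}
```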
The number of threads in a real program
So in a real program, or in a typical Java business system, how should the thread count (thread pool size) be planned?
Let’s start with the conclusion: there is no fixed answer. First set expectations, such as the CPU utilization I expect, the load, the GC frequency, and other indicators, and then keep adjusting through testing until you reach a reasonable number of threads.
For example, take an ordinary SpringBoot-based business system with the default Tomcat container + HikariCP connection pool + G1 collector, and suppose the project also has a scenario that needs multithreading (or a thread pool) to execute business processes asynchronously or in parallel.
If I plan the number of threads with the formula above in this situation, the error is bound to be large. There are already many threads running on this host: Tomcat has its own thread pool, HikariCP has its own background threads, the JVM has its own compilation threads, and even G1 has its own background threads. These threads also run in the current process, on the current host, and also consume CPU resources.
Therefore, because of this environmental interference, it is difficult to plan the number of threads accurately by formula alone; the result must be verified by testing.
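To get a feel for how many threads are already alive in a JVM, a small sketch like the following can group live threads by name. The class name and the grouping rule (stripping a trailing number) are hypothetical choices for illustration. `Thread.getAllStackTraces()` only reports Java-level threads, so GC and JIT threads may need an external tool such as jstack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LiveThreadSummary {
    /** Strips a trailing number such as "-12" so that threads of one pool group together. */
    static String groupName(String threadName) {
        return threadName.replaceAll("[-#\\s]*\\d+$", "");
    }

    /** Counts thread names per group, sorted by group name. */
    static Map<String, Integer> countByGroup(Iterable<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(groupName(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        countByGroup(names).forEach((group, count) -> System.out.println(count + "\t" + group));
    }
}
```

Run inside the application (for example from a diagnostic endpoint), this gives a quick picture of which pools are competing for the same cores.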
The general process is as follows:
- Analyze whether other processes on the current host will interfere
- Analyze the current JVM process for other running or potentially running threads
- Set a target
  - Target CPU utilization: how high a CPU spike can I tolerate?
  - Target GC frequency/pause time: after multi-threaded execution the GC frequency will increase; what is the maximum frequency I can tolerate, and how long may each pause be?
  - Execution efficiency: for example, in batch processing, how many threads do I need to open per unit time to finish the work in time?
  - ….
- Comb through the key points of the call chain and check for bottlenecks. If the number of threads is too large, nodes on the chain with limited resources may leave a large number of threads waiting for resources (for example third-party interface rate limiting, a limited number of pooled connections, or middleware that cannot support the pressure, etc.)
- Keep increasing/decreasing the number of threads and testing, test against the strictest requirement, and finally arrive at a number of threads that “meets the requirements”
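To make the last step cheap to repeat, the pool size can be read from configuration instead of being hard-coded, so each test run can try a different value. The sketch below is one possible shape under stated assumptions: the system property name `biz.pool.size`, the queue capacity, and the caller-runs rejection policy are hypothetical choices, not something prescribed by the process above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class TunableBusinessPool {
    /** Parses a configured pool size, falling back when nothing is configured. */
    static int resolveSize(String configured, int fallback) {
        if (configured == null || configured.trim().isEmpty()) {
            return fallback;
        }
        int size = Integer.parseInt(configured.trim());
        if (size <= 0) {
            throw new IllegalArgumentException("pool size must be positive: " + size);
        }
        return size;
    }

    /** Fixed-size pool with a bounded queue; when the queue is full, the submitting thread runs the task. */
    static ThreadPoolExecutor create(int size, int queueCapacity) {
        return new ThreadPoolExecutor(
                size, size,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        // Hypothetical property, e.g. java -Dbiz.pool.size=22 TunableBusinessPool.java
        int size = resolveSize(System.getProperty("biz.pool.size"),
                Runtime.getRuntime().availableProcessors());
        ThreadPoolExecutor pool = create(size, 1000);
        System.out.println("pool size under test: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

A bounded queue is used here so that an undersized pool shows up as back-pressure during the test rather than as silently growing memory.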
And there’s more! The idea of thread count also differs between scenarios:
- maxThreads in Tomcat behaves differently under blocking I/O and non-blocking I/O
- Dubbo still uses a single connection by default, and it also distinguishes between I/O threads (pool) and business threads (pool). I/O threads are generally not the bottleneck, so there need not be many of them, but business threads easily become a bottleneck
- Redis is also multi-threaded since 6.0, but only for I/O; “business” processing is still single-threaded
So don’t agonize over how many threads to set. There is no standard answer; you must consider the scenario, set a goal, and find the most suitable number of threads through testing.
Some readers may ask: “Our system is not under any pressure and doesn’t need such a finely tuned thread count. It’s just a simple asynchronous scenario, and it only has to not affect the system’s other functions.”
That’s perfectly normal. Many internal business systems don’t need much performance; being stable, easy to use, and meeting requirements is enough. In that case, my recommended number of threads is: the number of CPU cores.
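For that simple case, a minimal sketch might size a fixed pool from the logical core count. The class and method names are hypothetical, and the submitted task is a placeholder for real business work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SimpleAsyncPool {
    /** Fixed pool with one thread per logical core. */
    static ExecutorService newCoreSizedPool() {
        return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newCoreSizedPool();
        try {
            // Placeholder task standing in for an asynchronous business step
            Future<String> result = pool.submit(() -> "done on " + Thread.currentThread().getName());
            System.out.println(result.get());
        } finally {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```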
Appendix
Getting the number of CPU cores in Java
// Returns the number of logical cores; on a 6-core, 12-thread CPU it returns 12
Runtime.getRuntime().availableProcessors()
Getting the number of CPU cores on Linux
# Total number of cores = number of physical CPUs x number of cores per physical CPU
# Total logical CPUs = number of physical CPUs x number of cores per physical CPU x number of hyperthreads
# View the number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
# View the number of cores in each physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq
# View the number of logical CPUs
cat /proc/cpuinfo | grep "processor" | wc -l