The evolution of tools has always been a sign of progress in human productivity. Using tools sensibly can greatly improve our work efficiency, and when problems arise, the right tool speeds up troubleshooting. That’s why I really like the shell: its rich set of command-line tools and its pipelines handle text data precisely and elegantly.

But text is often limited in what it can express. It is unbeatable for absolute values, but it struggles to show relative values, let alone multidimensional data.

With the shell we can very quickly query a sum, a maximum, and so on from text, but when it comes to analyzing the correlation between two sets of values, we are helpless. This is where another analytical tool, the chart, comes in: a scatter plot, for example, shows correlation clearly.

Today I am going to introduce one kind of chart, the flame graph. An expert in our group shared how to use it a while ago, but I had not used it for a long time, so it did not leave much of an impression. Recently I tried it while troubleshooting a load problem in our Java application, and now I have a little experience with it.

When troubleshooting performance issues, we usually dump the thread stacks and then use a shell statement like

grep --no-group-separator -A 1 java.lang.Thread.State jstack.log | awk 'NR%2==0' | sort | uniq -c | sort -nr

to see what most of the threads are doing. How often a call appears at the top of the thread stacks is used to infer the most time-consuming calls in the JVM.

As for the principle, imagine a big screen in a public square that plays various advertisements in a loop. If we photograph the screen at random moments, enough times, and count how often each advertisement appears in the photos, we get roughly the share of playback time each advertisement has.

Our application's resources are like the big screen, and each call is like an advertisement being played. By counting how often each call appears in the dumped thread stacks, we can roughly see the share of time each call takes. There is some error, but results over multiple dumps should not differ much. This is also why some parents, finding the system desktop on the screen every time they walk into their child’s room, conclude that the child likes to sit staring at the desktop. 🙂

2444  at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(
      at sun.misc.Unsafe.park(Native Method)
 795  at
      at java.lang.Object.wait(Native Method)
 292  at java.lang.Thread.sleep(Native Method)
  73  at org.apache.logging.log4j.core.layout.TextEncoderHelper.copyDataToDestination(
  71  at Method)
  70  at java.lang.Class.forName0(Native Method)
  54  at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(

But this has some problems. First, writing the shell command is laborious. Second, if I want to see the most frequent calls at the second frame from the top of the stack, then even after modifying the shell command the result is not intuitive.
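One way to make the command less laborious to modify is to wrap it in a small function that takes the frame depth as a parameter. This is only a sketch: the function name nth_frame_counts and its arguments are made up for illustration, and it assumes the usual jstack layout in which each java.lang.Thread.State line is followed by that thread's frames, top of the stack first.

```shell
# Hypothetical helper (not part of any tool): rank the calls found at a given
# depth of every thread stack in a jstack dump.
#   $1: jstack dump file
#   $2: frame depth counted from the top of the stack (1 = top frame)
nth_frame_counts() {
  awk -v depth="$2" '
    # A Thread.State line marks the start of the frames of a new thread.
    /java\.lang\.Thread\.State/ { n = 0; next }
    # Count only real frames; "- locked <...>" lines are skipped.
    /^[ \t]*at / {
      n++
      if (n == depth) { sub(/^[ \t]+/, ""); print }
    }
  ' "$1" | sort | uniq -c | sort -nr
}

# Example: nth_frame_counts jstack.log 2
```

Calling it with a depth of 1 should give roughly the same ranking as the one-liner above, and a depth of 2 ranks the callers one frame further down. The output is still flat text, though, so the call relationships between frames remain hard to see.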

The main reason is that thread stacks have call relationships, so we need to consider two dimensions at once, the call chain and the frequency of occurrence, and plain text is poor at representing both. That is why Brendan Gregg, the well-known performance analysis expert, proposed the flame graph.


The flame graph is named for its flame-like shape. Its open source code address:

It is an interactive SVG graphic: clicking and hovering with the mouse reveal more information. The figure below is a typical flame graph. Structurally, it is made up of blocks of different sizes and colors, each labeled with text. The blocks are joined together at the bottom to form the base of the flame, and many “small flames” rise from the top.

When we click a block, the image expands upward from that block as the new base, and when we hover the mouse over a block, its detailed description is displayed.

Before introducing how to analyze a flame graph, we must first explain its characteristics:

  • Following the graph from bottom to top traces a unique call chain; the lower block is the parent call of the block above it.
  • Blocks called by the same parent are arranged alphabetically from left to right.
  • The text on a block is the name of a call. The parentheses give the number of times that call appears in the flame graph and the percentage of the bottom block's width that this block occupies.
  • The colors of the blocks have no practical meaning; adjacent blocks differ in color only to make them easier to tell apart.


Given a flame graph, how can we see where the system has a problem?

From the characteristics above, when reading a flame graph our main focus should be the width of a block: the width represents the number of times the call stack appears overall, the count represents frequency, and frequency in turn indicates time.

But looking at the width of blocks at the bottom or in the middle of the flame graph is of little use. In the flame graph above, for example, the do_redirections function in the middle has a width of 24.87%, which means it accounts for nearly a quarter of the application's time. The time is not really consumed by do_redirections itself, though, but by the functions it calls, and because its child calls are split across many blocks, none of them takes an abnormal amount of time.

We should pay more attention to the “flat-topped mountains” at the top of the flame graph. Being at the top means the call has no child calls, and a wide block means it runs for a long time, hangs for a long time, or is called very frequently. The call such a block points to is the likely culprit of the performance problem.
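To make the difference between a wide block and a flat top concrete, the sketch below computes both numbers for each call from a set of collapsed stacks (one call chain per line followed by its sample count, the input format described later in this article). The helper name frame_shares is hypothetical, and the sample data in the trailing comment is only an illustration.

```shell
# Hypothetical helper: reads collapsed stacks ("a;b;c 12") on stdin and prints
# one line per call: name, inclusive share (the block width) and self share
# (how often the call is at the top of a stack), as percentages of all samples.
frame_shares() {
  awk '
    {
      count = $NF                              # last field is the sample count
      stack = $0
      sub(/[ \t]+[0-9]+[ \t]*$/, "", stack)    # strip the count from the chain
      n = split(stack, f, ";")
      total += count
      split("", seen)                          # portable way to clear an array
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", f[i])      # trim spaces around the name
        # Count a call once per stack, even if it recurses.
        if (!(f[i] in seen)) { incl[f[i]] += count; seen[f[i]] = 1 }
      }
      self[f[n]] += count                      # the last call is the top of the stack
    }
    END {
      for (name in incl)
        printf "%s %.1f %.1f\n", name, 100 * incl[name] / total, 100 * self[name] / total
    }
  ' | LC_ALL=C sort -k3,3nr -k1,1              # largest self share first
}

# Example:
#   printf '%s\n' 'a; b; c 12' 'a; d 3' 'b; c 3' 'z; d 5' 'a; c; e 3' | frame_shares
```

On that example input, a covers 69.2% of the samples but never sits at the top of a stack, whereas c is at the top in 57.7% of them, so c is the block to look at first.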

Once the abnormal call is found, optimize it directly, or follow the call chain in the flame graph back to our business code and optimize that, and you’re done.


Every tool has its own suitable application scenarios, and the flame graph is suitable for:

  • Code loop analysis: If the code contains a large loop or an infinite loop, a clear “flat top” appears at or near the top of the flame graph, indicating that the code keeps moving up and down around a certain frame of the thread stack. Note, however, that if the total time of the loop is not long, it will not stand out on the flame graph.
  • IO bottleneck/lock analysis: In our application code, calls are generally synchronous: when a thread makes a network call, performs file I/O, or fails to acquire a lock, it stays on that call waiting for the I/O response or the lock. If the wait is long, the thread hangs on that call, which shows up very clearly on the flame graph. On the other hand, a flame graph built from application thread stacks cannot accurately express CPU consumption, because the application thread stacks contain no system call stacks, and while an application thread is hanging, the CPU may be doing other things. As a result we may see a call that takes a long time while the CPU is mostly idle.
  • Global code analysis with an inverted flame graph: Inverting the flame graph is sometimes useful. If N different branches of our code call the same method, then after inversion all identical calls at the top of the stack are merged, so we can see the total time of this method and easily evaluate the benefit of optimizing it.
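The inversion in the last item can be sketched on collapsed stacks (one call chain per line followed by a count): reversing the order of the calls in every chain puts the former top of the stack first, so chains that end in the same method now share a common base. The function name reverse_folded is hypothetical and the code is only an illustration of the idea, not the generation tool's own implementation.

```shell
# Hypothetical helper: reads collapsed stacks ("x;m 4") on stdin and prints
# them with the call order reversed, summing the counts of chains that become
# identical after the reversal.
reverse_folded() {
  awk '
    {
      count = $NF                              # last field is the sample count
      stack = $0
      sub(/[ \t]+[0-9]+[ \t]*$/, "", stack)    # strip the count from the chain
      n = split(stack, f, ";")
      out = ""
      for (i = n; i >= 1; i--) {               # walk the chain backwards
        gsub(/^[ \t]+|[ \t]+$/, "", f[i])      # trim spaces around the name
        out = out (out == "" ? "" : ";") f[i]
      }
      merged[out] += count
    }
    END { for (s in merged) print s, merged[s] }
  ' | LC_ALL=C sort
}

# Example:
#   printf '%s\n' 'x;m 4' 'y;m 6' 'x;z;m 2' | reverse_folded
```

In the example, every reversed chain starts with m, so a flame graph drawn from the result would show a single m block at the base whose width is the sum of all three counts.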


The flame graph is so powerful; how do we produce one?


Generation tool

Brendan Gregg has implemented flame graph generation in Perl. The open source code is in the GitHub repository above, and the file in the root directory is an executable Perl file.

This command also accepts various parameters that let us change the color, size, and so on of the flame graph.

But it can only handle input in a specific format, such as:

a; b; c 12
a; d 3
b; c 3
z; d 5
a; c; e 3

Each line starts with a call chain in which the calls are separated by ;, and the number at the end of each line is the number of times that call stack appears.

The flame graph generated from the data above is as follows:


Data preparation

As for turning our jstack output into the format above, the author provides tools for common dump formats, such as ones that handle the output of the perf command, jstack output, and stacks printed by gdb.

You can also use the shell to simply implement the handling of jstack:

grep -v -P '.+prio=\d+ os_prio=\d+' | grep -v -E 'locked <' | awk '{if ($0==""){print $0 }else{printf"%s; ",$0}}' | sort | uniq -c | awk '{a=$1; $1=""; print $0,a}'
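The one-liner above keeps the frames in the order jstack prints them, top of the stack first, and keeps the Thread.State line as part of the chain. A flame graph is usually read with the entry point at the bottom, so a variant that emits each chain root first may be preferable. The following is a minimal sketch under that assumption: the function name fold_jstack is made up, it assumes threads in the dump are separated by blank lines, and it drops the source location in parentheses so that identical calls merge.

```shell
# Hypothetical helper: collapse a jstack dump into "root;...;top count" lines.
#   $1: jstack dump file
fold_jstack() {
  awk '
    # Print the frames collected for the current thread, root first.
    function emit_stack(   i, line) {
      if (n > 0) {
        line = frames[n]
        for (i = n - 1; i >= 1; i--) line = line ";" frames[i]
        print line
      }
      n = 0
    }
    /^[ \t]*at / {
      f = $0
      sub(/^[ \t]*at /, "", f)       # drop the leading "at "
      sub(/\(.*$/, "", f)            # drop "(File.java:123)" or "(Native Method)"
      frames[++n] = f
      next
    }
    /^[ \t]*$/ { emit_stack() }      # a blank line ends the current thread
    END { emit_stack() }             # the dump may not end with a blank line
  ' "$1" | sort | uniq -c | awk '{ c = $1; $1 = ""; sub(/^ /, ""); print $0, c }'
}

# Example: fold_jstack jstack.log > jstack.folded
```

Thread header lines, state lines, and lock annotations are ignored here, so only the call chain contributes to the result; whether that is the right choice depends on what you want the graph to show.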

Summary

That wraps up the flame graph; from now on there is one more way to tackle performance problems.

The longer I work as a developer, the more I appreciate the importance of tools, so I am going to start a series on the various tools I use. Of course, this also requires me to keep learning, using, and summarizing new tools.