A program running in production recently had a performance problem, but code review alone could not reveal the root cause. So the only way to find out what was going on was to use the powerful profiling tool perf.

The perf tool is very powerful, but this article does not cover how to use it; instead, it describes how perf is implemented. There are many articles describing the use of perf, but very few that describe its principles and implementation.

Because perf is so powerful, its implementation is also very complex. This article describes only one of its features: profiling how often each function in a process is executing, that is, each function's share of CPU time.

First, let's look at how to use perf to analyze where a process spends its CPU time.

Before diving into perf's implementation, let's analyze a simple program with perf. The program's code is as follows:

The program above is simple: it defines two functions, workload1 and workload2. As the code shows, workload2 performs twice as much work as workload1.

Now let’s use perf to analyze where the performance bottleneck of this program is.
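The exact command line was not preserved in this version of the article; a typical invocation (assuming the source file is named sample.c) looks like this — depending on your system's perf_event_paranoid setting, sudo may be required:

```shell
# Build with frame pointers so perf can unwind call stacks
gcc -O2 -fno-omit-frame-pointer -o sample sample.c

# Sample the running program at 99 Hz; results go to perf.data
perf record -F 99 ./sample

# Browse the collected samples, sorted by overhead
perf report
```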

After running the above command, a perf.data file is generated that records the samples collected while the program was running.

The result is shown in the following figure:

As the figure shows, the load of the function workload2 (65%) is about twice that of the function workload1 (35%), which roughly matches the ratio in our code.

With this example, we now have an idea of how to use perf to find a program's performance bottlenecks. Next, let's look at perf's internal implementation.

Let's think about it: if we had to design a scheme to measure how much CPU time each function in a program consumes, how would we do it? The simplest approach is to record the current time when a function starts executing, and when the function returns, subtract that start time from the current time to get the function's total execution time. In pseudocode:

Although this method can measure the time consumed by each function, it has several problems:

- Statistical code has to be added to every function, which means modifying and recompiling the program, and makes it impossible to profile a process that is already running.
- The added code itself takes time to execute, distorting the very measurements we are trying to take.

So we need a scheme that avoids these problems: one that can profile a program without modifying its code and with minimal impact on its performance.

Perf was created to solve exactly these problems, so let's first introduce the principle behind it.

To reduce the impact on the program's performance, perf does not add statistical code to every function; instead, it uses sampling.

The principle of sampling is: set up a timer, and every time the timer fires, look at which function the process is currently executing and record it. This is shown in the following figure:

As shown in the figure above, each cpu-clock event is a timer trigger point. Of the 6 trigger points, function func1 was hit 3 times, function func2 was hit once, and function func3 was hit twice. So we can infer that func1 has the highest CPU usage.

If the program has thousands of functions, the amount of sampled data can be very large, so the samples need to be sorted.

To sort the sampled data, perf uses a red-black tree, as shown in the following figure:

As shown in the figure above, the sampled data contains 7 functions, and perf builds a red-black tree from these samples.

By the properties of this red-black tree, keyed by hit count, the rightmost node is the function that was hit the most, so the function with the highest CPU usage in the program can be found quickly.

Because perf is so powerful, this article has described only one of its features: measuring the CPU usage of a program's functions.

In the next article, we'll cover perf's code implementation. As Linus Torvalds, the creator of Linux, once put it: "Read the f**king source code." To really understand a system, you have to read its source.