Source | bryantchang.github.io/2019/12/08/java-profile-tools/

Work hasn't been peaceful lately: troubleshooting one case after another, big and small, I realized it has been a long time since I calmed down to concentrate on writing something. Still, I kept using my spare time to learn a lot from micro-courses, and found that what the experts summarize really is different: compared with the dry textbooks from college, their material is livelier and easier to understand. I'll digest and absorb it later, turn it into my own, and put it into words.

What I want to summarize today are two tools from recent work: the benchmarking tool JMH and the Java runtime monitoring tool Arthas. Both have helped me a great deal in practice, so here's a look at how to use them, which will also deepen my own familiarity with them. For each tool I'll first briefly introduce its typical usage scenarios, then use a performance-troubleshooting example to explain its actual use in detail. Without further ado, let's get straight to the point.

Problem Description

So that my later examples can run through the use of both tools, I'll first briefly describe an actual performance problem we encountered in development, then bring in the two performance tools and show how we used them to successfully locate the performance bottleneck.

The problem is as follows: to support a log drop rate, we reworked log4j2's original Async + custom Appender approach and moved the asynchronous logic into our own modified Appender. However, we found that logging performance after the change was much lower than with the original Async + custom Appender. For privacy reasons I'm not using the actual company example; I'll illustrate the same problem in a different way. Let's leave the specific configuration file aside for now and first look at the performance test code and its results.
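For intuition only, here is a minimal sketch of what "moving the async logic into our own Appender" with a bounded drop-on-full queue might look like. This is my illustrative stand-in with made-up names and sizes, not the appender from the real project:

import java.io.Serializable;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.Layout;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.config.plugins.PluginAttribute;
import org.apache.logging.log4j.core.config.plugins.PluginElement;
import org.apache.logging.log4j.core.config.plugins.PluginFactory;

// Illustrative sketch: events are enqueued and silently dropped when the
// queue is full (one simple way to bound the loss rate); a daemon thread
// drains the queue in the background.
@Plugin(name = "DroppingAsyncAppender", category = "Core", elementType = "appender")
public final class DroppingAsyncAppender extends AbstractAppender {

    private final BlockingQueue<LogEvent> queue = new ArrayBlockingQueue<>(10_000);

    private DroppingAsyncAppender(String name, Filter filter, Layout<? extends Serializable> layout) {
        super(name, filter, layout, true);
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    LogEvent event = queue.take();
                    // hand the event to the downstream writer here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dropping-async-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    @Override
    public void append(LogEvent event) {
        // offer() never blocks: if the queue is full, the event is dropped
        queue.offer(event.toImmutable());
    }

    @PluginFactory
    public static DroppingAsyncAppender createAppender(
            @PluginAttribute("name") String name,
            @PluginElement("Filter") Filter filter,
            @PluginElement("Layout") Layout<? extends Serializable> layout) {
        return new DroppingAsyncAppender(name, filter, layout);
    }
}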

package com.bryantchang.appendertest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AppenderTest {

    private static final String LOGGER_NAME_DEFAULT = "defaultLogger";
    private static final String LOGGER_NAME_INCLUDE = "includeLocationLogger";
    // swap in LOGGER_NAME_DEFAULT to test the ordinary logger
    private static final Logger LOGGER = LoggerFactory.getLogger(LOGGER_NAME_INCLUDE);
    public static final long BATCH = 10000;

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            long start, end;
            start = System.currentTimeMillis();
            for (int i = 0; i < BATCH; i++) {
                LOGGER.info("msg is {}", i);
            }
            end = System.currentTimeMillis();
            System.out.println("duration of " + LOGGER_NAME_INCLUDE + " is " + (end - start) + "ms");
            Thread.sleep(1000);
        }
    }
}

The code logic is very simple: each round calls logger.info to print 10,000 log messages, records the elapsed time, and then sleeps for a second. The comparison between the two loggers is as follows.

From these two screenshots we can see that, with identical logic, the running time of the two programs differs by dozens of times, and the only visible difference is the logger name. Analyzing these results raises two questions:

  • Is the test method in the code above sound?
  • If it is, and this really is a performance problem, which method call causes such a large gap between the two programs?

The two tools below answer these two questions in turn.

JMH Introduction

The first question is whether the test method is sound. In the code above we used an endless loop and hand-timed it with a "stopwatch" around the calls. If we wanted a multi-threaded test we would also have to build a thread pool; besides the logic under test, we would have to maintain the test logic itself. Is there a tool that frees us completely from the test logic, so we only care about the code we want to measure? JMH is exactly such a Java testing framework. Here is the official definition:

JMH is a performance benchmarking framework for the Java language and other languages targeting the JVM.

Note that methods tested with JMH should be as simple and dependency-free as possible; the best-suited scenario is comparing the put/get performance of two collections, such as ArrayList versus LinkedList. Here we need to measure the time required to log a batch of messages, which also basically fits the JMH test scenario. Below are the test code, the benchmark framework code, and the main function.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogBenchMarkWorker {

    private LogBenchMarkWorker() {}

    private static class LogBenchMarkWorkerInstance {
        private static final LogBenchMarkWorker instance = new LogBenchMarkWorker();
    }

    public static LogBenchMarkWorker getInstance() {
        return LogBenchMarkWorkerInstance.instance;
    }

    private static final String LOGGER_DEFAULT_NAME = "defaultLogger";
    private static final String LOGGER_INCLUDE_LOCATION = "includeLocationLogger";
    // swap in LOGGER_INCLUDE_LOCATION to benchmark the other logger
    private static final Logger LOGGER = LoggerFactory.getLogger(LOGGER_DEFAULT_NAME);
    private static long BATCH_SIZE = 10000;

    public void logBatch() {
        for (int i = 0; i < BATCH_SIZE; i++) {
            LOGGER.info("msg is {}", i);
        }
    }
}

We can see that the method under test is very simple: one op prints 10,000 log messages in a single batch, so no extra explanation is needed. Now let's look at the benchmark part.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Threads;

public class LogBenchMarkMain {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(1)
    public void testLog1() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(4)
    public void testLog4() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(8)
    public void testLog8() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(16)
    public void testLog16() {
        LogBenchMarkWorker.getInstance().logBatch();
    }
}

This code contains several constructs unique to JMH, which I'll briefly introduce below.

  • @Benchmark: placed on a method, marking it as the smallest unit to be tested; in JMH terms each invocation is one "op".
  • @BenchmarkMode: the test mode. JMH provides several: Throughput (overall throughput), AverageTime (average time per op), and SampleTime (sampled op times, reporting percentiles such as the 99% and 99.99% op time).
  • @Fork: the number of forks; if set to 2, JMH forks two processes to run the test.
  • @Threads: simply the number of threads executing the method concurrently.
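To make the other modes in this list concrete, here is a small sketch (not part of the original benchmark; the method names are made up) of how the same worker could be measured with Throughput and SampleTime:

@Benchmark
@BenchmarkMode(Mode.Throughput)   // reports ops executed per unit of time
@Fork(value = 1)
@Threads(1)
public void testLogThroughput() {
    LogBenchMarkWorker.getInstance().logBatch();
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)   // samples op times and reports percentiles (p99, p99.99, ...)
@Fork(value = 1)
@Threads(1)
public void testLogSampled() {
    LogBenchMarkWorker.getInstance().logBatch();
}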

In the benchmark class shown earlier, I defined four methods to be tested; their @Fork and @BenchmarkMode (average time per op) are the same, but their thread counts differ. Besides these annotations there are a few more parameters, which I'll cover in the main function. The main code is as follows:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class Main {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(LogBenchMarkMain.class.getSimpleName())
                .warmupIterations(5)
                .measurementIterations(5)
                .output("logs/BenchmarkCommon.log")
                .build();
        new Runner(options).run();
    }
}

We can see that the main function sets JMH's basic configuration. The configuration parameters are as follows:

  • include: the name of the class containing the benchmarks; regular expressions can be used.
  • warmupIterations: the number of warmup iterations. Warmup is needed because of JIT: as the code runs it gets dynamically optimized, so we first run a few warmup rounds to let the code reach a steady state, and only then measure for real.
  • measurementIterations: the number of actually measured iterations.
  • output: where the test report is written.
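As an aside, warmup and measurement can also be configured with JMH's annotations instead of the OptionsBuilder; a minimal sketch, assuming the same worker as above:

@Warmup(iterations = 5)          // JIT warmup rounds, discarded from the report
@Measurement(iterations = 5)     // actually measured rounds
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1)
@Threads(1)
public void testLogAnnotated() {
    LogBenchMarkWorker.getInstance().logBatch();
}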

I ran the test with the two loggers and compared the performance reports.

Using the ordinary logger:

LogBenchMarkMain.testLog1   avgt    5  0.006 ± 0.001  s/op
LogBenchMarkMain.testLog4   avgt    5  0.006 ± 0.002  s/op
LogBenchMarkMain.testLog8   avgt    5  0.040 ± 0.004  s/op
LogBenchMarkMain.testLog16  avgt    5  0.062 ± 0.026  s/op

Using the INCLUDE_LOCATION logger:

LogBenchMarkMain.testLog1   avgt    5  0.379 ± 0.007  s/op
LogBenchMarkMain.testLog4   avgt    5  0.378 ± 0.003  s/op
LogBenchMarkMain.testLog8   avgt    5  0.776 ± 0.070  s/op
LogBenchMarkMain.testLog16  avgt    5  1.784 ± 0.670  s/op

Here the performance gap is immediately apparent: the INCLUDE_LOCATION logger performs far worse than the normal one. Naturally we're curious and want to ask: where on earth is it slow?!

Arthas: what exactly is my code doing at runtime?

Arthas is a Java diagnostics tool from Alibaba, similar to greys but more powerful; for example, it can directly draw flame graphs of Java method calls. Both tools are built on Java's powerful bytecode instrumentation. I'm no JVM expert, so I can't expand on the implementation details; all we need to do is stand on the shoulders of giants and make skilled use of these handy performance monitoring tools.

Hands-on

Talk is cheap, show me the code. Since it's a tool, let's use it directly. We run our initial program locally, then walk through how to use Arthas.

We first find the process ID of the program with jps, then use the as.sh script provided by Arthas to instrument the bytecode of our running program and start Arthas. The command is as follows:

./as.sh pid
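For example, a session might look like this (the PID 12345 is made up; use whatever jps prints on your machine):

$ jps
12345 AppenderTest
67890 Jps
$ ./as.sh 12345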

As you can see, Arthas supports viewing the current JVM status, viewing thread states, monitoring the invocations of specific methods, tracing, profiling to generate flame graphs, and more. I'll only cover a few of the most commonly used commands here; if you are interested in the others, see the official Arthas website. This article mainly introduces the following functions:

  • decompile code (jad)
  • monitor the invocations of a method (monitor)
  • watch a method's invocations and return values (watch)
  • trace a method (trace)
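Of these, watch is not demonstrated later, so here is a sketch of its usage following the official syntax (note that Logger.info returns void, so we only watch the parameters; -x sets the expansion depth of the result):

watch org.slf4j.Logger info "{params}" -x 2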

monitor command

The main command for monitoring method invocations is monitor. According to the official website, the common usage is

monitor -c duration className methodName

Here duration is the length of a single statistics cycle in seconds (results are printed once per cycle), className is the fully qualified class name, and methodName is the method name. The method we want to observe is the info method of the Logger class, for each of the two different loggers. The class here is org.slf4j.Logger and the method is info, so our monitoring statement is

monitor -c 5 org.slf4j.Logger info

The monitoring result is as follows.

We can see that the average time of the log-printing method using the includeLocation Appender is about 3 times that of the normal Appender, which makes us wonder how much time each step inside these two methods takes. That brings us to the second command: trace.

trace command & jad command

The log4j2 configuration file used by both programs is as follows:


<?xml version="1.0" encoding="UTF-8"?>
<configuration status="warn" monitorInterval="30">
    <Appenders>
        <Console name="console" target="SYSTEM_OUT">
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss.SSS}] [%-5p] %l - %m%n"/>
        </Console>

        <Async name="AsyncDefault" blocking="false" includeLocation="false">
            <AppenderRef ref="fileAppender"/>
        </Async>

        <Async name="AsyncIncludeLocation" blocking="false" includeLocation="true">
            <AppenderRef ref="fileAppender"/>
        </Async>

        <File name="fileAppender" fileName="log/test.log" append="false">
            <PatternLayout pattern="[%d{HH:mm:ss.SSS}] [%-5p] %l - %m%n"/>
        </File>

        <File name="ERROR" fileName="logs/error.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{yyyy.MM.dd 'at' HH:mm:ss z}] [%-5p] %l - %m%n"/>
        </File>

        <RollingFile name="rollingFile" fileName="logs/app.log"
                     filePattern="logs/$${date:yyyy-MM}/web-%d{yyyy-MM-dd}.log.gz">
            <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss}] [%-5p] %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy modulate="true" interval="1"/>
            </Policies>
            <DefaultRolloverStrategy>
                <Delete basePath="logs" maxDepth="2">
                    <IfFileName glob="*/*.log.gz"/>
                    <IfLastModified age="7d"/>
                </Delete>
            </DefaultRolloverStrategy>
        </RollingFile>
    </Appenders>
    <!-- Loggers section (defining defaultLogger and includeLocationLogger) omitted here -->
</configuration>
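By analogy with the monitor command above, typical trace and jad invocations for this scenario might look like this (a sketch based on the official Arthas syntax; the AsyncAppender class is my illustrative choice, not necessarily the exact class to inspect):

trace org.slf4j.Logger info
jad org.apache.logging.log4j.core.appender.AsyncAppender

trace prints the call tree of an invocation with per-step timings, which is exactly what we need to find out which step inside info is slow; jad decompiles a loaded class so we can read the code actually running in the JVM.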