Logs are structured or unstructured text messages that a system generates while it runs. A log entry can be thought of as an application's record of an event. Logging often helps us spot unexpected or bursty behavior, especially in systems built on a microservices architecture. Logging is also an important part of observability, which in IT and cloud computing is the ability to measure the current state of a system from the data it generates, such as logs, metrics, and distributed traces. As such, it plays an irreplaceable role in developing and maintaining our systems.

Pillars of Observability

To understand why logs play an important role in a product or system, we must understand their value. So far, logs are most widely used for alerting, troubleshooting, and business data visualization.

Logs can serve as an important data source for monitoring business systems. Mature products have an alerting system: if a problem pushes a metric past a defined threshold, the log system automatically sends alert information to the notification platform, and the on-call engineer can locate and resolve the problem based on that information.

Troubleshooting is the most common use. Imagine that a problem has recently been found in a system you develop and maintain. What is the first thing you do once you have gathered your thoughts? Most likely you check the system's information to verify your assumptions, and the logs printed on the server are the best supporting evidence. For programmers, logs are the most familiar problem-solving tool.

Many companies combine the production logs stored in their own databases with suitable tools to visualize business data. Typical examples of such tools are Grafana and SumoLogic.

To better support these use cases, we need to settle on a log format and write logs according to a defined specification rather than writing arbitrary text.

Log Format

Base version

For logs, the time, log level, and log message are the most important fields, so a usable log entry should contain at least these three.
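As a minimal sketch of the base version, the snippet below uses Python's standard `logging` module to emit the time, level, and message. The logger name `base_example` and the helper `build_base_logger` are hypothetical names chosen for illustration.

```python
import io
import logging

# Base format: timestamp, log level, log message.
BASE_FORMAT = "%(asctime)s %(levelname)s %(message)s"


def build_base_logger(stream):
    """Return a logger that writes time, level, and message to `stream`."""
    logger = logging.getLogger("base_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(BASE_FORMAT))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_base_logger(buffer)
log.info("order created")
print(buffer.getvalue(), end="")
```

The printed line starts with the current timestamp and ends with `INFO order created`.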

Advanced version

On top of the base version, add the thread name, host name, method name, class name, and line number:

Thread name: Most applications serve more than one user. In a single-instance service, requests from many users to the same interface run on different threads, so the thread name is the best way to tell one user's business flow from another's.

Hostname: Most applications today are deployed in the cloud with multiple instances, so logs need to be distinguishable per instance, and the hostname is the best way to do that.

Method name: Printing the method name distinguishes the source of otherwise identical log messages.

Class name: Printing the class name helps quickly locate the business flow.

Line number: Printing the line number helps quickly locate the exact position of the log statement.

To improve the readability of the log, we can decorate it:

Put brackets around the log level, host name, and thread name;

Put brackets around the class name and line number that accompany the method name, separating the class name from the line number with a colon;

Add a dash between the line number and the log message to separate them;
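The advanced fields and the decoration rules above could be sketched with Python's `logging` module as follows. This is one possible reading of the layout, not a fixed standard. Two assumptions are worth flagging: Python log records carry no class name, so the logger name stands in for it here, and the host name is not a built-in field, so a small filter (`HostnameFilter`, a hypothetical helper) attaches it.

```python
import io
import logging
import socket


class HostnameFilter(logging.Filter):
    """Attach the machine's host name to every record as `record.hostname`."""

    def __init__(self):
        super().__init__()
        self.hostname = socket.gethostname()

    def filter(self, record):
        record.hostname = self.hostname
        return True


# [level] [host] [thread] method [class:line] - message
# Assumption: the logger name (%(name)s) is used as the class name.
ADVANCED_FORMAT = (
    "%(asctime)s [%(levelname)s] [%(hostname)s] [%(threadName)s] "
    "%(funcName)s [%(name)s:%(lineno)d] - %(message)s"
)


def build_advanced_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(HostnameFilter())
    handler.setFormatter(logging.Formatter(ADVANCED_FORMAT))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()


class OrderService:  # hypothetical class used only for the demo
    log = build_advanced_logger("OrderService", stream)

    def create_order(self):
        self.log.info("order created")


OrderService().create_order()
print(stream.getvalue(), end="")
```

Naming each logger after the class that owns it keeps the class name in the output without inspecting the call stack.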

The log message itself can also follow a specific format:

For regular requests, responses, or other business logs, separate the custom information from the parameters with an underscore and separate multiple parameters with commas; parameters are optional;

For error messages, you can organize the content as Key:Value pairs.
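The two message conventions above might look like the following helpers. The function names, the sample parameters, and the comma between Key:Value pairs are assumptions made for illustration; the text only fixes the underscore, the comma between parameters, and the Key:Value shape.

```python
def format_business_message(info, *params):
    """Join custom information and optional parameters as 'info_p1,p2'."""
    if not params:
        return info  # parameters are optional
    return info + "_" + ",".join(str(p) for p in params)


def format_error_message(**fields):
    """Render error details as Key:Value pairs (comma-separated by assumption)."""
    return ",".join(f"{key}:{value}" for key, value in fields.items())


print(format_business_message("Create order request", "orderId=1001", "userId=42"))
# Create order request_orderId=1001,userId=42
print(format_business_message("Health check"))
# Health check
print(format_error_message(code="E1001", reason="timeout"))
# code:E1001,reason:timeout
```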

A log entry that is only a one-line text description is of limited use. In complex systems, or systems with frequent business operations, there are so many logs that we have to spend time filtering out the relevant ones. The best way to solve this problem is to trace logs along the call chain. Put simply, add one or more unique IDs from the business system to each log entry so that, when locating a business problem, you can use these IDs together with other conditions (e.g. time) to quickly filter out the relevant logs.
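One way to attach a unique ID to every log entry in Python is a `contextvars.ContextVar` combined with a logging filter, sketched below. The names `trace_id_var`, `TraceIdFilter`, `handle_request`, and `filter_by_trace_id` are hypothetical, and a real system would usually take the ID from an incoming request header rather than a function argument.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being processed ("-" when none is set).
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto each record so the formatter can print it."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def build_traced_logger(stream):
    logger = logging.getLogger("traced_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s [%(trace_id)s] %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_request(logger, trace_id=None):
    """Process one request; every log line inside carries the same trace ID."""
    token = trace_id_var.set(trace_id or uuid.uuid4().hex)
    try:
        logger.info("request received")
        logger.info("request completed")
    finally:
        trace_id_var.reset(token)


def filter_by_trace_id(log_text, trace_id):
    """Return only the log lines that belong to the given trace ID."""
    marker = f"[{trace_id}]"
    return [line for line in log_text.splitlines() if marker in line]


stream = io.StringIO()
logger = build_traced_logger(stream)
handle_request(logger, trace_id="req-1")
handle_request(logger, trace_id="req-2")
print(filter_by_trace_id(stream.getvalue(), "req-1"))
# ['INFO [req-1] request received', 'INFO [req-1] request completed']
```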

Log Level

Log output is graded by level, and different scenarios call for different levels. Here are a few of the more important ones.

Debug: Records technical details and other information that helps in understanding how the system runs;

Info: Records business information;

Warn: Non-urgent, controllable, acceptable errors;

Error: An unexpected error or system behavior, usually caused by a system bug or an environmental problem.

At the same time, not every log needs to be recorded; we should record on demand. The following table is recommended for selecting log levels in different environments.
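Whichever levels a team assigns to each environment, the level can be chosen at startup instead of being hard-coded. The sketch below reads it from an environment variable; `LOG_LEVEL` is a hypothetical variable name, and the fallback to Info is an assumption, not a recommendation from this article.

```python
import logging
import os

# The four levels discussed above, mapped to Python's logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def resolve_log_level(environ=None, default=logging.INFO):
    """Map the LOG_LEVEL variable (hypothetical name) to a logging level."""
    environ = os.environ if environ is None else environ
    return LEVELS.get(environ.get("LOG_LEVEL", "").upper(), default)


logger = logging.getLogger("level_example")  # hypothetical logger name
logger.setLevel(resolve_log_level({"LOG_LEVEL": "warn"}))
print(logger.isEnabledFor(logging.INFO))   # False
print(logger.isEnabledFor(logging.ERROR))  # True
```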

Along with the log level, where logs are printed also needs to be clear. In general:

When another system calls our system, print a log once when the request is received and once when it completes;

When our system calls a third-party system's interface, print a log once before the call and once after the response is received;

Print a log wherever an exception occurs in the system;
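The three locations above can be covered by one small decorator, sketched here. `logged_call`, `charge`, and `handle_order` are hypothetical names, and `charge` merely stands in for a real third-party call.

```python
import functools
import io
import logging


def logged_call(logger, description):
    """Log before the call, after it returns, and whenever it raises."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.info("%s started", description)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception logs at ERROR level and appends the traceback.
                logger.exception("%s failed", description)
                raise
            logger.info("%s completed", description)
            return result

        return wrapper

    return decorator


stream = io.StringIO()
logger = logging.getLogger("location_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.handlers.clear()
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)


@logged_call(logger, "third-party payment call")
def charge(amount):  # stand-in for a call to an external system
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "charged"


@logged_call(logger, "order request")
def handle_order(amount):  # stand-in for an incoming request handler
    return charge(amount)


handle_order(10)
try:
    handle_order(0)
except ValueError:
    pass
print(stream.getvalue(), end="")
```

The successful call produces four Info lines (request started, third-party call started and completed, request completed); the failing call produces Error lines with tracebacks at both levels.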

There are also special cases, such as messaging systems. To save log storage and reduce noise, we usually do not need to print a message immediately on receipt. Instead, it is generally recommended to print the original message only if the system hits an exception while handling it.
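For the messaging case, a consumer might stay silent on success and include the raw message only when handling fails. The sketch below assumes JSON messages; `consume` and `require_id` are hypothetical names.

```python
import io
import json
import logging


def consume(logger, raw_message, handler):
    """Process one message; log the original payload only if processing fails."""
    try:
        handler(json.loads(raw_message))
    except Exception:
        logger.exception("failed to process message, original=%s", raw_message)
        return False
    return True


stream = io.StringIO()
logger = logging.getLogger("consumer_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.handlers.clear()
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


def require_id(message):  # hypothetical business handler
    return message["id"]


ok = consume(logger, '{"id": 1}', require_id)
logged_after_success = stream.getvalue()
failed = consume(logger, '{"name": "x"}', require_id)
print(ok, failed)  # True False
print(stream.getvalue(), end="")
```

Nothing is written for the first message; the second one appears in the log together with the `KeyError` traceback.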

Different programming languages have different logging tools. One of the best known is Apache's Log4j, which is highly configurable and can be configured through external files at runtime. It is built around logging priority levels and provides a mechanism to direct log output to many destinations, such as a database, a file, the console, or UNIX syslog. Libraries with a similar design exist in other languages, such as logging in Python, log4js in Node.js, and log4rs in Rust.
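To illustrate level-based routing to several destinations, here is a small sketch with Python's `logging` module rather than Log4j itself. The logger name, the file name `app.log`, and the choice of Info for the console and Error for the file are assumptions for the demo.

```python
import logging
import os
import sys
import tempfile


def build_multi_destination_logger(log_path):
    """Send INFO and above to the console, and ERROR and above to a file."""
    logger = logging.getLogger("multi_destination_example")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.ERROR)

    formatter = logging.Formatter("%(levelname)s %(message)s")
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger = build_multi_destination_logger(log_path)
logger.info("service started")        # console only
logger.error("database unreachable")  # console and file

with open(log_path, encoding="utf-8") as f:
    print(f.read(), end="")
# ERROR database unreachable
```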

Avoid printing or recording any sensitive information, including but not limited to PII and PCI data, and comply with applicable local laws and regulations, such as China's Personal Information Protection Law (PIPL) and the European Union's General Data Protection Regulation (GDPR);

Choose the appropriate log level and log location as needed

……
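As a rough sketch of the first note, about sensitive information, a logging filter can mask obvious patterns before a record is written. The two regular expressions below are simplified assumptions for illustration and are nowhere near a complete PII or PCI detector; the safest approach is still not to pass such values to the logger at all.

```python
import io
import logging
import re

# Simplified, hypothetical patterns: e-mail addresses and 13 to 16 digit numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD = re.compile(r"\b\d{13,16}\b")


class RedactingFilter(logging.Filter):
    """Mask e-mail addresses and card-like numbers in the final message."""

    def filter(self, record):
        message = record.getMessage()  # message with arguments already merged
        message = EMAIL.sub("[REDACTED_EMAIL]", message)
        message = CARD.sub("[REDACTED_CARD]", message)
        record.msg = message
        record.args = None  # arguments are already merged into the message
        return True


stream = io.StringIO()
logger = logging.getLogger("redaction_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.handlers.clear()
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("payment by %s with card %s", "alice@example.com", "4111111111111111")
print(stream.getvalue(), end="")
# INFO payment by [REDACTED_EMAIL] with card [REDACTED_CARD]
```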

Good logs not only facilitate development and provide the most important supporting information for troubleshooting, but also supply statistics and optimization hints for the business and infrastructure.

THE TOP 25 GRAFANA DASHBOARD EXAMPLES

Grafana lab

SumoLogic

Log4j