While tracking down a memory leak, I wanted to know which objects were actually sitting in memory, so I dumped and parsed the memory of a running program. This turned out to be a visual way to understand Go's memory layout and memory management.

Memory layout covers memory alignment, the amount of memory a struct occupies, and so on. In Go, a heap object carries no identifying information about itself, such as its type. Struct fields are laid out in the order they appear in the code, and that order is not adjusted automatically.

A string is actually a structure with two fields: a pointer to the data and a length.

A slice has three fields: a pointer to the data, a length, and a capacity.

So if we see a string field in a struct, how many bytes does it take? On a 64-bit system it takes 16 bytes: 8 for the address and 8 for the length.

First we define two structs, Class and User:

One of the structs references the other. Let’s look at how the two structs are laid out on a 64-bit system.

The Class struct has only two fields: a string and a uint. On a 64-bit system a uint is the same size as a uint64, so the struct occupies 24 bytes:

The User struct is more complex. Its reference to Class is just a pointer, and a pointer takes 8 bytes, so the whole struct should occupy 32 bytes. Pay particular attention to uint8: the value itself takes only one byte, and the rest of that 8-byte slot is padding. The layout is as follows:

We wrote a very small piece of code to test the memory layout of these two objects:

I deliberately built the strings in two different ways: one is a plain string constant, “zhangsan”, and the other is produced by string concatenation.

The two behave differently. A constant string is usually placed in the program's data at compile time, while a concatenated string is created at run time, so we can expect to see it allocated on the heap.
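A rough sketch of such a test program is shown here. It is not the original code: the struct definitions, the `newUser` and `dataAddr` helpers, and the random suffix are all assumptions. It prints where the bytes of each string live so the two cases can be compared.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"unsafe"
)

// Assumed struct definitions (field names are hypothetical).
type Class struct {
	Name  string
	Level uint
}

type User struct {
	Name  string
	Age   uint8
	Sex   uint8
	Class *Class
}

// newUser is a hypothetical constructor: Name is a string constant,
// Class.Name is concatenated at run time.
func newUser() *User {
	class := &Class{Name: "class-" + strconv.Itoa(rand.Intn(1000)), Level: 1}
	return &User{Name: "zhangsan", Age: 18, Sex: 1, Class: class}
}

// dataAddr reads the first word of a string value, which holds
// the address of the string's bytes.
func dataAddr(s string) uintptr {
	return *(*uintptr)(unsafe.Pointer(&s))
}

func main() {
	u := newUser()
	fmt.Printf("constant     %q at %#x\n", u.Name, dataAddr(u.Name))
	fmt.Printf("concatenated %q at %#x\n", u.Class.Name, dataAddr(u.Class.Name))
	// A real test program would block here (for example with time.Sleep)
	// so that the process stays alive while its memory is inspected.
}
```

The two addresses typically fall in very different ranges: the constant in a mapping that belongs to the binary, the concatenated string in the heap arena.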

Once the program is running (its PID is 14173), how do we look at its memory mappings? The easiest way is to read /proc/14173/maps directly:

In theory, the first three lines belong to the program itself, the entries in the middle are heap data, and the entries at the end are the mappings used for system calls.
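Each line of a maps file has the form `start-end perms offset dev inode [path]`. As a small sketch, the line format can be parsed like this; the `region` type and `parseMapsLine` function are hypothetical names, and `main` reads the current process's own maps rather than PID 14173.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// region holds the parts of one maps line used in this article.
type region struct {
	Start, End uint64
	Perms      string
	Path       string
}

// parseMapsLine parses "start-end perms offset dev inode [path]".
func parseMapsLine(line string) (region, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return region{}, fmt.Errorf("malformed maps line: %q", line)
	}
	bounds := strings.SplitN(fields[0], "-", 2)
	if len(bounds) != 2 {
		return region{}, fmt.Errorf("malformed address range: %q", fields[0])
	}
	start, err := strconv.ParseUint(bounds[0], 16, 64)
	if err != nil {
		return region{}, err
	}
	end, err := strconv.ParseUint(bounds[1], 16, 64)
	if err != nil {
		return region{}, err
	}
	r := region{Start: start, End: end, Perms: fields[1]}
	if len(fields) > 5 {
		r.Path = strings.Join(fields[5:], " ")
	}
	return r, nil
}

func main() {
	f, err := os.Open("/proc/self/maps")
	if err != nil {
		fmt.Println("cannot read maps (Linux only):", err)
		return
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if r, err := parseMapsLine(sc.Text()); err == nil {
			fmt.Printf("%x-%x %s %s\n", r.Start, r.End, r.Perms, r.Path)
		}
	}
}
```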

You can use gdb to dump memory directly into binary files. For this purpose I wrote a script that can be run directly:

Its input parameter is the process ID. When called, it dumps the contents of memory at that moment into files:

The dump files whose names start with the process ID are the memory data we dumped.
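The original script is not reproduced here. As a sketch of the idea, the helper below builds the gdb arguments for dumping one region with gdb's `dump memory` command. The function names are hypothetical, and the file naming simply follows the `<pid>-<start>-<end>.dump` pattern seen in this article.

```go
package main

import (
	"fmt"
	"strconv"
)

// dumpFileName follows the <pid>-<start>-<end>.dump naming used in the text.
func dumpFileName(pid int, start, end uint64) string {
	return fmt.Sprintf("%d-%08x-%08x.dump", pid, start, end)
}

// gdbDumpArgs returns gdb arguments that attach to pid, dump the
// region [start, end) into a file, and then exit.
func gdbDumpArgs(pid int, start, end uint64) []string {
	cmd := fmt.Sprintf("dump memory %s 0x%x 0x%x",
		dumpFileName(pid, start, end), start, end)
	return []string{"--batch", "--pid", strconv.Itoa(pid), "-ex", cmd}
}

func main() {
	args := gdbDumpArgs(14173, 0xc000000000, 0xc000400000)
	// The arguments could be passed to exec.Command("gdb", args...);
	// here they are only printed.
	fmt.Println("gdb", args)
}
```

Calling this once per region listed in the maps file would produce one dump file per region. Attaching to a process usually requires suitable ptrace permissions.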

We need a distinctive string to filter on, and class- works well: the characters after it are random, so we don’t know them in advance, but the prefix alone is enough.

Now use the strings command, which prints the printable strings contained in a binary file. Our goal is to find out which file contains the prefix:

Eventually we found this dump file: 14173-c000000000-c000400000.dump
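The same search can be sketched in Go by scanning the raw bytes of a dump for the prefix. `findAll` is a hypothetical helper, and the byte slice in `main` is a made-up stand-in for file contents that would normally come from `os.ReadFile`.

```go
package main

import (
	"bytes"
	"fmt"
)

// findAll returns the offset of every occurrence of pattern in data.
func findAll(data, pattern []byte) []int {
	var offsets []int
	if len(pattern) == 0 {
		return offsets
	}
	for from := 0; ; {
		i := bytes.Index(data[from:], pattern)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, from+i)
		from += i + 1
	}
}

func main() {
	// Made-up stand-in for the contents of a dump file.
	data := []byte("\x00\x00class-1\x00\x00zhangsan\x00class-2")
	fmt.Println(findAll(data, []byte("class-")))
}
```

Unlike strings, this returns byte offsets, which can be added to the region's base address to get the absolute address of each match.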

Let’s take a closer look at what’s inside this dump file. Because the content is binary, we view it with the hexdump command. Here’s the content:

Our aim is to find class-, so we only need to look at the lines just before and after it:

The class-1 string is located at address 00ae010 + c000000000 = c0000ae010. The next step is to find out what references this address.

Before searching for references to c0000ae010 we need to do a conversion. The address is written in hexadecimal, but in this hexdump view most non-printable bytes are shown in octal. The calculation is simple: split the address into bytes, two hex digits at a time (c0 00 0a e0 10), and convert each byte to octal, which gives:

300 0 12 340 20. The bytes are stored in memory in reverse order (least significant byte first), so the pattern to search for would appear to be: 20 340 012 \0 300. That is not quite right, though, and the reason needs a note:

Standard ASCII covers the values 0 to 127, which is 0 to 177 in octal. Bytes in this range are displayed as their ASCII characters where possible. Octal 12 is the newline character, so it is shown as \n. Octal 20 is the data link escape control character, which has no printable form, so it is shown as the raw octal value 020.

Then the result of our query should be: 020 340 \n \0 300
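The conversion can be automated with a short sketch. The helper names are hypothetical, and `hexdumpChar` only approximates the character-style output described above: named escapes for a few control characters, the character itself for printable ASCII, and three-digit octal otherwise.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// hexdumpChar renders one byte in the character/octal style used above.
func hexdumpChar(b byte) string {
	switch b {
	case 0:
		return `\0`
	case '\a':
		return `\a`
	case '\b':
		return `\b`
	case '\t':
		return `\t`
	case '\n':
		return `\n`
	case '\v':
		return `\v`
	case '\f':
		return `\f`
	case '\r':
		return `\r`
	}
	if b >= 0x20 && b < 0x7f {
		return string(rune(b))
	}
	return fmt.Sprintf("%03o", b)
}

// searchPattern renders the lowest n bytes of addr (0 <= n <= 8)
// in little-endian order, which is how a pointer appears in the dump.
func searchPattern(addr uint64, n int) string {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], addr)
	parts := make([]string, 0, n)
	for _, b := range buf[:n] {
		parts = append(parts, hexdumpChar(b))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(searchPattern(0xc0000ae010, 5))
}
```

The exact spacing between columns in real hexdump output differs from the single spaces used here, so a search over the text output needs to allow for variable whitespace.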

Searching for this pattern does find the reference:

We’ll widen the view by a few lines above and below:

Let’s analyze this memory against the code, starting with the memory layout of Class shown earlier:

The following is the result of the analysis based on the memory layout:

009c018 is the offset of the Class object (note that the last digit is 8, because the object does not start exactly at the beginning of the row at 009c010). Next we look for the User object that references it.

We search for the address c00009c018 (009c018 + c000000000) in the same way.

Convert first; the steps are not repeated here, and the result is: 030 300 \t \0 300

A search for this pattern finds four matches:

We analyze these four matches one by one; they are all in the file 14173-c000000000-c000400000.dump.hex.

According to User’s memory layout (shown below), there should be 24 bytes in front of the Class pointer, and those bytes mark the start of the object:

First analyze the first one:

As you can see, this address does not hold a User object, because the bytes at the positions of Age and Sex are not valid.

From the analysis we know that the bytes at the positions of Age and Sex should be 022 and 001, so we can filter on them directly, which leaves two addresses:

As noted above, both addresses are legitimate. A brief explanation:

In the program we set Age=18, and 022 (octal) = 2 * 8 + 2 = 18, as expected;

We set Sex=1, which corresponds to 001 in memory, also as expected;

We set Name to “zhangsan”, which is 8 bytes long. The length field is displayed as \b, the ASCII backspace character, whose value is exactly 8 in decimal (10 in octal), which is also in line with expectations.

As you can see, the absolute address 49995e falls inside 14173-00482000-00510000.dump. Its offset is calculated as follows:

Offset = 49995e - 482000 = 1795e. Note that this offset is not a multiple of 16 (hexadecimal 10), so it does not appear in the first column of the hexdump output; the row that contains it starts at 17950.
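The address arithmetic used throughout this walkthrough fits in a few lines. This sketch assumes hexdump rows of 16 bytes; the function names are hypothetical.

```go
package main

import "fmt"

// fileOffset converts an absolute address to an offset inside the
// dump of the region that starts at base.
func fileOffset(addr, base uint64) uint64 {
	return addr - base
}

// rowStart rounds an offset down to the start of its 16-byte row.
func rowStart(offset uint64) uint64 {
	return offset &^ 0xf
}

// absolute converts an offset inside a dump back to an absolute address.
func absolute(base, offset uint64) uint64 {
	return base + offset
}

func main() {
	off := fileOffset(0x49995e, 0x482000)
	fmt.Printf("offset=%x row=%x column=%d\n", off, rowStart(off), off-rowStart(off))
	fmt.Printf("absolute=%x\n", absolute(0xc000000000, 0xae010))
}
```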

We can then search for 17950 and see the following:

It can be clearly seen that zhangsan does start at offset 1795e. Which region does this string live in?

Let’s go back to /proc/14173/maps and look at its first three lines:

The content of zhangsan is located in the region on line 2. Note the permission flags on that line: r--p. They mark the region as readable but neither writable nor executable. Data that is read-only and non-executable is, generally speaking, constant data. Note also that the first three lines all describe the program itself. A brief description follows:

 About the author

Shao Zhuguang

Tencent backend development engineer, currently responsible for the design and development of Changan Chain, an open-source underlying blockchain platform. He likes to summarize and reflect on his work and has a deep love for technology.
