Locating a record in a B-TREE index is a pivotal capability for the execution of SQL statements.

InnoDB organizes data based on the index, and the update and delete operations need to go to the index to find the specific records first.

The insert operation also needs to first find out where the record is to be inserted into the index.

When the WHERE condition of the query statement can hit the index, it is also necessary to find the first record of the scan interval corresponding to the WHERE condition, and then read subsequent records from this record along the one-way linked list between the records in the index page and the bidirectional linked list between the index pages.

With the above brief introduction, the importance of locating records in the B-TREE index is obvious.

This article is the first article in MySQL 8 and the beginning of the query optimizer. I hope that through the introduction of this article, we can lay some foundation for everyone to understand the follow-up articles.

The next article, Estimating the Number of WHERE Conditional Scan Interval Records, builds on the content of this article.

This article is based on MySQL 8.0.29 source code.

directory

1. Overview

2. What is a scan zone?

3. Index page structure

4. Locate the first record of the scan area

4.1 Abstract process description

4.2 Prepare a B+ tree

4.3 Which leaf nodes are recorded?

4.4 Where is the leaf node recorded?

5. Performance optimization

5.1 Root node and inner node optimization

5.2 Leaf node optimization

6. Summary

body

The update, delete, query operation locates a record in the index, and the insert operation finds the position to be inserted, the process is basically the same, and the source code is also implemented in the same method.

This article introduces the query operation to locate the first record of the WHERE condition scan interval based on the premise that the WHERE condition can hit the index.

The dichotomous lookup and sequential lookup performed in the process of locating records involve part of the structure of the index page.

Next, two subsections describe the scan interval and the partial structure of the index page related to the positioning record process.

The scan interval is the WHERE condition, which consists of fields and relational operators (>, >=, <, <=, =), which are used to limit the range of records that need to be scanned.

This one-sentence description is too abstract, so let’s expand on it.

Scan intervals can be classified according to different dimensions:

Bounded intervals

Open intervals, for example: WHERE a > 100 AND a < 200, and scan intervals (100, 200).

Closed intervals, for example: WHERE a >= 100 AND a <= 200, and the scan interval is [100, 200].

Left open and right closing intervals, for example: WHERE a > 100 AND a <= 200, scan interval is (100, 200].

Left closed right open interval, for example: WHERE a > = 100 AND a < 200, scan interval is [100, 200).

One-sided bounded interval

There is a lower bound, left open interval, for example: WHERE a > 100, and the scan interval is (100, +∞).

There is a lower bound, left-closed interval, for example: WHERE a >= 100, scan interval is [100, +∞).

There is an upper bound, right open interval, for example: WHERE a < 200, and the scan interval is (-∞, 200).

There is an upper bound, right-closed interval, for example: WHERE a <= 200, and the scan interval is (-∞, 200].

Single point interval

An interval with only one value, for example: WHERE a = 100, and the scan interval is [100, 100].

The root node, inner node, and leaf node of the B-TREE index are all index pages.

The internal structure of the index page is more complex, there will be articles in the future to introduce the structure of the entire index page, and then we will only introduce the structure that needs to be used to locate records: pseudo records, record linked lists, slots (SLOT, which can also be called record grouping).

Record linked list

Each record in the index page has a 2-byte space in the header information that holds the offset of the next record in the current index page.

The offset, which is the address of the first byte of recorded data (without recording header information), minus the address of the first byte of the index page.

InnoDB index pages can be set to a maximum of 64K, and 2 bytes represents the offset of any one byte in the index page.

This 2-byte space, called next_record, can be used to string together records in the index page to form a unidirectional linked list through next_record.

Starting with any record and traversing it all the way back to the last record in the current index page.

Pseudo-records

A pseudo-record is a record in an index page that is not inserted by the user, but secretly inserted by InnoDB.

Regardless of whether there are records inserted by the user (user records) in the index page, there are 2 pseudo-records in each index page:

Slot (SLOT)

There are 3 types of slots in an index page:

Each slot occupies 2 bytes and holds the offset of the largest of the N records corresponding to that slot in the current index page.

The largest record is the last record in the slot sorted in ascending order by index field.

The slot in the index page is stored in a specialized area of the index page, which is called the Page Directory.

The slots in the page catalog area are sorted in reverse order and are immediately next to the storage, with the first slot at the end, the second slot at the penultimate slot, and so on, the last slot at the first.

The B+ tree index contains root, inner, and leaf nodes, locating the first record of the scan interval in a 3-story B+ tree, as follows:

As the hierarchy of the B+ tree increases or decreases, so do the above steps.

At each step of the above process, the internal process is the same, and it needs to be searched dichotomy first, and then ordered by search.

Finally, if it is a root node and an inner node, proceed to the next step; If it is a leaf node, there is no then.

The dichotomous and sequential lookup process is as follows:

Step 1, use the dichotomous lookup to determine which slot the record belongs to.

Each index page has a 2-byte area of header information that holds how many slots are in the current index page, and this area is called PAGE_N_DIR_SLOTS.

Read the value of PAGE_N_DIR_SLOTS, get the number of slots, and then subtract 1 to calculate the maximum sequence number of the slot: high = PAGE_N_DIR_SLOTS – 1, from which we get the upper boundary of the initial state of the dichotomy.

The lower boundary of the initial state, which is the ordinal number of the first slot (infimum slot), low = 0.

The dichotomous search may be 0 ~ N rounds (N > = 1), and the middle position will be calculated by mid = (low + high) / 2 for each round of the lookup.

Then, determine whether the record you are looking for is in the low interval (low ~ mid) or in the high interval (mid ~ high).

Finally, according to the judgment results, entering the low interval or high interval, the lookup is reduced by half, and the next round of lookups continues, and so on, until the values of low and high do not meet the conditions of high – low > 1, and the dichotomous search ends.

The dichotomy here not only supports a single point scanning interval, but also supports the scanning interval of these ranges greater than, greater than or equal to, less than, less than or equal to, can not find a record that satisfies the scanning interval and then stops immediately, but waits until the values of low and high do not meet the cycle conditions to end the process of dichotomous search.

At the end of a dichotomous lookup, the record to be found always belongs to the high slot (the slot corresponding to the upper boundary high), and the low slot is always the previous slot of the high slot. This is critical for the sequential lookup in step 2 to successfully locate the location recorded in the slot.

Step 2, after determining the slot in which the record is located, look up along the next_record order in each record header information to determine the location of the record in the slot.

Based on the state at the end of the dichotomous lookup, the sequential lookup continues.

Starting with the largest record in the low slot, the next record is read through next_record in the header information.

Compare the index field values in the next record with the field values of the scan interval to determine whether the next record is the first record in the scan interval.

If so, the sequential lookup process ends.

If not, continue reading the next record and determine if it is the first record in the scan interval, and so on, until the next record to be read is the largest record in the high slot, and the lookup process ends.

Next, let’s use an example to concretize the abstract process described above.

There is a primary key index containing an id field of type int, structured as a B+ tree, containing 2 layers: root node, leaf node, the index structure is shown in the following figure:

Let’s analyze the process of locating the first record of the scan interval [700, +>) corresponding to the id ∞= 700 query condition as an example to analyze the process of locating the first record of the scan interval in the B+ tree index.

The example index of the B+ tree, containing two layers of root nodes and leaf nodes, locates the first record of the scan interval, starting from the root node.

According to the steps described in the abstract procedure, the first slot of the [700, +∞) scan interval is determined by dichotomy.

The example index of the B+ tree, with 8 slots in the root node, initially, the upper and lower boundaries of the dichotomy are: low = 0, high = 8 – 1 = 7.

Dichotomy lookup

In the 1st round, calculate the middle position mid = (low + high) / 2 = (0 + 7) / 2 = 3, resulting in a low interval (low ~ mid = > 0 ~ 3) and high interval (mid ~ high = > 3 ~ 7).

The middle position corresponds to slot 3 (the slot with serial number 3), whose largest record id = 41, is less than the left endpoint value of the scan interval 700, indicating that the first record with id > = 700 (followed by the first record) is in the high interval.

Modify the lower boundary value, low = mid = 3, enter the high interval.

In the second round, calculate the middle position mid = (low + high) / 2 = (3 + 7) / 2 = 5, resulting in a low interval (3 ~ 5) and a high interval (5 ~ 7).

The middle position corresponds to slot 5, whose maximum record id = 81, is less than the left endpoint value of 700 in the scan interval, indicating that the first record is in the high interval.

Modify the lower boundary value, low = mid = 5, enter the high interval.

In the third round, calculate the middle position mid = (low + high) / 2 = (5 + 7) / 2 = 6, resulting in a low interval (5 ~ 6), high interval (6 ~ 7).

The middle position corresponds to slot 6, and its maximum record id = 901, is greater than the left endpoint value of 700 in the scan interval, indicating that the first record is in the low interval.

Modify the upper boundary value, high = mid = 6.

Then, high – low = 6 – 5 = 1, does not meet the loop condition high – low > 1, the dichotomous lookup ends.

The left endpoint value of the scan interval is 700, which is greater than the id value of the maximum record in slot 5 (81) and less than the id value of the maximum record in slot 6 (901), indicating that the first record belongs to the jurisdiction of slot 6 (in this case, slot 6 is the high slot).

Next, it’s time to go into the home field of the sequential lookup and find the location of the first record in the slot.

Sequential lookup

At the end of the dichotomous lookup, low = 5 (slot 5) with id = 81 for its largest record, and high = 6 (slot 6) with id = 901 for its largest record.

During the dichotomous lookup, the value of the left endpoint of the scan interval of 700 has been determined to be in slot 6, so during the sequential lookup, the record id = 81 does not need to be read (the last record in slot 5), but starts with the next record of that record, the first record of slot 6.

In the first round, read the next record with id = 81, get the record with id = 101, 101 is less than the left endpoint value of the scan interval 700, and you need to continue reading the next record for comparison.

In the second round, the next record with id = 101 is read, and the record with id = 888 is greater than the value of the left endpoint of the scan interval of 700, which locks the first record with id >= 700, between records with id 101 ~ 888, that is, before id = 888.

However, the record with id = 888 is the first user record on the leaf node index page where it is located.

The first record with id > = 700 cannot be on the same index page as the record id = 888, and can only be based on the previous index page of this index page.

In the root node, id = 101 is the previous record with id = 888, and the leaf node index page where id = 101 is located is the previous page of the leaf node index page where id = 888 is located.

Eventually, the first record with id >=700 is also in the leaf node index page where id = 101 is located.

At this point, after 2 rounds of comparison, the leaf node index page where the first record with id >= 700 is located has been determined, and the sequential lookup process is over.

Next, read the page number of its corresponding leaf node index page from the record id = 101 and enter the leaf node.

In the example index of the B+ tree, there are 10 slots in the leaf node, and in the initialization state, the upper and lower boundaries of the dichotomous lookup are: low = 0, high = 10 – 1 = 9.

Dichotomy lookup

In the 1st round, calculate the middle position mid = (low + high) / 2 = (0 + 9) / 2 = 4, resulting in a low interval (low ~ mid = > 0 ~ 4) and high interval (mid ~ high = > 4 ~ 9).

The middle position corresponds to slot 4, whose largest record id = 404, is less than the left endpoint value of the scan interval of 700, indicating that the first record with id >= 700 (referred to as the first record) is in the high interval.

Modify the lower boundary value, low = mid = 4, into the high interval.

In the second round, calculate the middle position mid = (low + high) / 2 = (4 + 9) / 2 = 6 to get the low interval (4 ~ 6) and the high interval (6 ~ 9).

The middle position corresponds to slot 6, and its maximum record id = 606, which is less than the left endpoint value of 700 in the scan interval, indicating that the first record is in the high interval.

Modify the lower boundary value, low = mid = 6, enter the high interval.

In the third round, calculate the middle position mid = (low + high) / 2 = (6 + 9) / 2 = 7, resulting in a low interval (6 ~ 7), a high interval (7 ~ 9).

The middle position corresponds to slot 7, and its maximum record id = 707, which is greater than the left endpoint value of 700 in the scan interval, indicating that the first record is in the low interval.

Modify the upper boundary value, up = mid = 7, in this case, high – low = 7 – 6 = 1, the loop condition up – low > 1 is not met, the cycle ends.

The value of the left endpoint of the scan interval of 700, which is greater than the id (606) of the largest record in slot 6 and less than the id (707) of the largest record of slot 7, indicates that the first record belongs to the jurisdiction of slot 7 (in this case, slot 7 is the high slot).

Next, it’s time to find the location of the first record in the slot.

Sequential lookup

At the end of the dichotomous lookup, low = 6 (slot 6) with id = 606 for its largest record, and high = 7 (slot 7) with id = 707 for its largest record.

During the dichotomous lookup, the first record has been determined to be in the range of slot 7, so during the sequential lookup, it is not necessary to read the record id = 606 (the last record of slot 6), but to start with the next record of this record, the first record of slot 7.

In round 1, read the next record with id = 606, get a record with id = 666, 666 is less than the left endpoint value of 700 in the scan interval, and the next record needs to be read for comparison.

In round 2, read the next record with id = 666 and get a record with id = 688, 688 is less than the value of 700 at the left endpoint of the scan interval, and continue reading the next record.

In round 3, read the next record with id = 688 and get a record with id = 700, 700 equals the value of the left endpoint of the scan interval of 700, which satisfies the id >= 700 condition.

At this point, after 3 rounds of comparison, the first record of the scan interval [700, +>) corresponding to id ∞= 700 has been found, and the sequential lookup process of leaf nodes has ended, and the entire process of locating the first record of the scan interval has also ended.

The previous introduction to the dichotomous method to find the locating slot and the sequential search for the location of the locating record involves comparing the scanned interval field values with the index field values, but we have not further introduced the process of comparison.

If it is just a regular comparison, it is nothing more than the fields of the cyclic scan interval, and the corresponding fields in the index are compared one by one, and there is no need to say anything more.

However, InnoDB optimizes the comparison process to avoid repeated comparisons as much as possible for the fields that have been compared, the parts in front of the fields, and thus improve the execution efficiency of the dichotomous and sequential lookup processes to improve performance.

InnoDB is a step closer to the optimization of leaf nodes than root nodes and inner nodes, and we introduce the optimization of dichotomous and sequential search for root nodes & inner nodes and leaf nodes in two sections.

Based on the example data of the slot in the index page of the figure above, we analyze the process of locating the slot in which slot the left endpoints of the scan interval 160, 44 (using this to represent the first record of the scan interval) are located in the query criteria i1 >> = 160 and i2 = 160 and i2 = 44 as examples.

Initially, the upper and lower boundaries of the dichotomous lookup are: low = 0, high = 13.

Dichotomy lookup

In round 1, calculate the middle position mid = (low + high) / 2 = (0 + 13) / 2 = 6, resulting in low interval (low ~ mid = > 0 ~ 6), high interval (mid ~ high = > 6 ~ 13).

The middle position corresponds to slot 6, whose maximum record is i1 = 160, i2 = 33, and compares the i1, i2 field values for the left endpoint of the scan interval and the largest record of slot 6 one by one to determine whether the left end of the scan interval is in the low interval or the high interval.

Comparing the i1 field values first, the i1 field value at the left end of the scan interval and the i1 field value in the index are equal to 160.

The value of the i2 field is then compared, and the value of the i2 field at the left end of the scan interval (44) is greater than the value of the i2 field in the index record (33), indicating that the value of the left endpoint of the scan interval 160, 44 is in the high interval (slot 6 ~ 13).

Modify the lower boundary value, low = mid = 6, enter the high interval.

In the second round, calculate the middle position mid = (low + high) / 2 = (6 + 13) / 2 = 9, resulting in a low interval (6 ~ 9) and a high interval (9 ~ 13).

The middle position corresponds to slot 9, whose maximum record is i1 = 160, i2 = 66, and compares the i1, i2 field values for the left endpoint of the scan interval and the maximum record for slot 9 one by one to determine whether the left endpoint of the scan interval is in the low interval or the high interval.

Comparing the i1 field values first, the i1 field value at the left end of the scan interval and the i1 field value in the index record are equal to 160.

The value of the i2 field is then compared, and the value of the i2 field at the left end of the scan interval (44) is less than the value of the i2 field in the index record (66), indicating that the left endpoint value of the scan interval 160, 44 is in the low interval (slot 6 ~ 9).

Modify the upper boundary value, high = mid = 9, enter the low interval.

In the third round, calculate the middle position mid = (low + high) / 2 = (6 + 9) / 2 = 7, resulting in a low interval (6 ~ 7), a high interval (7 ~ 9).

The middle position corresponds to slot 7, whose maximum recorded i1 = 160 and i2 = 44.

Following the routine of Rounds 1 and 2, it is time to compare the i1 and i2 field values of the largest records of the left endpoint of the scan interval and slot 7 one by one.

But …, the point comes, after the first round of comparison, it is determined that the left endpoint value of the scanning interval 160, 44 is located between slots 6 ~ 13; After the second round of comparison, it was determined that the left endpoint values of the scan interval 160 and 44 were located between slots 6 and 9.

The intersection can be obtained: the left endpoint value of the scan interval is 160, 44 is located between slots 6 and 9.

As you can see from the previous diagram, between slots 6 and 9, the i1 field value for the largest record in each slot is 160, and the i1 field value for the left endpoint of the scan interval is also 160.

Within this range, no matter how many rounds of comparison are to be made, it is quite certain that the value of the i1 field of the record is equal to the value of the field of i1 at the left end of the scan interval.

Since it is already possible to determine that the results of the comparison are equal before the comparison, there is no need to compare the values of the i1 field.

After the dichotomous lookup is over, the subsequent sequential lookup process, also within this range, can also not compare the values of the i1 field.

Well, in this section we’re going to talk about InnoDB’s optimization of the positioning process, the goal has been achieved, and for the above example, the remaining dichotomous lookup and sequential lookup process will not continue to be analyzed.

If a range can be locked in the dichotomous lookup process, the dichotomous lookup and sequential lookup process of the leaf node can skip not only the first N fields that have been compared and equal, but also the first M bytes that have been compared and equal in the N + 1 fields.

However, skipping bytes that have already been compared has some limitations and can only be applied to the following fields:

These types of fields, in the process of dichotomous and sequential lookup, are used in the source code to loop through the field contents and compare them byte-by-byte.

Let’s take a concrete example to illustrate this:

There is a B-TREE index with 2 fields, i1 is the int type, b1 is the blob type, as shown in the following figure:

Suppose the i1 field value of the left endpoint of the scan interval is 160, and the first 1000 bytes of the b1 field value are 0x001 0x002 … 0x999 0x1000。

Again, suppose that after the first 2 rounds of comparison, the left endpoint value of the scan interval has been locked between slot 6 and slot 9, the i1 field value of all records in this interval is 160, and the first 1000 bytes of the b1 field of all records are 0x001 0x002 … 0x999 0x1000。

If you can only skip the i1 field that has been compared during the dichotomous lookup and sequential lookup in the third round and later, for the b1 field, each time you have to start the first byte of comparison, the first 1000 bytes of the byte-by-byte comparison is repeated.

According to the scenario we introduced earlier, within the lock range (slot 6 ~ 9), the value of the i1 field at the left endpoint of the scan interval and the i1 field of all records are equal; The first 1000 bytes of the b1 field are also equal, do not need to be compared, and can be skipped.

Then, in the subsequent comparison and sequential lookup process of dichotomous lookup, only need to start the comparison from the 1001st byte of the b1 field, and can avoid some duplicate comparison operations more.

Before entering the topic of this article, sections 2 and 3 introduce the definition of scan intervals and provide examples of each type of scan interval; Then we introduce the structures in the index page that are more relevant to this article: record linked lists, pseudo records, slots (SLOTS).

Section 4 begins with an abstract description of the process of finding locating slots and sequentially locating slots, and then, using a 2-layer B-TREE index as an example, analyzes in detail each step of the binophometry locator and sequential locator slots.

Section 5 describes InnoDB In order to reduce the number of comparisons in the process of binary locating slots and sequential locating slots, after locking a range, for root nodes, inner nodes, it is possible to skip fields that have been compared and confirmed as equal; For leaf nodes, in addition to skipping the field, you can skip the previous few bytes in the field that have been compared and confirmed to be equal.