> Kafka’s storage mechanism

  • 1. Kafka’s storage mechanism
  • 2. Reliability guarantee
    • 1. AR
    • 2. Producer reliability levels
    • 3. Leader election
    • 4. Kafka reliability guarantee

> 1. Kafka’s storage mechanism

Kafka stores data in topics. Each topic contains partitions, each partition can have multiple replicas, and each replica’s log is further subdivided into several segments.

A partition is in fact a folder created under Kafka’s storage directory; the folder name is the topic name plus the partition number, with numbering starting from 0.

1. Segment

A segment is in fact a set of files generated under the folder corresponding to the partition.

A partition is divided into several segments of equal size. On the one hand, this splits the partition’s data across multiple files and guarantees that no oversized file is produced; on the other hand, historical data can be deleted segment by segment, which improves efficiency.

A segment in turn consists of a .log and an .index file.

1. .log

The .log file is the data file, storing the segment’s actual message data.

2. .index

The .index file stores index information for the corresponding .log file.

By looking up the .index file, the start position in the .log file of each offset stored in the current segment can be found. Each log entry has a fixed format, including the offset, message length, key length, and other fields; this fixed format makes it possible to determine where the current offset’s data ends, and the data can then be read.
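
The lookup described above can be sketched as follows. Kafka’s index is sparse, so the idea is to binary-search for the last index entry at or before the target offset and scan the .log file forward from that file position; the function and data shapes here are illustrative, not Kafka’s actual internals.

```python
import bisect

def lookup_position(index_entries, target_offset):
    """index_entries: sorted list of (offset, file_position) pairs from a .index file.

    Returns the .log file position at which to start scanning for target_offset.
    """
    offsets = [off for off, _ in index_entries]
    # Rightmost entry whose offset is <= target_offset.
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first indexed entry; scan from the start
    return index_entries[i][1]
```

For example, with entries `[(0, 0), (100, 4096), (200, 8192)]`, looking up offset 150 returns position 4096, and the reader then scans forward from there using each record’s fixed-format length fields.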

3. Naming rules

The naming rule for these two files is: the first segment of a partition starts from 0, and each subsequent segment file is named after the base offset of its first message (i.e., the offset following the last message of the previous segment file). The value is 64 bits and is padded with leading zeros to a length of 20 numeric characters.
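
The naming rule above amounts to zero-padding the base offset to 20 digits; a minimal sketch (the helper name is made up for illustration):

```python
def segment_file_names(base_offset):
    """Build the .log/.index file names for a segment from its base offset."""
    name = f"{base_offset:020d}"  # 64-bit offset, left-padded with zeros to 20 chars
    return f"{name}.log", f"{name}.index"
```

So the first segment is `00000000000000000000.log`, and a segment whose first message has offset 170410 is `00000000000000170410.log`.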

2. Reading data

When reading the data for a given offset in a specified partition, first compare the offset with the names of all segments in the current partition to determine which segment the data is in, then look up that segment’s index file to determine the starting position of the offset in the data file, and finally read the data file from that position, parsing the records according to their fixed format to obtain the complete data.
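
The first step of that read path can be sketched as a binary search over the base offsets parsed from the segment file names, since they are sorted; this is a hypothetical helper, not Kafka’s code:

```python
import bisect

def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that contains target_offset.

    base_offsets: sorted list of ints parsed from the segment file names.
    """
    # Rightmost segment whose base offset is <= the target.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the earliest retained segment")
    return base_offsets[i]
```

For example, with segments `[0, 170410, 239430]`, offset 200000 falls in the segment named after base offset 170410.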

> 2. Reliability Guarantee

1. AR

For each partition, Kafka maintains an AR (Assigned Replicas) list containing all of the partition’s replicas. AR is further divided into ISR and OSR.

AR, ISR, OSR, LEO, and HW are all saved in ZooKeeper.

1. ISR

The replicas in the ISR (In-Sync Replicas) must synchronize the data from the leader. Only after the data has been synchronized is it considered successfully committed, and only committed data can be accessed by the outside world.

During this synchronization, the data cannot be accessed by the outside world even though it has already been written; this is achieved through the LEO-HW mechanism.

2. OSR

Whether or not a replica in the OSR (Out-of-Sync Replicas) has synchronized the leader’s data does not affect the committing of data. Followers in the OSR try their best to synchronize with the leader, but their data may lag behind.

Initially, all replicas are in the ISR. During Kafka’s operation, if a replica falls behind the leader by more than the threshold specified by replica.lag.time.max.ms, it is kicked out of the ISR and placed in the OSR; if it subsequently catches up again, it can return to the ISR.
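
The ISR/OSR split can be modeled as a simple check of how long ago each follower was last fully caught up, compared against replica.lag.time.max.ms (30000 ms by default in recent Kafka versions); the data model here is a toy, not Kafka’s bookkeeping:

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # replica.lag.time.max.ms default (recent Kafka)

def split_isr_osr(last_caught_up_ms, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """last_caught_up_ms: dict of replica id -> last time it was fully caught up.

    Returns (isr, osr): replicas within the lag window stay in the ISR,
    the rest fall into the OSR until they catch up again.
    """
    isr, osr = [], []
    for replica, ts in last_caught_up_ms.items():
        (isr if now_ms - ts <= max_lag_ms else osr).append(replica)
    return sorted(isr), sorted(osr)
```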

3. LEO

LogEndOffset: the offset of the latest data in the partition. As soon as data is written to the leader, the LEO advances to the latest position. It effectively marks the latest written data.

4. HW

HighWatermark: only after written data has been synchronized to all replicas in the ISR is the data considered committed, at which point the HW is updated to that position. Only data before the HW can be accessed by consumers, which guarantees that data not yet fully synchronized is never seen by consumers. It effectively marks the data that has been synchronized to all replicas.

After the leader goes down, a new leader can only be selected from the ISR list. No matter which ISR replica becomes the new leader, it holds all the data before the HW, which guarantees that after the leader switch, consumers can continue to see all data committed before the HW.

Therefore, LEO marks the latest position that has been written, while HW marks the position that has been synchronized to all ISR replicas; only data before the HW can be accessed by the outside world.
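
The relationship between the two marks can be summarized in one line: each replica tracks its own LEO, and the leader advances the HW to the minimum LEO across the ISR, since only data replicated to every in-sync replica counts as committed. A minimal sketch:

```python
def high_watermark(isr_leos):
    """isr_leos: dict of replica id -> that replica's LEO.

    The HW is the smallest LEO in the ISR: everything before it has been
    replicated to every in-sync replica and is therefore committed.
    """
    return min(isr_leos.values())
```

For example, if the leader’s LEO is 10 but the slowest ISR follower is at 8, consumers can read only up to offset 8.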

5. HW truncation mechanism

If the leader goes down and a new leader is elected, the new leader is not guaranteed to hold all of the previous leader’s data, only the data before the HW. At this point, all followers must truncate their logs to the HW position and then synchronize data from the new leader to ensure consistency.

When the failed leader recovers and finds that its data is inconsistent with the new leader’s, it truncates its own data to the HW position recorded before the outage, and then synchronizes data from the new leader just like a follower, ensuring data consistency.
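
As a toy illustration of the truncation step, with Python lists standing in for log segments: the recovering replica drops everything past its recorded HW, then re-fetches from the new leader so the two logs converge.

```python
def truncate_and_resync(follower_log, follower_hw, leader_log):
    """Truncate a recovering replica's log to its HW, then catch up from the leader."""
    follower_log = follower_log[:follower_hw]   # drop unconfirmed entries past HW
    follower_log += leader_log[follower_hw:]    # fetch the rest from the new leader
    return follower_log
```

So a recovering replica holding `["a", "b", "x"]` with HW 2 ends up matching a leader holding `["a", "b", "c", "d"]`, with the unconfirmed `"x"` discarded.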

2. Producer reliability levels

The mechanisms explained above already ensure reliability within the Kafka cluster. However, when the producer sends data to the cluster, the data travels over the network, which is itself unreliable; data may be lost due to network delay, brief outages, and other causes.

Kafka offers producers the following three reliability levels, using different strategies to provide different reliability guarantees.

In effect, this setting configures when the leader considers a message successfully received and responds to the client.

It is configured through the request.required.acks parameter:

1: The producer sends data to the leader; after receiving the data, the leader responds with a success message, and the producer considers the send successful once it receives the response. If no success message arrives, the producer considers the send to have failed and automatically resends the data. If the leader goes down, data may be lost.

0: The producer keeps sending data to the leader without requiring the leader to respond with a success message.

This mode has the highest throughput and the lowest reliability. Data may be lost in transit or when the leader goes down.

-1: The producer sends data to the leader; after receiving the data, the leader waits until all replicas in the ISR list have synchronized it before responding with a success message.

This mode is highly reliable, but when only the leader remains in the ISR list, data can still be lost if the leader goes down.

In this case, min.insync.replicas can be configured to require that at least the specified number of replicas be present in the ISR. Its default value is 1; it should be changed to a value greater than or equal to 2, so that when the producer sends data to the leader but finds that the ISR contains only the leader itself, it receives an exception indicating that the write failed. The data cannot be written at that point, which guarantees it is not lost.

Although data is not lost in this mode, redundant data may be produced. For example: the producer sends data to the leader, the leader synchronizes the data to the followers in the ISR, and the leader goes down halfway through the synchronization. A new leader is then elected that may already hold part of the committed data; meanwhile, the producer receives a failure message and resends the data, so the new leader accepts the data again and the data is duplicated.
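
The three acks levels and the min.insync.replicas check can be summarized as a toy model of the leader’s decision (this simulates the broker-side rule; it is not the Kafka client API):

```python
class NotEnoughReplicasError(Exception):
    """Raised when acks=-1 but the ISR is smaller than min.insync.replicas."""

def leader_ack(acks, isr_size, min_insync_replicas=1):
    """Return True when the leader would respond success to a produce request."""
    if acks == 0:
        return True            # producer never waits for a response anyway
    if acks == 1:
        return True            # leader acks as soon as it has written locally
    if acks == -1:             # wait for the whole ISR (a.k.a. acks=all)
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicasError(
                f"ISR has {isr_size} replica(s), need {min_insync_replicas}")
        return True            # ack only after every ISR replica has the data
    raise ValueError(f"unknown acks setting: {acks}")
```

With min.insync.replicas raised to 2, a produce request against an ISR that has shrunk to the leader alone fails fast instead of silently risking data loss.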

3. Leader election

When the leader goes down, one of the followers in the ISR is elected as the new leader. But what if all replicas in the ISR are also down? The following configuration (the unclean.leader.election.enable parameter in Kafka) resolves this issue.

Strategy 1: Wait until a replica in the ISR list comes back alive, and select it as the new leader to continue working.

Strategy 2: Select any live replica as the new leader and continue working; this replica may not be in the ISR.

Strategy 1 guarantees reliability, but availability is low: Kafka can only recover once a replica from the ISR comes back alive.

Strategy 2 offers high availability, but reliability is not guaranteed: any replica that comes back alive can continue working, but data may be inconsistent.
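
The two strategies can be sketched as a single election function keyed by a flag analogous to unclean.leader.election.enable (False selects strategy 1, True selects strategy 2); the function itself is illustrative:

```python
def elect_leader(live_replicas, isr, allow_unclean=False):
    """Pick a new leader from the live replicas, preferring ISR members.

    Returns a replica id, or None if the election must wait (strategy 1
    with no live ISR replica).
    """
    live_isr = [r for r in live_replicas if r in isr]
    if live_isr:
        return live_isr[0]             # always prefer an in-sync replica
    if allow_unclean and live_replicas:
        return live_replicas[0]        # strategy 2: any live replica, data may lag
    return None                        # strategy 1: keep waiting for the ISR
```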

4. Kafka reliability guarantee

At most once: messages may be lost, but never transmitted repeatedly.

At least once: Messages are never lost, but they may be transmitted repeatedly.

Exactly once: Each message is definitely transmitted once and only once.

Kafka guarantees at least once: data is guaranteed not to be lost, but it may be duplicated. To eliminate duplicates, a unique identifier and a deduplication mechanism must be introduced. Kafka provides a GUID as the unique identifier but does not provide its own deduplication mechanism; developers need to deduplicate themselves based on business rules.
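
Business-side deduplication on top of at-least-once delivery can be sketched as follows: each message carries a unique id, and the consumer remembers the ids it has already processed. All names here are hypothetical, and a real system would persist `seen` rather than keep it in memory:

```python
seen = set()  # ids already processed; persist this in a real system

def process_once(message_id, payload, handler):
    """Invoke handler(payload) only the first time message_id is seen."""
    if message_id in seen:
        return False          # duplicate delivery: skip
    handler(payload)
    seen.add(message_id)
    return True
```

On the producing side, an id such as `str(uuid.uuid4())` could be attached to each message so that redelivered copies are recognized and skipped.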


