In the previous article, we discussed in detail the underlying design ideas and implementation details of the Kafka Producer. In this article, we turn to the Kafka Consumer: its design ideas and internal workings.


In Kafka, the party that consumes messages is called the consumer, and it is one of Kafka's core components. Its main job is to consume and process the messages produced by the Producer. So how are the messages produced by producers actually consumed? Which consumption mode does Kafka use? What are the partition allocation strategies? How do consumer groups and the rebalance mechanism work? How are offsets committed and stored, how is consumption progress monitored, and how is it ensured that processing completes? Let's walk through each of these next.


We know that message queues generally have two implementation styles: (1) Push (push mode) and (2) Pull (pull mode). So which does the Kafka Consumer use? In fact, the Kafka Consumer adopts the pull mode, actively pulling data from the broker for consumption. Both approaches have advantages and disadvantages; let's analyze them:

1) Why not use the Push mode? The biggest drawback of push is that the broker does not know the consumption speed of each consumer: the push rate is controlled by the broker, which easily causes messages to pile up. If the work performed by the consumer is time-consuming, the consumer falls further and further behind, and in severe cases the system may crash.

2) Why use Pull mode? With pull, the Consumer can fetch data according to its own state and capacity, and can also delay processing when needed. But the pull pattern has a shortcoming too: if the broker has no messages, the Consumer could spin in a loop, repeatedly pulling back empty results. Kafka addresses this with long polling: each call to poll() takes a timeout parameter, and when no data is available the call blocks until data arrives or the timeout elapses, instead of busy-looping.
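The long-polling behavior can be illustrated with the JDK's `BlockingQueue`, whose `poll(timeout, unit)` behaves analogously to the Kafka Consumer's `poll(Duration)`: it blocks until data arrives or the timeout elapses. This is only an analogy, not Kafka code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class LongPollDemo {
    // Blocks for up to timeoutMs waiting for a record, analogous to Consumer.poll(Duration).
    static String poll(BlockingQueue<String> broker, long timeoutMs) {
        try {
            return broker.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> broker = new ArrayBlockingQueue<>(16);

        // No data yet: the call blocks up to 100 ms, then returns null -- no busy loop.
        System.out.println("empty poll -> " + poll(broker, 100));

        broker.offer("record-1"); // data arrives
        System.out.println("poll with data -> " + poll(broker, 100));
    }
}
```

The key point is that an empty pull costs at most one blocked wait per timeout window rather than a tight loop of empty responses.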


Having covered the consumption mode, its trade-offs, and how Kafka resolves the shortcomings of pull, let's look at what consumer initialization does.

Kafka consumer initialization involves 4 steps:

  1. Construct a Properties object with the consumer-related configuration;

  2. Create the KafkaConsumer object from that configuration;

  3. Subscribe to the corresponding list of topics;

  4. Call the Consumer's poll() method to pull the subscribed messages.
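The four steps above can be sketched as follows. This is a minimal sketch, not production code: it assumes the kafka-clients dependency is on the classpath and a broker is reachable; the broker address, topic name, and group id are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerInit {
    public static void main(String[] args) {
        // 1. Consumer-related configuration (values here are placeholders).
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // 2. Create the KafkaConsumer from the configuration.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // 3. Subscribe to a list of topics.
        consumer.subscribe(List.of("demo-topic"));

        // 4. Poll in a loop; poll() blocks up to the given timeout when no data is available.
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}
```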

The Kafka consumer consumption flow is shown in the following diagram:



Consumer Group mechanism

After the initialization process, let's talk about the Consumer Group mechanism. Why did Kafka design the Consumer Group — wouldn't standalone Consumers be enough? Kafka is a high-throughput, low-latency, highly concurrent, and highly scalable message queue. If a topic holds millions to tens of millions of messages and only a single consumer process consumes them, the consumption speed can be imagined. A more scalable mechanism is needed to keep up with consumption, and this is where the Consumer Group comes in: the Consumer Group is the scalable and fault-tolerant consumer mechanism provided by Kafka.

A Kafka Consumer Group has the following characteristics:

  1. A Consumer Group can contain one or more consumers;

  2. Each Consumer Group has a public and unique Group ID;

  3. When a Consumer Group consumes a topic, each partition of the topic can be assigned to only one consumer in the group. As long as a message is consumed once by any consumer in the group, that message is considered successfully consumed by the consumer group.


Partition allocation policy mechanism

We know that a consumer group contains multiple consumers and a topic has multiple partitions, so partition allocation is unavoidable: we must determine which partition is consumed by which consumer.

The Kafka client provides 3 partition allocation strategies: RangeAssignor, RoundRobinAssignor, and StickyAssignor. The first two are relatively simple, while the StickyAssignor scheme is more complex.


    RangeAssignor is Kafka's default partition allocation algorithm. It allocates per topic: for each topic, it first sorts the partitions by partition ID, then sorts the consumers in the group subscribed to that topic, and assigns contiguous ranges of partitions to consumers as evenly as possible. When the number of partitions is not divisible by the number of consumers, the consumers sorted first receive one extra partition per topic, which may overburden them.

The partition allocation scenario analysis is shown in the following figure (multiple consumers under the same consumer group):

          Conclusion: the obvious problem with this strategy is that as the number of topics the consumers subscribe to grows, the imbalance becomes more and more severe.
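The range logic can be sketched as a small simulation (this is not the actual Kafka implementation; partition and consumer counts are made up for illustration). Because the computation runs independently per topic, the same early-sorted consumers collect the remainder partition of every topic, which is exactly the skew described above:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeAssignDemo {
    // Per-topic range assignment: consumer i (in sorted order) gets a contiguous
    // span of partition IDs; the first (P % C) consumers receive one extra partition.
    static List<List<Integer>> rangeAssign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        int quota = partitions / consumers;
        int extra = partitions % consumers;
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int size = quota + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < size; i++) owned.add(next++);
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 partitions across 3 consumers: the first consumer gets the extra one.
        System.out.println(rangeAssign(7, 3)); // [[0, 1, 2], [3, 4], [5, 6]]
    }
}
```

With 10 subscribed topics shaped like this, the first consumer would own 10 extra partitions in total.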


    RoundRobinAssignor sorts the partitions of all topics subscribed to within the consumer group and deals them out one by one, as evenly as possible. If every Consumer in the group subscribes to the same topics, the result is balanced. If their subscriptions differ, balance is no longer guaranteed, because some consumers do not participate in the allocation of some topics.

The partition allocation scenario analysis is shown in the following figure


1) When the topic of each consumer subscription in the group is the same:

2) When the subscriptions in the group differ, partition assignment may become skewed:
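Both cases can be reproduced with a simplified simulation of the round-robin logic (not the real RoundRobinAssignor; topic and consumer names are invented). A partition is only handed to the next consumer in the cycle that actually subscribes to its topic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RoundRobinDemo {
    // partitions: sorted topic-partition names like "t1-0"; subs: per-consumer topic sets.
    static List<List<String>> assign(List<String> partitions, List<Set<String>> subs) {
        List<List<String>> owned = new ArrayList<>();
        for (int i = 0; i < subs.size(); i++) owned.add(new ArrayList<>());
        int cursor = 0;
        for (String tp : partitions) {
            String topic = tp.substring(0, tp.lastIndexOf('-'));
            // Advance the circular iterator until a subscribed consumer is found.
            while (!subs.get(cursor % subs.size()).contains(topic)) cursor++;
            owned.get(cursor % subs.size()).add(tp);
            cursor++;
        }
        return owned;
    }

    public static void main(String[] args) {
        // Same subscriptions: perfectly balanced.
        System.out.println(assign(
                List.of("t0-0", "t0-1", "t0-2"),
                List.of(Set.of("t0"), Set.of("t0"), Set.of("t0"))));
        // [[t0-0], [t0-1], [t0-2]]

        // Different subscriptions: C2 ends up with 4 of 6 partitions.
        System.out.println(assign(
                List.of("t1-0", "t2-0", "t2-1", "t3-0", "t3-1", "t3-2"),
                List.of(Set.of("t1"), Set.of("t1", "t2"), Set.of("t1", "t2", "t3"))));
        // [[t1-0], [t2-0], [t2-1, t3-0, t3-1, t3-2]]
    }
}
```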


      The StickyAssignor is the most complex allocation strategy provided by the Kafka Java client (the strategy in use is selected via the partition.assignment.strategy parameter). Introduced in version 0.11, its purpose is to make each new allocation deviate as little as possible from the previous one. It pursues 2 goals:

1) The distribution of Topic Partition should be as balanced as possible.

2) When Rebalance occurs, try to be consistent with the previous allocation result.

Note: when the two goals conflict, the first takes priority, which keeps the distribution even. All 3 allocation strategies try to satisfy the first goal as much as possible; the second goal is the essence of this algorithm.

Let's illustrate the difference between RoundRobinAssignor and StickyAssignor with an example.

The partition allocation scenario analysis is shown in the following figure:

1) When every consumer in the group subscribes to the same topics, RoundRobinAssignor and StickyAssignor produce the same assignment:

        When a Rebalance occurs in the scenario above — say C1 fails and goes offline — the results may differ. RoundRobinAssignor:



And StickyAssignor:

         Conclusion: as the post-Rebalance results show, although both strategies end up evenly distributed, RoundRobinAssignor redistributes everything from scratch, while StickyAssignor reaches an even state while preserving the original assignments.

2) When the topic of each consumer subscription in the group is different:



      When a Rebalance occurs in this scenario — again, suppose C1 fails and goes offline — the results may differ. RoundRobinAssignor:


     As the results above show, RoundRobinAssignor's strategy causes a severe assignment skew after the Rebalance. Therefore, if you want to reduce the overhead caused by reallocation in a production environment, choose the StickyAssignor partition allocation strategy.
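The contrast can be sketched with a simplified simulation. The sticky logic here only captures the core idea — surviving consumers keep what they own, and only orphaned partitions are re-homed to the least-loaded survivor — not the full algorithm; consumer names and partition counts are invented:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StickyVsRoundRobin {
    // Round robin: ignore old ownership and deal every partition out again in order.
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (int p = 0; p < partitions; p++)
            out.get(consumers.get(p % consumers.size())).add(p);
        return out;
    }

    // Sticky (simplified): keep partitions already owned by surviving consumers,
    // then give each orphaned partition to the currently least-loaded survivor.
    static Map<String, List<Integer>> sticky(Map<String, List<Integer>> old,
                                             List<String> survivors, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        survivors.forEach(c -> out.put(c, new ArrayList<>(old.getOrDefault(c, List.of()))));
        boolean[] owned = new boolean[partitions];
        out.values().forEach(l -> l.forEach(p -> owned[p] = true));
        for (int p = 0; p < partitions; p++) {
            if (owned[p]) continue;
            String target = survivors.get(0);
            for (String c : survivors)
                if (out.get(c).size() < out.get(target).size()) target = c;
            out.get(target).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = roundRobin(List.of("C1", "C2", "C3"), 6);
        System.out.println("before:      " + before); // {C1=[0, 3], C2=[1, 4], C3=[2, 5]}
        // C1 goes offline: round robin reshuffles everything...
        System.out.println("round robin: " + roundRobin(List.of("C2", "C3"), 6));
        // ...while sticky leaves C2/C3's partitions alone and only re-homes C1's.
        System.out.println("sticky:      " + sticky(before, List.of("C2", "C3"), 6));
    }
}
```

Both end states are balanced, but the round-robin result moves partitions that never needed to move, invalidating any per-partition state the consumers had built up.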


Having covered consumer groups and partition allocation strategies, let's talk about the Rebalance mechanism within a consumer group. Consumers may join or leave a group at any time, and any change in the consumer list necessarily causes partitions to be reallocated. This reallocation process is called Consumer Rebalance. The process requires the coordinator component on the broker side, which drives the partition reassignment for the entire consumer group; it can also be triggered by listening on ZooKeeper's /admin/reassign_partitions node.


Rebalance triggers and notifications

There are three conditions that trigger a Rebalance:

  1. The number of consumer group members changes (a member actively joins or leaves the group, fails and goes offline, etc.);

  2. The number of subscribed topics changes;

  3. The number of partitions of a subscribed topic changes.

How does Rebalance notify the other consumer processes? The notification mechanism relies on the consumer-side heartbeat thread, which periodically sends heartbeat requests to the coordinator on the broker side. When the coordinator decides to start a Rebalance, it embeds "REBALANCE_IN_PROGRESS" in the heartbeat responses it sends back to the consumers; when a consumer finds "REBALANCE_IN_PROGRESS" in a heartbeat response, it knows the Rebalance has begun.
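This notification handshake can be sketched as a toy model. The enum and method names below are illustrative, not Kafka's actual classes; only the "REBALANCE_IN_PROGRESS" signal itself comes from the protocol:

```java
public class HeartbeatDemo {
    // Toy model of the coordinator's heartbeat response.
    enum HeartbeatResponse { NONE, REBALANCE_IN_PROGRESS }

    // The coordinator embeds REBALANCE_IN_PROGRESS once a rebalance starts.
    static HeartbeatResponse coordinatorReply(boolean rebalancing) {
        return rebalancing ? HeartbeatResponse.REBALANCE_IN_PROGRESS : HeartbeatResponse.NONE;
    }

    // The consumer's heartbeat thread reacts to the reply: on REBALANCE_IN_PROGRESS
    // it must rejoin the group (via a JoinGroup request); otherwise it keeps consuming.
    static String onHeartbeat(HeartbeatResponse reply) {
        return reply == HeartbeatResponse.REBALANCE_IN_PROGRESS
                ? "rejoin group"
                : "keep consuming";
    }

    public static void main(String[] args) {
        System.out.println(onHeartbeat(coordinatorReply(false))); // keep consuming
        System.out.println(onHeartbeat(coordinatorReply(true)));  // rejoin group
    }
}
```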




In fact, Rebalance is essentially a set of protocols that the Consumer Group and the Coordinator use together to complete the group's Rebalance. Let's look at the 5 protocol requests and what each accomplishes:

  1. Heartbeat Request: the Consumer periodically sends a heartbeat to the Coordinator to prove it is alive;

  2. LeaveGroup Request: the Consumer proactively tells the Coordinator that it is leaving the Consumer Group;

  3. SyncGroup Request: the Group Leader Consumer tells all members of the group the allocation plan;

  4. JoinGroup Request: a member requests to join the group;

  5. DescribeGroup Request: displays all the information of the group, including member information, protocol name, assignment scheme, subscription information, etc.; usually used by administrators.

The Coordinator mainly uses the first four requests during a Rebalance.