Kafka: Consumer

Learn
3 min read · Jul 12, 2024


What does the Kafka consumer consume?
Events

From where?
From Kafka topic(s)

From where?
From a Kafka cluster.

What does consume mean?
Apply processing/business logic to the event(s).

Is the consumer continually connected to the topic?
Yes. It subscribes to the topic, and under the hood it polls the broker in a loop, so new messages posted to the topic get consumed in near real time.

Can a consumer tie to multiple topics?
Yes, it can.

Why would a consumer want to tie to multiple topics?
Suppose you are tracking immigration, and there is one topic per country of origin: Indians, Pakistanis, Chinese, Koreans, and so on. A country-specific consumer can capture how many Indian adults and Indian children come in. A more general consumer might only care about the number of adults versus the number of children, regardless of nationality. It can therefore listen to all these topics simultaneously, keeping a running count of adults and children.
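The general consumer above can be sketched as a simple aggregation over events from several topics. The topic names and the "age_group" field here are illustrative assumptions, not part of any real Kafka API.

```python
from collections import Counter

# Hypothetical events from several per-country topics.
# Topic names and event fields are made up for illustration.
events_by_topic = {
    "immigration.india": [{"age_group": "adult"}, {"age_group": "child"}],
    "immigration.korea": [{"age_group": "adult"}],
}

def count_age_groups(topics, events_by_topic):
    """A general consumer subscribed to many topics, counting adults
    versus children without caring which topic an event came from."""
    counts = Counter()
    for topic in topics:
        for event in events_by_topic.get(topic, []):
            counts[event["age_group"]] += 1
    return counts

counts = count_age_groups(events_by_topic.keys(), events_by_topic)
```

A country-specific consumer would do the same thing but subscribe to only one of the topics.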

ConsumerRecords
This is the plural form, and it represents the batch of messages that can come into the consumer from a single poll. The consumer then needs to process these records one by one; each one is a ConsumerRecord. A consumer record carries the event itself, the key-value pair, along with metadata such as the topic, partition, and offset it came from.
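The processing loop can be sketched as below. The real ConsumerRecord class lives in the Kafka client library; this stand-in only mirrors its best-known fields and is not the actual API.

```python
from dataclasses import dataclass

# A minimal stand-in for a Kafka record: the key-value pair plus
# the metadata that tells you where the record came from.
@dataclass
class ConsumerRecord:
    topic: str
    partition: int
    offset: int
    key: str
    value: str

def process(records):
    """Handle a batch (the 'ConsumerRecords') one record at a time."""
    handled = []
    for record in records:
        # Business logic would go here; we just collect the pairs.
        handled.append((record.key, record.value))
    return handled

batch = [
    ConsumerRecord("payments", 0, 41, "user-1", "debit 10"),
    ConsumerRecord("payments", 0, 42, "user-2", "credit 5"),
]
result = process(batch)
```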

Consumers and scalability
In Kafka, multiple producers can write into one topic. If there is only one consumer instance, it will clearly fall behind, and a big backlog of events will pile up waiting to be processed on the consumer side. Think of the pile of files on a desk in a government office.

Consumer group
That is where Kafka brings in consumer groups: a group of consumer instances that together work on the files at the desk. Kafka automatically distributes the topic's partitions, and with them the load of events, among the instances of the group.

Distribution of the messages between the consumer instances in a group
The distribution of messages among the consumer instances is negotiated between the consumer instances and the broker instances hosting the topics the group is interested in.
For simplicity, assume a single topic with four partitions and two instances in the consumer group: each consumer instance is assigned two partitions, and messages from those partitions are delivered only to that instance.
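The four-partitions, two-instances example can be sketched with a simple round-robin spread. The real assignment is negotiated by the group coordinator using a configurable assignor strategy; this is only a rough model of the outcome.

```python
# Rough sketch of spreading partitions across a consumer group.
# Real Kafka negotiates this via the group coordinator.
def assign_partitions(partitions, consumers):
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# One topic with four partitions, two instances in the group.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```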

Rebalancing of load
If a consumer instance in the group dies, its partitions are automatically redistributed among the instances that are still running.
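A rebalance can be sketched as running the same assignment again over the surviving instances, so no partition is left unowned. The names here are illustrative, not a real Kafka API.

```python
# Sketch of a rebalance: reassign all partitions over whoever is alive.
def assign_partitions(partitions, consumers):
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3]
before = assign_partitions(partitions, ["consumer-a", "consumer-b"])
# consumer-b dies; the group rebalances over the survivors,
# and consumer-a now owns every partition.
after = assign_partitions(partitions, ["consumer-a"])
```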

Scaling the consumer group
Kafka does not spin up consumer instances on its own; starting more instances when too many messages come in is up to you or your deployment platform. What Kafka does handle automatically is the redistribution: as soon as a new instance joins the group, the partitions and their event load are rebalanced across all the instances. That part of the magic happens behind the scenes, and the developer is insulated from those botherations.

Number of consumer instances greater than number of partitions
If the number of partitions the consumer group subscribes to is less than the number of consumer instances, the extra instances sit idle. For example, with ten partitions and fifteen consumer instances, five instances remain idle because there are no partitions left to assign to them.
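Running the same assignment sketch with more consumers than partitions shows the leftover instances receiving an empty assignment; these are the idle ones.

```python
# Sketch: more consumers than partitions leaves some consumers idle.
def assign_partitions(partitions, consumers):
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = list(range(10))               # ten partitions
consumers = [f"consumer-{n}" for n in range(15)]  # fifteen instances
assignment = assign_partitions(partitions, consumers)
idle = [c for c, parts in assignment.items() if not parts]
```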

Ordering and scaling together
Within a partition, ordering of messages for a given key is guaranteed, and this guarantee continues to hold even when the group scales: all messages with the same key land in the same partition, and a partition is consumed by exactly one instance at a time. This is something traditional messaging systems did not offer. There, scaling out meant losing the ordering, so where ordering was important the number of instances had to be carefully pinned to one.
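The reason per-key ordering survives scaling can be sketched as follows: the producer routes every message with the same key to the same partition (here with a simple deterministic hash; the real partitioner differs in detail), and each partition is read by exactly one consumer instance at a time.

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key):
    # Deterministic key hashing: the same key always maps to the
    # same partition. (Illustrative; not Kafka's exact partitioner.)
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

messages = [("user-1", "m1"), ("user-2", "m2"), ("user-1", "m3")]
partitions = {}
for key, value in messages:
    partitions.setdefault(partition_for(key), []).append((key, value))

# All of user-1's messages sit in a single partition, in send order.
p = partition_for("user-1")
ordered = [value for key, value in partitions[p] if key == "user-1"]
```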
