Kafka is an event streaming platform: it stores streams of events and lets applications read and process them.
Why did the concept of a topic have to come about?
Because there are all kinds of events. You have to be able to organize them in order to work with them: to build your system, to optimize your business, to make more money.
A topic is where events of a similar kind sit.
A topic is not a queue
A topic is a data structure whose semantics differ from those of a queue.
A queue is dynamic: it loses participants from the front and gains them at the back. A topic also gains events at the back, but reading does not remove events from the front.
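The difference can be sketched in a few lines. This is only an in-memory illustration, not Kafka code: the `deque` stands in for a queue, a plain list stands in for a topic, and the event names are made up.

```python
from collections import deque

# A queue: reading removes the item from the front.
queue = deque()
queue.append("order-created")
queue.append("order-paid")
first_message = queue.popleft()   # "order-created" is now gone from the queue

# A topic, modelled here as a plain list (an assumption for illustration):
# reading leaves the events where they are.
topic = []
topic.append("order-created")
topic.append("order-paid")
first_event = topic[0]            # still in the topic after the read

queue_len_after_read = len(queue)   # 1
topic_len_after_read = len(topic)   # 2
```

After one read, the queue has shrunk and the topic has not.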
Messaging systems are typically perceived as queues. Kafka should not be perceived that way.
A topic has the semantics of a log
A log/topic is append-only
Events are always appended to the end, in the same way that statements are written to a log file. Log statements are always written at the end; you never update a statement in the middle of the file.
Events/log statements are immutable
The statements written to a log are never edited. They are sacrosanct. Events in a topic are the same. If something has happened, it has happened. If you have to undo an event, you create a second event that in effect undoes the first.
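A minimal sketch of append-only, immutable events and of "undo by a second event". Everything here is hypothetical and in-memory: the `Event` and `Log` classes, the account name, and the `deposit` / `deposit_reversed` event kinds are assumptions made for illustration, not Kafka's API.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Event:
    account: str
    kind: str      # "deposit" or "deposit_reversed" (hypothetical event kinds)
    amount: int

class Log:
    """Append-only list of events; there is no update or delete method."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1   # offset of the new event

    def events(self):
        return tuple(self._events)     # read-only snapshot

def balance(log, account):
    """Derive the current balance by replaying every event for the account."""
    total = 0
    for event in log.events():
        if event.account != account:
            continue
        if event.kind == "deposit":
            total += event.amount
        elif event.kind == "deposit_reversed":
            total -= event.amount
    return total

log = Log()
mistaken = Event("acct-1", "deposit", 100)
log.append(mistaken)

# Editing the recorded event is refused...
try:
    mistaken.amount = 0
    edited = True
except FrozenInstanceError:
    edited = False

# ...so the undo is a second event appended after the first.
log.append(Event("acct-1", "deposit_reversed", 100))
```

Both events stay in the log. The balance comes out as zero only because the reader replays the deposit and then its reversal.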
Topics/logs are not indexed by content
You cannot look up a particular log statement by what it says without searching through the file. Similarly, a topic offers no lookup by event content; an event is addressed by its position (offset) in the log.
Topics/logs are read by scanning from an offset
To read a log file, you typically go to the general area where you expect the log statement to be, and then read downwards line by line. Similarly, events in a topic are read starting at an offset and then one by one in order.
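Reading from an offset can be sketched as below. The function name `read_from` and its `max_events` parameter are made up for this illustration, and the topic is again just an in-memory list.

```python
def read_from(log, offset, max_events=None):
    """Yield (offset, event) pairs in order, starting at `offset`."""
    if max_events is None:
        end = len(log)
    else:
        end = min(len(log), offset + max_events)
    for position in range(offset, end):
        yield position, log[position]

log = ["enrolled", "sat-exam", "graded", "certified"]

tail = list(read_from(log, 2))                   # everything from offset 2 onward
window = list(read_from(log, 1, max_events=2))   # two events starting at offset 1
```

The reader picks a starting position and then moves forward one event at a time; it never jumps around by content.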
Kafka events can live forever
Kafka events are unlike messages. They can live forever. They can outlive the server machines where they first appeared. If the bare-metal servers burn in a fire and Kafka promotes a replica running in another building to primary, the events are still alive there.
Events can also be made to expire after a certain age. The retention period can be set to forever, years, or weeks, as the use case demands.
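Age-based expiry can be sketched as follows. This is a simplified in-memory model: the `Record` tuple and the `expire` function are hypothetical names, and timestamps are plain seconds. In Kafka itself, retention is set through topic configuration such as `retention.ms`, and the broker cleans up old data on its own schedule rather than per event as shown here.

```python
from collections import namedtuple

Record = namedtuple("Record", ["offset", "timestamp", "value"])

def expire(records, now, max_age=None):
    """Return the records that survive retention.

    `max_age` uses the same unit as the timestamps (seconds here);
    None means "keep forever".
    """
    if max_age is None:
        return list(records)
    return [r for r in records if now - r.timestamp <= max_age]

WEEK = 7 * 24 * 60 * 60
records = [
    Record(0, 0, "enrolled"),
    Record(1, 5 * WEEK, "sat-exam"),
    Record(2, 6 * WEEK, "graded"),
]
now = 6 * WEEK

kept_forever = expire(records, now)               # nothing expires
kept_two_weeks = expire(records, now, 2 * WEEK)   # the six-week-old event is dropped
```

The surviving events keep their original offsets; expiry only trims the old end of the log.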
How Kafka is different from traditional messaging systems
A developer is likely to look at Kafka the way they have looked at JMS or IBM MQ. That view will throw them off as they work with Kafka.
- A Kafka topic is not a queue, unlike a JMS or MQ queue.
- Events read from Kafka are not removed, unlike messages in JMS/MQ, which are consumed on read.
- Because a listener reading a message makes it disappear, traditional messaging systems typically had only one listener per queue. Kafka is very generous with the number of consumers allowed on a topic. This drives a programming model that is fundamentally different from programs built on traditional messaging systems.
- Durability: Kafka stores events in the file system, and in that sense the events are more durable than messages in a JMS/MQ setup that can lose them on a server crash.
- Kafka has the advantage of immutability. Since events are immutable, an event has no changing state to keep track of. This makes it easy to replicate a topic.
- Immutability also helps performance: writes are sequential appends, and nothing already written is ever updated in place.
- The simplicity of the log as a data structure makes topics easy to work with.
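The point about many consumers on one topic can be sketched like this. The `Consumer` class and the consumer names are hypothetical, and the topic is an in-memory list; the idea shown is only that each reader keeps its own position and nobody's read removes anything.

```python
class Consumer:
    """Hypothetical consumer that tracks its own read position."""
    def __init__(self, name):
        self.name = name
        self.offset = 0

    def poll(self, topic, max_events=10):
        # Read the next batch and advance only this consumer's offset.
        batch = topic[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch

topic = ["score:alice:91", "score:bob:78", "score:carol:85"]
reporting = Consumer("reporting")
alerts = Consumer("alerts")

reporting_batch = reporting.poll(topic)           # reads all three events
alerts_first = alerts.poll(topic, max_events=1)   # starts from offset 0 regardless
alerts_rest = alerts.poll(topic)                  # continues from its own offset
```

Both consumers see every event, each at its own pace, and the topic is unchanged afterwards.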
Topics to store events
- A topic stores events of a certain type.
- You can also have related topics that store events that have been transformed or enriched. For instance, if one topic stores each student's ID and score, a related topic could look up the name for each ID and store the name against the score.
- You can have related topics that hold events filtered from the original topic. For instance, if one topic stores the test scores of all students in a class, a supplemental topic could hold just the scores above a certain percentile.
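The two kinds of related topic above can be sketched with in-memory lists. The student data, the `enrich` and `filter_above` functions, and the fixed cutoff of 80 (used in place of a percentile to keep the example short) are all assumptions for illustration. In a real deployment, a stream-processing job would do this continuously as events arrive.

```python
# Source topic: one event per test result, keyed by student ID.
scores = [
    {"student_id": 1, "score": 91},
    {"student_id": 2, "score": 78},
    {"student_id": 3, "score": 85},
]

# Lookup data used for enrichment (hypothetical).
names = {1: "Alice", 2: "Bob", 3: "Carol"}

def enrich(events, names):
    """Derived topic: replace the student ID with the student's name."""
    return [{"name": names[e["student_id"]], "score": e["score"]} for e in events]

def filter_above(events, cutoff):
    """Derived topic: keep only events whose score is above the cutoff."""
    return [e for e in events if e["score"] > cutoff]

named_scores = enrich(scores, names)
high_scores = filter_above(scores, 80)
```

The source topic is left untouched; each derived topic is a new sequence of events built from it.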