Many large systems receive orders, clicks, logs, and database changes from dozens of services, and several downstream teams need the same events: fraud detection, analytics, search indexing, billing, monitoring, and customer notifications. Each team processes at a different pace, and some need to replay old events after a bug fix.
This is the kind of workload where Apache Kafka is a serious candidate. Kafka is a distributed event log: producers append records, brokers store replicated partitions, and consumers read by offset. The important interview question is not "can Kafka queue messages?" but whether you need a retained, replayable, high-throughput event stream.
Kafka is not a traditional work queue. It keeps records for a configured retention window whether or not they have been consumed, lets multiple consumer groups read independently, and preserves order within a partition. Those properties are its main strengths, and they are also what makes it a poor fit for simple job queues, request-reply workflows, or complex broker-side routing.
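To make the log model concrete, here is a minimal in-memory sketch of a topic with keyed partitions and per-group offsets. It is a toy, not Kafka's client API: `ToyTopic`, `append`, `poll`, and `seek` are hypothetical names, the CRC32 key hash is an assumption chosen for simplicity (real Kafka clients use their own partitioner), and replication and retention are left out entirely.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for one topic; hypothetical, not Kafka's client API."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]
        # Per (group, partition): the next offset that group will read.
        self.committed = defaultdict(int)

    def append(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        # CRC32 is an assumption for this sketch, not Kafka's actual hash.
        partition = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def poll(self, group, partition):
        # Reading never deletes records; it only advances this group's offset.
        start = self.committed[(group, partition)]
        records = self.partitions[partition][start:]
        self.committed[(group, partition)] = start + len(records)
        return records

    def seek(self, group, partition, offset):
        # Replay: move the group's position back to an earlier offset.
        self.committed[(group, partition)] = offset


topic = ToyTopic(num_partitions=3)
partition, first_offset = topic.append("order-42", "created")
topic.append("order-42", "paid")
topic.append("order-42", "shipped")

billing_first = topic.poll("billing", partition)
analytics_first = topic.poll("analytics", partition)  # unaffected by billing
billing_empty = topic.poll("billing", partition)      # nothing new to read

topic.seek("billing", partition, first_offset)        # e.g. after a bug fix
billing_replay = topic.poll("billing", partition)
```

Both groups see all three records in append order, and rewinding the billing group's offset returns the same records again. A traditional work queue would typically have removed each message once a consumer acknowledged it, which is why neither the second reader nor the replay would be possible there.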
This chapter covers the practical interview pieces: topics, partitions, keys, consumer groups, replication, producer durability settings, retention, replay, delivery guarantees, and when to choose something simpler.