Partitioners
The Producer is the application that sends data to Kafka. Its most critical job is deciding which partition to send each message to. This is handled by the Partitioner.
The Producer's Choice: Pick a Lane#
A Topic is split into multiple Partitions (lanes). Before sending a message, the Producer must choose a lane.
1. Round-Robin (No Key)#
If you don't care about the order of events, you can just spread the data evenly across all partitions. - Message 1 → Partition 0 - Message 2 → Partition 1 - Message 3 → Partition 2 - Message 4 → Partition 0...
Pros: Perfectly balances the load across all brokers. Cons: Ruining the order. If "Message 1" and "Message 4" are for the same user, they might be processed out of order if Partition 0 is slower than Partition 1.
2. Key-based Hashing (The Partitioner)#
If you need ordering, you provide a Key (like user_id or patient_id).
The Producer runs the key through a Hashing Function: hash(user_123) % 3 (total partitions) = Partition 1
No matter how many millions of messages you send, if the key is user_123, the result will always be Partition 1.
Producer → {Key: "Nike", Val: "Click"} → Partitioner → [Partition 1]
Producer → {Key: "Adidas", Val: "Click"} → Partitioner → [Partition 0]
Producer → {Key: "Nike", Val: "Click"} → Partitioner → [Partition 1] (Always!)
Why this matters: The Guarantee#
Kafka only guarantees ordering within a single partition. By using a Key, you "lock" all events for a specific entity (like a customer or a patient) into the same lane.
Example: The Hospital Spike - 10:01 AM: Heart Spike (Key: Patient_A) → Partition 1 - 10:02 AM: Heart Spike (Key: Patient_A) → Partition 1
Because they are in the same lane, Partition 1 will always deliver the 10:01 AM event to the doctor before the 10:02 AM event.
Common Trap: If you don't provide a key, you lose ordering. In an interview, if the system requires strict event ordering for a user (like a bank transaction or a chat message), you must mention using the user_id as the partition key.
Interview framing: "I'd use the advertiser_id as the partition key for the ad-click topic. This ensures that all clicks for a single advertiser land in the same partition in the exact order they occurred. This is critical for our Billing service to calculate charges accurately without handling out-of-order data."