Statefulness
What "Stateful" Means#
Some computations need memory across events.
Stateless: each event is processed independently.
Stateful: you need to remember things from previous events.
Example: "Flag if same card makes 5 transactions in 10 seconds."
A single transaction event tells you nothing. You need to remember how many transactions this card has made recently. That's state.
Processor Nodes vs Kafka Brokers#
These are completely separate machines:
Kafka Brokers Processor Nodes (Flink / Kafka Streams)
────────────── ──────────────────────────────────────
Store events Consume and compute
Broker 1, 2, 3 Node 1, 2, 3
Serve partitions Read from partitions
Processor nodes are your application instances — not Kafka. They consume from Kafka, run your logic, and maintain state.
Key-Based Partitioning: The Prerequisite#
Kafka assigns events to partitions using:
If you set key = card_id:
Every event from card-456 always lands on Partition 2, regardless of when it arrives.
Since each partition is assigned to one processor node:
Partition 2 → Node 2
card-456, tx1 → Partition 2 → Node 2
card-456, tx2 → Partition 2 → Node 2
card-456, tx3 → Partition 2 → Node 2
Node 2 sees all events for card-456. So it can maintain a correct local counter.
Local State: No Network Hop#
Instead of calling Redis or a DB on every event, the processor keeps state in its own memory:
No network call. No latency. Just a local hashmap.
At millions of events per second, this is the difference between feasible and impossible.
The Round Robin Bug#
If you produce events without a key, Kafka uses round robin:
card-456, tx1 → Partition 0 → Node 1 (sees 1 tx)
card-456, tx2 → Partition 2 → Node 3 (sees 1 tx)
card-456, tx3 → Partition 1 → Node 2 (sees 1 tx)
Each node thinks card-456 has made only 1 transaction. Fraud detection never fires.
This is a common production bug. Someone removes the key to "distribute load more evenly" and stateful logic silently breaks.
// WRONG — round robin, stateful processing broken
producer.send(topic="transactions", value=event)
// CORRECT — key-based routing, same card always same node
producer.send(topic="transactions", key=card_id, value=event)
Rule: stateful stream processing requires key-based partitioning. It is not optional.