Lambda vs Kappa — Comparison#

Lambda runs batch + stream in parallel and merges results — accurate but operationally expensive (two codebases). Kappa runs stream only and replays history when needed — simpler but requires your stream processor to handle replay at scale. The choice comes down to whether batch-level accuracy is a hard requirement, or whether operational simplicity is worth more.

Side by Side#

	Lambda	Kappa
Pipelines	Two (batch + stream)	One (stream only)
Codebases	Two	One
Consistency	Batch and stream can drift	Single pipeline, always consistent
Accuracy	Batch is exact, stream is approximate	One logic, consistent accuracy
Latency	Low (speed layer) + delayed correction (batch)	Low (stream handles both)
Historical reprocessing	Spark reruns on S3	Replay Kafka/S3 through stream processor
Operational complexity	High — maintain two pipelines	Low — one pipeline
Storage requirement	S3 + Kafka	S3 (source of truth) + Kafka
Use when	Batch accuracy non-negotiable	Simplicity matters, stream can handle replay

Decision Rule#

Use Lambda when: - Regulatory or financial accuracy is required (billing, compliance, tax) - Batch and stream results need to be independently verifiable - Your stream processor cannot replay years of history at scale

Use Kappa when: - You want one codebase and one pipeline to maintain - S3 is your source of truth and you can replay through Kafka - Your stream processor (Flink, Kafka Streams) is capable enough for both live and replay

Interview Answer Template#

"For the real-time dashboard I'd use a stream processor — Flink consuming from Kafka, results updated every second. For the monthly billing report I need exact numbers, so I'd run a nightly Spark batch job against the raw event log in S3. This is Lambda architecture — speed layer for low latency, batch layer for accuracy, serving layer merges both. If operational simplicity is a priority and we're okay with Kappa, we could drop the batch pipeline and replay historical events from S3 through the same stream processor using a new consumer group from offset 0."

Where These Appear in Interviews#

System	Architecture
Ad Click Aggregation	Lambda — real-time approximate counts + nightly exact reconciliation
Billing / Payments	Lambda — batch for exact invoicing, stream for live alerts
News Feed Analytics	Kappa — stream handles both live feed and historical replay
Fraud Detection	Kappa — single stream processor, replay to retrain on historical patterns
Log Analytics	Either — depends on accuracy requirements