# Interview Cheatsheet — OLTP vs OLAP
## What is OLTP and when do you use it?
OLTP (Online Transaction Processing) is your production database — the one serving live users. Every operation is small and fast: insert a row, fetch a user by ID, update a column. Thousands of these happen per second, each touching very few rows. PostgreSQL, MySQL, and DynamoDB are OLTP databases. Row-oriented storage makes single-row reads fast.
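The OLTP workload described above can be sketched with SQLite standing in for a production database (the `users` schema is hypothetical, chosen just for illustration):

```python
import sqlite3

# In-memory SQLite stands in for a production OLTP database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")

# Typical OLTP operations: each one touches a single row via the primary key.
conn.execute("INSERT INTO users (id, name, country) VALUES (?, ?, ?)", (42, "Ada", "UK"))
row = conn.execute("SELECT name FROM users WHERE id = ?", (42,)).fetchone()
conn.execute("UPDATE users SET country = ? WHERE id = ?", ("US", 42))

print(row)  # ('Ada',)
```

Every statement here is indexed, touches one row, and returns in microseconds — the shape of traffic an OLTP database is built for.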
## What is OLAP and when do you use it?
OLAP (Online Analytical Processing) is a separate database built for heavy analytics — full table scans, aggregations, GROUP BY across billions of rows. It serves analysts and dashboards, never live users. BigQuery, Redshift, and Snowflake are OLAP databases. Column-oriented storage means only the queried columns are read off disk, making scans dramatically faster.
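For contrast with the single-row OLTP pattern, here is the shape of an analytics query — a full scan with a GROUP BY. SQLite is row-oriented, so this only illustrates the query shape, not columnar performance; the table and data are made up:

```python
import sqlite3

# Toy events table; a real OLAP query would scan billions of rows in a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "UK" if i % 3 == 0 else "US") for i in range(90)],
)

# OLAP-style work: scan the whole table and aggregate, rather than fetch one row.
result = conn.execute(
    "SELECT country, COUNT(*) FROM events GROUP BY country ORDER BY country"
).fetchall()
print(result)  # [('UK', 30), ('US', 60)]
```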
## Why can't you run analytics on your production database?
A full table scan at scale (say, a COUNT across 500M rows) consumes disk I/O and CPU for minutes, and it competes with live user queries for those same resources. The production database slows down, and users see latency spikes and timeouts. Analytics must therefore run on a separate OLAP warehouse, isolated from live traffic.
**Interview framing**
"Running analytics on the production OLTP database is a classic mistake. At scale, a single GROUP BY query can cause a full table scan that starves live user traffic. The right pattern is to replicate data into a separate OLAP warehouse — analysts query that, production is never touched."
## Why is column-oriented storage faster for analytics?
Analytics queries typically touch 2–3 columns out of 20+. In row-oriented storage, reading any column means loading the entire row off disk — including all the columns you don't need. At 500M rows, that's gigabytes of wasted I/O. Column-oriented storage keeps all values of a column together, so a query that only needs country reads only the country column — a fraction of the data.
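A toy model of the two layouts makes the difference concrete. This is a sketch in plain Python lists and dicts, not a real storage engine; the field names are invented:

```python
from collections import Counter

# Row-oriented layout: each record stored with all its fields together.
rows = [
    {"id": i, "name": f"user{i}", "email": f"u{i}@x.com",
     "country": "UK" if i % 2 else "US"}
    for i in range(1000)
]

# Counting by country over the row layout touches all 4 fields of every row.
row_scan = Counter(r["country"] for r in rows)

# Column-oriented layout: the same table stored as one list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# The same count now reads only the 'country' column — 1/4 of the data here,
# and a far smaller fraction in a real 20+ column table.
col_scan = Counter(columns["country"])

assert row_scan == col_scan  # same answer, much less data read
```

On disk the effect is the same: a columnar engine reads only the pages holding the queried columns, while a row store must pull entire rows through the I/O path.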
## How do you keep the OLAP warehouse up to date?
Two options: ETL and CDC.

- **ETL**: a batch job runs every few hours, extracts changed rows from OLTP, transforms them, and loads them into the warehouse. Simple, but the data is hours stale.
- **CDC**: Change Data Capture reads the database's write log (WAL/binlog), publishes every change to Kafka, and a consumer writes it into the warehouse in near real time (seconds of lag). More complex, but dramatically fresher.
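The consumer side of a CDC pipeline can be sketched like this. The event shape is hypothetical (a real pipeline would receive similar events from Kafka via a tool like Debezium), and the warehouse table is modeled as a dict keyed by primary key:

```python
# Hypothetical change events, as a CDC tool might emit them from the WAL/binlog.
change_events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "country": "UK"}},
    {"op": "update", "key": 1, "row": {"id": 1, "country": "US"}},
    {"op": "delete", "key": 1, "row": None},
]

warehouse = {}  # stand-in for the warehouse's copy of the table, keyed by id

def apply_change(event):
    """Replay one OLTP change into the warehouse copy."""
    if event["op"] == "delete":
        warehouse.pop(event["key"], None)
    else:
        # Inserts and updates are both upserts on the warehouse side,
        # which also makes replaying an event twice harmless (idempotent).
        warehouse[event["key"]] = event["row"]

for event in change_events:
    apply_change(event)

print(warehouse)  # {} — the final delete removed the row
```

Treating insert and update identically as upserts is a common design choice in CDC consumers: it tolerates out-of-order snapshots and redelivered messages without corrupting the warehouse copy.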
**Interview framing**
"For real-time use cases like fraud detection, CDC via Kafka gives seconds of lag. For batch reporting and BI dashboards, ETL every few hours is simpler and sufficient."
## When would you mention OLAP in an interview?
Any time the interviewer asks about analytics, reporting, or dashboards. The pattern is always: production OLTP DB → CDC or ETL pipeline → OLAP warehouse → analytics queries run there. Never say "we'll run reports against the main database."
## Quick Reference
| | OLTP | OLAP |
|---|---|---|
| Purpose | Serve live users | Serve analysts |
| Operations | Small, fast, single-row | Large scans, aggregations |
| Storage | Row-oriented | Column-oriented |
| Examples | PostgreSQL, MySQL, DynamoDB | BigQuery, Redshift, Snowflake |
| Latency | Milliseconds | Seconds to minutes |