Final Design — Netflix#

Final Netflix architecture — all deep dive decisions reflected.

Every component here was justified through an interview session. Nothing is added speculatively.

What Changed from Base Architecture#

The base architecture had a single app server returning a manifest URL, with the client fetching chunks directly from S3. That breaks at scale in three ways: S3 cannot serve 500 Tbps of global traffic, a single server is a SPOF, and there is no failure isolation between services.

Every deep dive added one layer of the final design:

Deep Dive	What it added
Transcoding	Kafka-driven pipeline, S3 chunk storage at multiple resolutions
Manifest + HLS	Manifest file with CDN URLs, client-side ABR quality switching
Caching	CDN edge layer, push for hot releases, pull + LRU/TTL for catalogue
DB	PostgreSQL for content metadata and watch history, Cassandra for resume positions
Peak Traffic	BFF pre-scaling, Redis genre cache, double-checked locking on cache miss
Fault Isolation	Circuit breakers + bulkheads on BFF fan-out, load shedding on Redis failure, adaptive bitrate as CDN cascade prevention
Search	Dedicated Search Service, Elasticsearch with inverted index + fuzzy matching, CDC sync from PostgreSQL via Debezium
Resume Playback	Completion threshold logic in Stream Service, last-write-wins resolution in Cassandra

Full Architecture Diagram#

flowchart TD
    subgraph Clients
        MC[Mobile Client]
        TC[TV Client]
        WC[Web Client]
    end

    subgraph API Layer
        APIGW[API Gateway]
        BFF[BFF Service]
        SS[Stream Service]
        SRCH[Search Service]
    end

    subgraph Genre Services
        AS[Action Service]
        CS[Comedy Service]
        CWS[Continue Watching Service]
        NRS[New Releases Service]
        GN[20 genre services]
    end

    subgraph Cache Layer
        REDIS[(Redis — genre rows)]
    end

    subgraph Databases
        PG[(PostgreSQL — content metadata + watch history)]
        CASS[(Cassandra — resume positions)]
    end

    subgraph Search Layer
        ES[(Elasticsearch)]
        DEBEZIUM[CDC — Debezium]
        ESWRK[ES Sync Worker]
    end

    subgraph Content Pipeline
        UPLOAD[Upload Service]
        KAFKA[Kafka]
        TW[Transcoding Workers]
        S3[(S3 — 64 PB video chunks)]
        MANIFEST[Manifest Generator]
    end

    subgraph CDN Layer
        CDN_PUSH[CDN Pre-warm]
        CDN[Global CDN]
    end

    subgraph Observability
        PROM[Prometheus]
        TELEM[Telemetry Service]
        GRAF[Grafana]
        PD[PagerDuty]
    end

    MC --> APIGW
    TC --> APIGW
    WC --> APIGW

    APIGW --> BFF
    APIGW --> SS
    APIGW --> SRCH

    BFF -->|fan-out with bulkheads| AS
    BFF -->|fan-out with bulkheads| CS
    BFF -->|fan-out with bulkheads| CWS
    BFF -->|fan-out with bulkheads| NRS
    BFF -->|fan-out with bulkheads| GN

    AS -->|check cache first| REDIS
    CS --> REDIS
    CWS --> REDIS
    NRS --> REDIS
    REDIS -->|double-checked lock miss only| PG

    SS -->|read resume position| CASS
    CASS -->|position_seconds| SS
    SS -->|manifest URL + resume position| MC
    MC -->|POST /progress every 4s| APIGW

    SRCH -->|query with fuzziness + boost| ES

    MC -->|fetch manifest| CDN
    MC -->|fetch chunks| CDN
    CDN -->|cache miss pull| S3

    PG -->|write-ahead log| DEBEZIUM
    DEBEZIUM -->|metadata-changes topic| KAFKA
    KAFKA --> ESWRK
    ESWRK -->|index document| ES

    UPLOAD --> KAFKA
    KAFKA --> TW
    TW -->|chunks| S3
    TW --> MANIFEST
    MANIFEST -->|hot release push| CDN_PUSH
    CDN_PUSH --> CDN

    BFF --> PROM
    AS --> PROM
    REDIS --> PROM
    MC -->|TTFF + buffering ratio| TELEM
    TELEM --> PROM
    PROM --> GRAF
    PROM --> PD

Request Flows#

Home Feed#

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant B as BFF
    participant R as Redis
    participant G as Genre Services
    participant DB as PostgreSQL

    C->>GW: GET /api/v1/home?limit=10
    GW->>B: authenticated request
    par fan-out with bulkheads
        B->>R: GET action_row
        B->>R: GET comedy_row
        B->>R: GET continue_watching
    end
    R-->>B: cache hits (most rows)
    B->>G: fetch rows not in cache
    G->>DB: query (double-checked lock)
    DB-->>G: rows
    G-->>R: populate cache
    G-->>B: rows
    B-->>C: { rows: [...], next_cursor: "..." }

Search#

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant S as Search Service
    participant ES as Elasticsearch

    C->>GW: GET /api/v1/search?q=leo&limit=10
    GW->>S: authenticated request
    S->>ES: multi_match query with fuzziness AUTO + boost weights
    ES-->>S: ranked results by relevance score
    S-->>C: { results: [...10 items], next_cursor: "..." }

Stream Start + Resume#

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant SS as Stream Service
    participant CASS as Cassandra
    participant CDN as CDN

    C->>GW: GET /api/v1/stream?movie_id=m_123
    GW->>SS: authenticated request
    SS->>CASS: READ position WHERE user_id=X AND movie_id=m_123
    CASS-->>SS: position_seconds: 1847
    Note over SS: 1847 < 95% of duration — in progress
    SS-->>C: { stream_url: "cdn.netflix.com/...", resume_position_seconds: 1847 }
    C->>CDN: fetch manifest
    CDN-->>C: manifest with chunk URLs
    C->>CDN: fetch chunks from position 1847
    CDN-->>C: video chunks
    Note over C: playback starts at 30:47

Progress Write#

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant SS as Stream Service
    participant CASS as Cassandra

    Note over C: every 4 seconds while watching
    C->>GW: POST /api/v1/stream/progress { movie_id, position_seconds }
    GW->>SS: authenticated write
    SS->>CASS: WRITE user_id + movie_id + position_seconds + timestamp
    SS-->>C: 204 No Content

Content Ingestion + Search Sync#

sequenceDiagram
    participant UP as Upload Service
    participant K as Kafka
    participant TW as Transcoding Worker
    participant S3 as S3
    participant MG as Manifest Generator
    participant CDN as CDN
    participant PG as PostgreSQL
    participant DEB as Debezium CDC
    participant ESW as ES Sync Worker
    participant ES as Elasticsearch

    UP->>K: publish transcoding job
    K->>TW: consume job
    TW->>S3: upload chunks at all resolutions
    TW->>MG: trigger manifest generation
    MG->>S3: write manifest with CDN URLs
    MG->>CDN: push chunks for hot releases
    MG->>PG: INSERT title metadata
    PG->>DEB: write-ahead log change event
    DEB->>K: publish to metadata-changes topic
    K->>ESW: consume change event
    ESW->>ES: index document
    Note over ES: title appears in search within seconds

Component Summary#

Component	Technology	Purpose
API Gateway	Kong / AWS API GW	Auth, rate limiting, routing
BFF	Node.js / Java	Fan-out to genre services, bulkhead failure isolation
Stream Service	Java	Resume position lookup, completion threshold, progress writes
Search Service	Java	Translates queries, calls Elasticsearch, returns ranked results
Genre Services	Java microservices	Per-genre row fetching
Redis	Redis Cluster	Genre row cache, double-checked locking on miss
Content DB	PostgreSQL	Titles, metadata, cast, S3 URLs, watch history
Resume DB	Cassandra	Resume positions — 7.5M writes/second, partition key: user_id
Search Index	Elasticsearch	Inverted index, fuzzy matching, boost-based relevance scoring
CDC Pipeline	Debezium + Kafka + ES Sync Worker	Async sync from PostgreSQL to Elasticsearch
Object Storage	S3	64 PB video chunks
CDN	Netflix Open Connect	Global edge, push + pull hybrid, LRU + TTL eviction
Transcoding	Kafka + Worker Pool	Parallel encoding to all resolutions and codecs
Telemetry	Custom ingest service	Client-side TTFF and buffering ratio
Observability	Prometheus + Grafana + PagerDuty	SLI measurement, alerting, dashboards

Key Design Decisions and Their Justifications#

BFF over client-driven fan-out — 20+ parallel calls from a mobile client on 3G is brutal. BFF absorbs all fan-out server-side, client makes one call. Bulkheads inside BFF provide the same failure isolation.

Search Service + Elasticsearch over PostgreSQL LIKE — LIKE '%leo%' is a full table scan across 600,000 rows. At 25,000 search queries per second, this saturates the PostgreSQL instance and degrades every other read on the same DB. Elasticsearch's inverted index makes search a direct lookup, not a scan. Fuzzy matching and boost-based relevance scoring are impossible with LIKE.

CDC over dual-write for Elasticsearch sync — writing to PostgreSQL and Elasticsearch simultaneously in the same code path creates distributed failure modes. CDC tails PostgreSQL's write-ahead log passively — the transcoding pipeline writes to one system, Debezium propagates the change asynchronously. A few seconds of search staleness is acceptable.

Cursor pagination over offset — Netflix adds content constantly. Offset pagination produces duplicates and gaps under concurrent writes. Cursor is stable regardless of what is added or removed.

Push + pull hybrid CDN — pure pull causes cache stampede on hot releases (all CDN nodes cold at 9pm). Pure push wastes 76 TB per CDN server on unpopular content. Hybrid: push for top releases, pull for long tail.

Double-checked locking on cache miss — single-check locking allows N waiting requests to all hit DB one by one after the lock is released. Double-check means only the first request ever reaches the DB.

Adaptive bitrate as load shedding — when a CDN node fails and users failover to a neighbouring node, that node is at 3× capacity. Dropping all clients to lower quality reduces per-user bandwidth by 5× and stops the cascade.

PostgreSQL for watch history, Cassandra for resume positions — watch history is 578 writes/second across 150M DAU, well within PostgreSQL limits. Resume positions are 7.5M writes/second at peak (30M peak streamers each writing every 4 seconds) — 750× PostgreSQL's single-node limit. Same data shape, completely different write frequency, completely different DB choice.

Completion threshold in Stream Service — positions ≥ 95% of total duration return 0 instead of the actual position. This logic belongs in the Stream Service, not Cassandra — Cassandra stores raw positions only, business rules live in the service layer.