Dispatch Flow — Scheduling#

Full End-to-End Scheduling Flow#

flowchart TD
    CS[Calling Service] --> AS[App Server]
    AS -->|scheduled_at is null| K[Kafka]
    AS -->|scheduled_at is future - apply jitter if low priority| SDB[(Scheduler DB - Cassandra)]
    SDB -->|poll every second - SELECT WHERE minute_bucket = now AND scheduled_at <= now| SS[Scheduler Service]
    SS -->|publish due notifications| K
    K --> PW[Push Worker]
    K --> SW[SMS Worker]
    K --> EW[Email Worker]
    PW --> APNs
    SW --> Twilio
    EW --> SendGrid

Scheduler Polling Interval#

The scheduler polls the Scheduler DB every 1 second. This means a notification can be up to 1 second late — acceptable for all notification types including high-priority ones.

Polling more frequently (every 100ms) reduces latency but increases DB load — 10× more queries for marginal improvement. 1 second is the right balance.

What the scheduler does each second:

1. Compute current minute bucket: '2026-04-19-09-00'
2. Query: SELECT * FROM scheduled_notifications 
          WHERE minute_bucket = '2026-04-19-09-00' 
          AND scheduled_at <= now()
3. For each due notification → publish to appropriate Kafka topic
4. Delete processed rows from Scheduler DB
5. Sleep 1 second, repeat

Scheduler Service — Avoiding Double Dispatch#

If two scheduler service instances are running (for fault tolerance), both might find the same due notifications and publish them twice — resulting in duplicate notifications.

The fix is a distributed lock — only one scheduler instance runs the poll-and-publish loop at a time. Redis SET NX (set if not exists) with a short TTL works well:

Redis SET scheduler_lock <instance_id> NX EX 5
→ only the instance that wins the lock runs the dispatch
→ lock expires after 5 seconds, next instance can take over if leader crashes

Scheduler is a single-leader service

Only one scheduler instance dispatches at a time. Others are on standby. If the leader crashes, the Redis lock expires and a standby takes over within 5 seconds. At most 5 seconds of missed dispatches — acceptable given the 1-second polling resolution anyway.

After Dispatch — Normal Pipeline#

Once the scheduler publishes a notification to Kafka, it is indistinguishable from an immediate notification. The channel workers consume it, check preferences in Redis, deduplicate via bloom filter, dispatch to the external provider, and update status in Cassandra. No special handling needed downstream.

Scheduled notification lifecycle:

App Server → Scheduler DB (hold)
                  ↓ (at scheduled_at)
           Scheduler Service → Kafka
                                  ↓
                           Channel Worker
                                  ↓
                        APNs / Twilio / SendGrid
                                  ↓
                           User's Device
                                  ↓
                        Status → DELIVERED in Cassandra

Summary#

Component	Detail
Scheduler DB	Cassandra
Partition key	minute_bucket
Clustering key	scheduled_at ASC
Polling interval	1 second
Jitter	Intake jitter for low-priority, none for OTPs/alerts
Duplicate prevention	Redis distributed lock on scheduler leader
After dispatch	Rejoins normal Kafka pipeline