Skip to content

Message Queues

Queue#

Before distributed systems, before brokers, before Kafka — a queue is just a list with one rule: first in, first out.

Add →  [ job_1, job_2, job_3 ]  → Take
       oldest              newest

You add to one end, you take from the other. Job_1 was added first, job_1 gets processed first. That's it.

In code, this is just an in-memory list:

queue = []

queue.append("send welcome email to john@gmail.com")
queue.append("send welcome email to sarah@gmail.com")

job = queue.pop(0)   # takes from the front — first in, first out
do_work(job)

Simple, fast, works perfectly — as long as everything stays on one machine and never crashes.


What's the "message" in Message Queue?#

In the example above, the items in the queue were plain strings: "send welcome email to john@gmail.com". That works for trivial cases, but real systems pass structured data — called a message.

A message is a small packet of information. It typically has: - A body — the actual payload (what work needs to be done, or what event happened) - Optional metadata — things like timestamp, message ID, retry count

{
  "message_id": "msg_abc123",
  "timestamp": "2024-01-15T10:30:00Z",
  "body": {
    "event": "user_signed_up",
    "user_id": 9981,
    "email": "john@gmail.com"
  }
}

The code that reads this message knows exactly what to do: send a welcome email to john@gmail.com for user 9981.

Message Queue is just a queue where the items are structured messages — not raw strings or function calls. That's the only difference from the plain data structure above.


The problem with in-memory#

Your server enqueues 10,000 jobs. The server crashes before all the jobs are processed.

queue = ["email john", "email sarah", "email mike" ... 9,997 more]
Server crashes ↓
queue = []   ← gone. All 10,000 jobs lost.

In-memory means the data lives in RAM. RAM is wiped on crash. There's no record those jobs ever existed.

This is fine for throwaway work. It's catastrophic for anything that matters — billing, notifications, order processing.

The fix is to move the queue outside the server — into a dedicated system that persists messages to disk and survives crashes. That's what a message queue service (SQS, RabbitMQ, Kafka) actually is: a durable, external queue that your servers talk to over the network.


The problem that forced queues into existence#

Imagine you're building Instagram. A user hits "Post Photo". Here's what needs to happen:

1. Upload photo bytes to S3
2. Save metadata (photo_id, user_id, s3_url, timestamp) to DB
3. Send push notifications to all followers
4. Update the news feed for all followers
5. Generate thumbnail versions of the photo
6. Run the photo through content moderation AI

Steps 1 and 2 are mandatory before the user can see their photo live. But steps 3–6? The user doesn't care if their followers have already been notified. They just care that their photo is live

If you do all 6 steps synchronously — meaning the user's HTTP request waits for all of them — you're making the user wait for: - Push notification delivery - News feed updates for potentially millions of followers - Thumbnail generation (CPU-heavy) - AI moderation (can take seconds)

That's a terrible experience. The user's "Post" button just spins for 5+ seconds.


The fix — async processing#

The insight is simple: only block the user for work they actually need.

User hits Post
→ Upload to S3                          ← must happen first
→ Save metadata to DB                   ← must happen first
→ Return "Photo posted!" to user        ← user is done, response back in ~200ms

Meanwhile, in the background...
→ Notification Service sends push notifications
→ Feed Service updates follower feeds
→ Thumbnail Service generates thumbnails
→ Moderation Service runs AI check

But here's the problem: your server can't just say "I'll do notifications later" and hope it remembers. What if the server crashes right after saving to the DB? All those background tasks are lost.

You need something durable — something that holds onto the work until it gets done. That's the queue.


How a message queue solves it#

User hits Post
→ Upload to S3
→ Save metadata to DB
→ Drop a message in the queue: { event: "photo_posted", photo_id: 123, user_id: 456 }
→ Return "Photo posted!" to user   ← ~200ms, user is happy

Queue durably holds the message.

Notification Service  → picks up message → sends push notifications → acks → message deleted

Feed Service          → picks up message → updates feeds → acks → message deleted

Thumbnail Service     → picks up message → generates thumbs → acks → message deleted

Moderation Service    → picks up message → runs AI check → acks → message deleted

The server that posts the photo is the producer — it creates a message and drops it in the queue.

The services that pick up and process the message are consumers (also called workers) — Notification, Feed, Thumbnail, Moderation services are all consumers here.

The producer and consumers don't know about each other. The photo server doesn't call the notification server directly. It drops a message and walks away. This is called decoupling.


The second problem queues solve — traffic spikes#

It's New Year's Eve. A million people post photos at midnight. Your system goes from 1,000 posts/sec to 100,000 posts/sec.

The notification service calls Apple's APNs and Google's FCM — external APIs with rate limits. You can't throw 100,000 notification requests at them per second. The thumbnail service is CPU-heavy — you can't spin up unlimited servers instantly.

Without a queue, the notification service gets overwhelmed. It starts returning errors. Parts of your system go down.

With a queue:

Producer drops 100,000 messages/sec into the queue  ← queue just holds them
Consumer (notification service) drains at its own pace — 10,000/sec

Result: notifications are slightly delayed, but nothing crashes

The queue acts as a buffer. The producer doesn't care how fast the consumer is. The consumer doesn't care how fast the producer is. The queue absorbs the difference.

This is the key mental model: a queue trades latency (notifications are delayed) for availability (the system stays up). That's almost always the right trade-off.


The two reasons message queues exist#

  1. Async processing — don't block the user for work they don't need to wait for
  2. Absorbing traffic spikes — protect consumers from being overwhelmed by bursts

Both come down to the same root idea: the producer and the consumer operate at different speeds and different times, and the queue bridges that gap.