Cleanup Flow

The cleanup job processes expired pastes in batches — not one at a time. Each batch handles the full deletion chain: ref_count, S3, content table, pastes table, and Redis.


Why batches, not row-by-row

Processing one paste at a time means:
- One Postgres query per paste
- One S3 delete call per paste
- One Redis delete call per paste

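At 1M expirations per day, that's 1M individual S3 API calls. S3 does not bill DELETE requests, but every call still costs a network round trip and counts toward S3's request-rate limits. At ~10ms per S3 call, processing 1M pastes sequentially takes about 2.8 hours (1M × 10 ms = 10,000 s) on S3 round trips alone.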

Batch processing collapses this:
- One Postgres query fetches 1000 expired rows
- One S3 batch delete removes up to 1000 objects in a single API call (S3 native batch delete API)
- One Redis pipeline deletes 1000 keys in a single round trip

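The same work happens in a fraction of the time and with a fraction of the requests: 1M expirations become roughly 1,000 batches instead of 1M individual calls to each system.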
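The arithmetic behind that comparison can be checked with a short sketch. It uses the figures quoted above; the one added assumption is that a batch delete takes about the same round-trip time as a single delete, which will not hold exactly in practice.

```python
# Back-of-the-envelope comparison using the figures quoted in the text.
EXPIRATIONS_PER_DAY = 1_000_000
S3_CALL_SECONDS = 0.010   # ~10 ms per S3 round trip
BATCH_SIZE = 1000         # rows per batch, and the S3 batch-delete limit

# Row-by-row: one S3 call per expired paste.
sequential_seconds = EXPIRATIONS_PER_DAY * S3_CALL_SECONDS
sequential_hours = sequential_seconds / 3600

# Batched: at most one S3 call per batch of 1000 pastes.
# Assumption: a batch call costs roughly one ordinary round trip.
batch_calls = -(-EXPIRATIONS_PER_DAY // BATCH_SIZE)  # ceiling division
batched_seconds = batch_calls * S3_CALL_SECONDS

print(f"row-by-row: {EXPIRATIONS_PER_DAY} calls, about {sequential_hours:.1f} hours")
print(f"batched:    {batch_calls} calls, about {batched_seconds:.0f} seconds")
```

Under these assumptions the S3 portion drops from roughly 2.8 hours to about 10 seconds of round-trip time per day.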


The full cleanup flow per batch

1. SELECT short_code, content_hash FROM pastes
   WHERE expires_at < now()
   AND status = 'NOT_EXPIRED'
   LIMIT 1000

2. For each row:
   → UPDATE content SET ref_count = ref_count - 1
     WHERE content_hash = ?
     RETURNING ref_count

3. Collect content_hashes where the returned ref_count = 0

4. If any ref_count = 0:
   → S3 batch delete (up to 1000 objects in one API call)
   → DELETE FROM content WHERE content_hash IN (...)

5. DELETE FROM pastes WHERE short_code IN (...)

6. Redis pipeline: DEL shortCode1 shortCode2 ... shortCode1000
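The six steps above can be sketched end to end as follows. This is a minimal illustration, not the actual job: it uses an in-memory SQLite database as a stand-in for Postgres, and `cleanup_batch`, `delete_s3_objects`, and `delete_cache_keys` are hypothetical names. The two callables stand in for an S3 batch delete and a Redis pipeline, and the integer `expires_at` column is a simplification.

```python
import sqlite3

def cleanup_batch(db, delete_s3_objects, delete_cache_keys, now, batch_size=1000):
    """Run one cleanup batch and return the number of paste rows removed."""
    # Step 1: fetch up to batch_size expired rows.
    rows = db.execute(
        "SELECT short_code, content_hash FROM pastes "
        "WHERE expires_at < ? AND status = 'NOT_EXPIRED' LIMIT ?",
        (now, batch_size),
    ).fetchall()
    if not rows:
        return 0
    short_codes = [code for code, _ in rows]

    # Step 2: decrement once per expired row, so two expired pastes that
    # share a content_hash decrement its ref_count twice.
    db.executemany(
        "UPDATE content SET ref_count = ref_count - 1 WHERE content_hash = ?",
        [(content_hash,) for _, content_hash in rows],
    )

    # Step 3: collect the hashes that nothing references any more.
    hashes = sorted({content_hash for _, content_hash in rows})
    marks = ",".join("?" * len(hashes))
    orphaned = [h for (h,) in db.execute(
        "SELECT content_hash FROM content "
        f"WHERE ref_count <= 0 AND content_hash IN ({marks})", hashes)]

    # Step 4: remove orphaned objects from S3, then their content rows.
    if orphaned:
        delete_s3_objects(orphaned)
        marks = ",".join("?" * len(orphaned))
        db.execute(f"DELETE FROM content WHERE content_hash IN ({marks})", orphaned)

    # Step 5: remove the expired paste rows and commit the whole batch.
    marks = ",".join("?" * len(short_codes))
    db.execute(f"DELETE FROM pastes WHERE short_code IN ({marks})", short_codes)
    db.commit()

    # Step 6: invalidate the cache for every short code in the batch.
    delete_cache_keys(short_codes)
    return len(short_codes)

# Demo data: pasteA and pasteB share abc123; pasteA and pasteC have expired.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (content_hash TEXT PRIMARY KEY, ref_count INTEGER);
    CREATE TABLE pastes (short_code TEXT PRIMARY KEY, content_hash TEXT,
                         expires_at INTEGER, status TEXT);
    INSERT INTO content VALUES ('abc123', 2), ('def456', 1);
    INSERT INTO pastes VALUES
        ('pasteA', 'abc123', 100, 'NOT_EXPIRED'),
        ('pasteB', 'abc123', 900, 'NOT_EXPIRED'),
        ('pasteC', 'def456', 100, 'NOT_EXPIRED');
""")
s3_deleted, cache_deleted = [], []
removed = cleanup_batch(db, s3_deleted.extend, cache_deleted.extend, now=500)
print(removed, s3_deleted, sorted(cache_deleted))
```

In this sketch the S3 delete runs before the database commit, so if it raises, the caller can roll back and the rows remain eligible for the next run. Whether the real job orders its writes this way is not stated in the flow above.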

Why ref_count matters

Multiple paste rows can point to the same S3 object — the dedup mechanism ensures identical content is stored once. ref_count tracks how many paste rows reference that content.

You only delete from S3 and the content table when ref_count hits 0. If two pastes share the same content and one expires, the S3 object stays — the other paste still needs it.

Paste A (expires today)      → content_hash: abc123
Paste B (expires next month) → content_hash: abc123
Content row abc123           → ref_count: 2

Cleanup runs:
  Decrement ref_count for abc123 → ref_count = 1
  ref_count != 0 → do NOT delete S3 object
  Delete pastes row for Paste A only

Only when Paste B also expires and ref_count drops to 0 does the S3 object get deleted.
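The same scenario can be traced with a small in-memory model. The dictionaries and the `expire` helper are illustrative stand-ins for the pastes table, the content table, and the S3 bucket; they are not part of the actual system.

```python
# In-memory stand-ins for the content table, pastes table, and S3 bucket.
ref_counts = {"abc123": 2}                          # content_hash -> ref_count
pastes = {"pasteA": "abc123", "pasteB": "abc123"}   # short_code -> content_hash
s3_objects = {"abc123"}                             # stored object keys

def expire(short_code):
    """Remove one paste; drop the shared object only when nothing else uses it."""
    content_hash = pastes.pop(short_code)
    ref_counts[content_hash] -= 1
    if ref_counts[content_hash] == 0:
        s3_objects.discard(content_hash)
        del ref_counts[content_hash]

expire("pasteA")
still_there_after_a = "abc123" in s3_objects   # Paste B still references it

expire("pasteB")
still_there_after_b = "abc123" in s3_objects   # last reference is gone

print(still_there_after_a, still_there_after_b)
```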


Redis invalidation

The cleanup job deletes Redis keys for all expired shortCodes in the batch, regardless of whether the content was actually in Redis. Redis DEL on a non-existent key is a no-op — safe to call unconditionally.

This ensures no stale cached content survives past expiry.
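A minimal sketch of the unconditional invalidation, assuming the cache key is the short code itself and that the client exposes a redis-py style `pipeline()` interface. `invalidate_short_codes` is a hypothetical helper, and `FakeRedis` is a tiny in-memory stand-in so the example runs without a server.

```python
def invalidate_short_codes(client, short_codes):
    """Queue one DEL per short code and send them all in a single round trip.

    Returns how many keys actually existed; missing keys simply count as 0.
    """
    if not short_codes:
        return 0
    pipe = client.pipeline(transaction=False)
    for code in short_codes:
        pipe.delete(code)
    return sum(pipe.execute())

class FakePipeline:
    """In-memory stand-in that mimics the queue-then-execute pipeline pattern."""
    def __init__(self, store):
        self.store, self.queued = store, []

    def delete(self, *keys):
        self.queued.append(keys)
        return self

    def execute(self):
        results = []
        for keys in self.queued:
            existing = [k for k in keys if k in self.store]
            for k in existing:
                del self.store[k]
            results.append(len(existing))   # DEL reports the number of keys removed
        self.queued = []
        return results

class FakeRedis:
    def __init__(self, store):
        self.store = store

    def pipeline(self, transaction=True):
        return FakePipeline(self.store)

# pasteA is cached, pasteC never was: both are deleted unconditionally.
cache = {"pasteA": "hello world"}
deleted = invalidate_short_codes(FakeRedis(cache), ["pasteA", "pasteC"])
print(deleted, cache)
```

With a real redis-py client, the same function would be called with `redis.Redis(...)` in place of `FakeRedis`.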