Cleanup Flow
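The cleanup job processes expired pastes in batches, not one at a time. Each batch runs the full deletion chain: decrement ref_count, delete unreferenced objects from S3, delete their rows from the content table, delete the expired rows from the pastes table, and invalidate the matching Redis keys.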
Why batches, not row-by-row
Processing one paste at a time means:
- One Postgres query per paste
- One S3 delete call per paste
- One Redis delete call per paste
At 1M expirations per day, that's 1M individual S3 API calls. S3 charges per API call. More importantly, at ~10ms per S3 call, processing 1M pastes sequentially takes about 10,000 seconds (~3 hours) on S3 round trips alone.
Batch processing collapses this:
- One Postgres query fetches 1000 expired rows
- One S3 batch delete removes up to 1000 objects in a single API call (S3 native batch delete API)
- One Redis pipeline deletes 1000 keys in a single round trip
The same work happens in a fraction of the time at a fraction of the cost.
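The arithmetic behind that claim can be checked with a quick back-of-envelope script. This is only a rough sketch: the 1M expirations per day, the ~10ms per S3 call, and the batch size of 1000 come from the text above, while the assumption that a batched call takes about as long as a single-object call is mine and will vary in practice. It also treats every paste as having its own S3 object, which is the worst case.

```python
# Back-of-envelope comparison of row-by-row vs batched S3 deletes.
EXPIRATIONS_PER_DAY = 1_000_000   # from the text
S3_CALL_SECONDS = 0.010           # ~10 ms per call, from the text
BATCH_SIZE = 1000                 # objects per S3 batch delete, from the text


def s3_calls(total: int, batch_size: int) -> int:
    """Number of S3 API calls needed, rounding up for a partial last batch."""
    return -(-total // batch_size)


row_by_row_calls = s3_calls(EXPIRATIONS_PER_DAY, 1)
batched_calls = s3_calls(EXPIRATIONS_PER_DAY, BATCH_SIZE)

# Assumption: a batched call costs roughly one single-call round trip.
row_by_row_hours = row_by_row_calls * S3_CALL_SECONDS / 3600
batched_seconds = batched_calls * S3_CALL_SECONDS

print(f"row-by-row: {row_by_row_calls} calls, ~{row_by_row_hours:.1f} h")
print(f"batched:    {batched_calls} calls, ~{batched_seconds:.0f} s")
```

Under these assumptions the S3 round-trip time drops from hours to seconds, because the number of calls shrinks by the batch size.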
The full cleanup flow per batch
1. SELECT short_code, content_hash FROM pastes
   WHERE expires_at < now()
   AND status = 'NOT_EXPIRED'
   LIMIT 1000
2. For each row:
   → UPDATE content SET ref_count = ref_count - 1
     WHERE content_hash = ?
3. Collect content_hashes where ref_count = 0 after decrement
4. If any ref_count = 0:
   → S3 batch delete (up to 1000 objects in one API call)
   → DELETE FROM content WHERE content_hash IN (...)
5. DELETE FROM pastes WHERE short_code IN (...)
6. Redis pipeline: DEL shortCode1 shortCode2 ... shortCode1000
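The six steps above can be sketched end to end in Python. This is a minimal illustration, not the actual job: it uses an in-memory SQLite database as a stand-in for Postgres, and `FakeS3`, `FakeCache`, `placeholders`, and `cleanup_batch` are hypothetical names invented for this example. It also assumes the S3 object key is the content_hash, which the text does not state.

```python
import sqlite3


class FakeS3:
    """In-memory stand-in for S3 with one batch-delete method (hypothetical)."""
    def __init__(self, keys):
        self.objects = set(keys)
        self.batch_calls = 0

    def delete_batch(self, keys):
        self.batch_calls += 1          # one API call per batch
        self.objects -= set(keys)


class FakeCache:
    """In-memory stand-in for Redis; deleting a missing key is a no-op."""
    def __init__(self, keys):
        self.keys = set(keys)

    def delete_many(self, keys):
        self.keys -= set(keys)


def placeholders(n):
    """Return '?,?,...' with n markers for a SQL IN (...) clause."""
    return ",".join("?" * n)


def cleanup_batch(db, s3, cache, now, batch_size=1000):
    # Step 1: fetch up to one batch of expired pastes.
    rows = db.execute(
        "SELECT short_code, content_hash FROM pastes "
        "WHERE expires_at < ? AND status = 'NOT_EXPIRED' LIMIT ?",
        (now, batch_size),
    ).fetchall()
    if not rows:
        return 0
    codes = [r[0] for r in rows]
    hashes = [r[1] for r in rows]

    # Step 2: one decrement per expired row (a shared hash is decremented
    # once for each expired paste that references it).
    db.executemany(
        "UPDATE content SET ref_count = ref_count - 1 WHERE content_hash = ?",
        [(h,) for h in hashes],
    )

    # Step 3: collect the touched hashes whose ref_count dropped to 0.
    touched = sorted(set(hashes))
    orphaned = [r[0] for r in db.execute(
        "SELECT content_hash FROM content WHERE ref_count = 0 "
        f"AND content_hash IN ({placeholders(len(touched))})", touched)]

    # Step 4: remove unreferenced content from S3 and the content table.
    if orphaned:
        s3.delete_batch(orphaned)
        db.execute("DELETE FROM content WHERE content_hash IN "
                   f"({placeholders(len(orphaned))})", orphaned)

    # Step 5: delete the expired paste rows.
    db.execute("DELETE FROM pastes WHERE short_code IN "
               f"({placeholders(len(codes))})", codes)
    db.commit()

    # Step 6: invalidate cache keys unconditionally.
    cache.delete_many(codes)
    return len(codes)


# Tiny demo: pasteA and pasteC have expired, pasteB has not.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (content_hash TEXT PRIMARY KEY, ref_count INTEGER);
    CREATE TABLE pastes (short_code TEXT PRIMARY KEY, content_hash TEXT,
                         expires_at INTEGER, status TEXT);
""")
db.executemany("INSERT INTO content VALUES (?, ?)",
               [("abc123", 2), ("def456", 1)])
db.executemany("INSERT INTO pastes VALUES (?, ?, ?, ?)", [
    ("pasteA", "abc123", 100, "NOT_EXPIRED"),  # expired, shares content
    ("pasteB", "abc123", 900, "NOT_EXPIRED"),  # still live
    ("pasteC", "def456", 100, "NOT_EXPIRED"),  # expired, sole reference
])
s3 = FakeS3({"abc123", "def456"})
cache = FakeCache({"pasteA"})
deleted = cleanup_batch(db, s3, cache, now=500)
print(deleted, sorted(s3.objects))
```

In a real deployment the two fakes would presumably be replaced by an S3 client's multi-object delete and a Redis pipeline, and the SQLite connection by a Postgres one; the control flow would stay the same.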
Why ref_count matters
Multiple paste rows can point to the same S3 object — the dedup mechanism ensures identical content is stored once. ref_count tracks how many paste rows reference that content.
You only delete from S3 and the content table when ref_count hits 0. If two pastes share the same content and one expires, the S3 object stays — the other paste still needs it.
Paste A (expires today) → content_hash: abc123, ref_count: 2
Paste B (expires next month) → content_hash: abc123
Cleanup runs:
  Decrement ref_count for abc123 → ref_count = 1
  ref_count != 0 → do NOT delete S3 object
  Delete pastes row for Paste A only
Only when Paste B also expires and ref_count drops to 0 does the S3 object get deleted.
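The Paste A / Paste B example can be traced with a few lines of Python. This is a sketch only: the `release` helper and the plain dictionary standing in for the content table are hypothetical, chosen to show the decision rule in isolation.

```python
def release(ref_counts, content_hash):
    """Drop one reference; return True if the object is now unreferenced."""
    ref_counts[content_hash] -= 1
    if ref_counts[content_hash] == 0:
        del ref_counts[content_hash]   # mirrors deleting the content row
        return True                    # caller may now delete the S3 object
    return False


ref_counts = {"abc123": 2}             # Paste A and Paste B share one object

delete_after_a = release(ref_counts, "abc123")   # Paste A expires today
counts_after_a = dict(ref_counts)                # snapshot for inspection

delete_after_b = release(ref_counts, "abc123")   # Paste B expires next month

print(delete_after_a, counts_after_a, delete_after_b, ref_counts)
```

The first call reports that the object must stay, and only the second call, when the last reference goes away, signals that the S3 object and the content row can be removed.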
Redis invalidation
The cleanup job deletes the Redis key for every expired short code in the batch, whether or not that paste was actually cached. Redis DEL on a non-existent key is a no-op, so it is safe to call unconditionally.
This ensures no stale cached content survives past expiry.
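A minimal sketch of that unconditional invalidation follows. The `invalidate` function is hypothetical; it is written against the `pipeline()` / `delete()` / `execute()` call shape that the redis-py client exposes, but here it runs against a small in-memory fake so the example works without a Redis server. The fake's behavior (DEL returns 1 for a removed key and 0 for a missing one) is modeled on Redis, not taken from the original text.

```python
class FakePipeline:
    """Queues DEL commands and applies them in one execute() call."""
    def __init__(self, store):
        self.store = store
        self.queued = []

    def delete(self, key):
        self.queued.append(key)
        return self

    def execute(self):
        results = []
        for key in self.queued:
            # DEL reports 1 if the key existed, 0 if it did not.
            results.append(1 if key in self.store else 0)
            self.store.pop(key, None)
        self.queued = []
        return results


class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical)."""
    def __init__(self, data):
        self.store = dict(data)

    def pipeline(self):
        return FakePipeline(self.store)


def invalidate(client, short_codes):
    """Delete every key in one pipeline; return how many actually existed."""
    pipe = client.pipeline()
    for code in short_codes:
        pipe.delete(code)              # no existence check needed
    return sum(pipe.execute())


cache = FakeRedis({"aaa111": "cached paste body"})
removed = invalidate(cache, ["aaa111", "bbb222"])  # bbb222 was never cached
print(removed, cache.store)
```

Skipping the existence check keeps the job to a single round trip per batch; checking first would double the traffic for no benefit.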