Reliability: Overview

Availability means the system is reachable. Reliability means it gives correct answers. These are different problems with different solutions.

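A system can be perfectly available and completely broken at the same time: it responds to every request, but with wrong data, stale responses, or corrupt results. This folder treats reliability as a separate concern from availability and covers the metrics used to design and measure it: MTBF (mean time between failures), MTTR (mean time to recovery), RTO (recovery time objective), and RPO (recovery point objective).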
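As a rough illustration of the available-but-wrong case, the sketch below scores the same set of probe results two ways: how often the service answered, and how often the answer was right. The `Probe` record and both function names are hypothetical, invented for this example; they are not taken from any file in this folder.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic check against the service (hypothetical record type)."""
    responded: bool  # did the service answer at all?
    correct: bool    # did the answer match a known-good value?


def availability(probes: list[Probe]) -> float:
    """Fraction of probes that received any response."""
    return sum(p.responded for p in probes) / len(probes)


def correctness(probes: list[Probe]) -> float:
    """Fraction of answered probes whose response was correct."""
    answered = [p for p in probes if p.responded]
    if not answered:
        return 0.0
    return sum(p.correct for p in answered) / len(answered)


# Every probe got a response, but half of the responses were wrong.
probes = [
    Probe(responded=True, correct=True),
    Probe(responded=True, correct=False),
    Probe(responded=True, correct=False),
    Probe(responded=True, correct=True),
]

print(f"availability: {availability(probes):.0%}")
print(f"correctness:  {correctness(probes):.0%}")
```

An uptime dashboard built only on the first number would report this service as healthy, which is why the two are measured separately.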


Files in this folder

| File | Topic |
| --- | --- |
| 01-Reliability.md | What reliability is, with available-but-wrong examples |
| 02-MTBF-and-MTTR.md | How often things break vs how fast you recover |
| 03-RTO-and-RPO.md | Maximum acceptable downtime vs maximum acceptable data loss |
| 03b-Standby-and-Replication-Patterns.md | Warm vs hot standby, sync vs async replication cost |
| 05-Interview-Cheatsheet.md | How to use reliability concepts in a design interview |