Reliability: Overview
Availability means the system is reachable. Reliability means it gives correct answers. These are different problems with different solutions.
A system can be fully available and still be broken: it can return wrong data, stale responses, or corrupt results. This folder treats reliability as a concern separate from availability, and covers the measures used to design for it and track it: MTBF (mean time between failures), MTTR (mean time to recovery), RTO (recovery time objective), and RPO (recovery point objective).
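As a rough illustration of how MTBF and MTTR are measured, the sketch below estimates both from a list of incidents in one observation window. The `Incident` type, the `mtbf_mttr` helper, and the sample numbers are hypothetical and exist only for this example. The last line uses the standard steady-state relation, availability = MTBF / (MTBF + MTTR).

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record: hours measured from the start of the window.
    start_hour: float  # when the failure began
    end_hour: float    # when service was restored


def mtbf_mttr(incidents, window_hours):
    """Return (MTBF, MTTR) in hours for one observation window."""
    if not incidents:
        raise ValueError("need at least one incident to estimate MTBF/MTTR")
    downtime = sum(i.end_hour - i.start_hour for i in incidents)
    uptime = window_hours - downtime
    mtbf = uptime / len(incidents)    # mean operating time between failures
    mttr = downtime / len(incidents)  # mean time to restore service
    return mtbf, mttr


# Made-up data: three incidents in a 30-day (720-hour) window.
incidents = [Incident(100, 101), Incident(400, 402), Incident(650, 653)]
mtbf, mttr = mtbf_mttr(incidents, window_hours=720)

# Steady-state availability implied by the two estimates.
availability = mtbf / (mtbf + mttr)
print(f"MTBF={mtbf:.0f}h MTTR={mttr:.0f}h availability={availability:.4f}")
```

These figures only describe whether the system was up. They say nothing about whether the answers it returned while up were correct, which is why reliability is tracked separately.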
Files in this folder
| File | Topic |
|---|---|
| 01-Reliability.md | What reliability is, with examples of available-but-wrong systems |
| 02-MTBF-and-MTTR.md | How often things break vs how fast you recover |
| 03-RTO-and-RPO.md | Maximum acceptable downtime vs maximum acceptable data loss |
| 03b-Standby-and-Replication-Patterns.md | Warm vs hot standby, sync vs async replication cost |
| 05-Interview-Cheatsheet.md | How to use reliability concepts in a design interview |