Reliability: Overview

Availability means the system is reachable. Reliability means it gives correct answers. These are different problems with different solutions.

A system can be fully available and still be broken: it can return wrong data, stale responses, or corrupt results. This folder treats reliability as a concern separate from availability and covers the measures used to design for it and track it: MTBF (mean time between failures), MTTR (mean time to recovery), RTO (recovery time objective), and RPO (recovery point objective).
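The gap between "reachable" and "correct" can be made concrete with a small measurement sketch. It assumes a set of synthetic probes whose expected answers are known in advance; the `Probe` record and both function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic check against the service (hypothetical record shape)."""
    responded: bool  # did the service answer at all?
    correct: bool    # did the answer match the known-good value?


def availability(probes: list[Probe]) -> float:
    """Fraction of probes that received any response."""
    return sum(p.responded for p in probes) / len(probes)


def correctness(probes: list[Probe]) -> float:
    """Fraction of probes that received a correct response."""
    return sum(p.responded and p.correct for p in probes) / len(probes)


# Every probe was answered, but one of the four answers was stale or wrong.
probes = [
    Probe(responded=True, correct=True),
    Probe(responded=True, correct=True),
    Probe(responded=True, correct=False),
    Probe(responded=True, correct=True),
]
print(f"availability={availability(probes):.2f} correctness={correctness(probes):.2f}")
```

In this sample an uptime check alone would report 100%, while a quarter of the answers are wrong. That is the available-but-wrong case.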

Files in this folder

| File | Topic |
| --- | --- |
| 01-Reliability.md | What reliability is and available-but-wrong examples |
| 02-MTBF-and-MTTR.md | How often things break vs how fast you recover |
| 03-RTO-and-RPO.md | Maximum acceptable downtime vs maximum acceptable data loss |
| 03b-Standby-and-Replication-Patterns.md | Warm vs hot standby, sync vs async replication cost |
| 05-Interview-Cheatsheet.md | How to use reliability concepts in a design interview |