02
Concepts
Core Concepts
The vocabulary of distributed systems. Every architecture decision traces back to one of these.
Performance Metrics
Latency, throughput, bandwidth, and percentiles — the four measurements that appear in every design discussion.
Latency · Throughput · Bandwidth · P99
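To make the percentile idea concrete, here is a minimal sketch using the nearest-rank method. The sample latencies are invented for illustration, and `percentile` is a hypothetical helper, not a function from any particular library.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds; one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

With this data the median stays near the typical request while the P99 lands on the outlier, which is why tail percentiles are usually reported alongside, or instead of, the mean.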
Service Levels
SLIs, SLOs, SLAs, and error budgets. How you define and commit to system behavior — and what happens when you breach it.
SLI · SLO · SLA · Error Budget
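An error budget is the amount of failure an SLO permits. The sketch below assumes a request-based SLO; the 99.9% target and the request counts are illustrative numbers, and `error_budget` is a hypothetical helper.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_failures) for a request-based SLO."""
    # round() absorbs floating-point noise in (1 - slo_target).
    allowed = round((1 - slo_target) * total_requests)
    remaining = allowed - failed_requests
    return allowed, remaining

# Assumed example: 99.9% success SLO, 1,000,000 requests this window, 400 failed.
allowed, remaining = error_budget(0.999, 1_000_000, 400)
budget_consumed = (allowed - remaining) / allowed
```

A negative `remaining` would mean the SLO has been breached for the window.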
Availability
SPOF, redundancy, nines, active-active vs active-passive. What "highly available" actually requires you to build.
SPOF · Redundancy · Nines · Active-Active
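The arithmetic behind "nines" and redundancy can be sketched as follows. This assumes a 365-day year and replicas that fail independently, which real deployments only approximate; both function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year

def allowed_downtime_minutes(availability):
    """Yearly downtime permitted by an availability target such as 0.999."""
    return (1 - availability) * MINUTES_PER_YEAR

def redundant_availability(single, replicas):
    """Availability of N replicas where any one suffices, assuming independent failures."""
    return 1 - (1 - single) ** replicas

three_nines = allowed_downtime_minutes(0.999)    # about 525.6 minutes per year
four_nines = allowed_downtime_minutes(0.9999)    # about 52.56 minutes per year
two_replicas = redundant_availability(0.99, 2)   # about 0.9999
```

Under the independence assumption, two 99% replicas reach roughly four nines. Correlated failures (shared power, shared deploys) make the real figure lower.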
Reliability
MTBF, MTTR, RTO, RPO. The difference between a system that rarely fails and one that recovers fast when it does.
MTBF · MTTR · RTO · RPO
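One way to compare "rarely fails" with "recovers fast" is the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The two systems below are made-up examples.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical system A: fails rarely, but takes 10 hours to recover.
rarely_fails = steady_state_availability(mtbf_hours=1000, mttr_hours=10)

# Hypothetical system B: fails ten times as often, but recovers in 6 minutes.
recovers_fast = steady_state_availability(mtbf_hours=100, mttr_hours=0.1)
```

In this example the system that fails more often still has the higher availability, because its recovery time is a hundred times shorter.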
Scalability
Horizontal vs vertical scaling, load balancing at L4 and L7, auto-scaling, and connection draining. How systems grow without breaking.
Load Balancing · L4 / L7 · Auto-Scaling · Cold Start
Fault Tolerance
Graceful degradation, bulkheads, circuit breakers, retries with backoff. How systems survive partial failure without cascading.
Circuit Breaker · Bulkhead · Retry · Backoff
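Retries with backoff can be sketched as below. This is a minimal illustration using exponential backoff with full jitter; the function name, defaults, and the injectable `sleep` and `rng` parameters are assumptions made for testability, not a specific library's API.

```python
import random
import time

def retry_with_backoff(op, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call op() until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the ceiling so that
            # many clients do not retry in lockstep.
            sleep(rng() * ceiling)

# Usage: a hypothetical operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

delays = []
result = retry_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
```

In practice retries should be limited to errors known to be transient and to operations that are safe to repeat.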
Durability
WAL, replication, and backups. The mechanisms that ensure data survives node failure, crash, or disaster.
WAL · Replication · Backups · Crash Recovery
Concurrency & Locking
Race conditions, pessimistic and optimistic locking, MVCC, and distributed locks. How systems handle parallel writes without corruption.
Optimistic Locking · MVCC · Distributed Lock · Idempotency
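Optimistic locking can be illustrated with a version check on write. The in-memory `VersionedStore` below is a hypothetical stand-in for a table with a version column. A real database performs the compare-and-update atomically, which this single-threaded sketch does not attempt.

```python
class VersionConflict(Exception):
    """Raised when the row changed since the caller read it."""

class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (value, version)

    def read(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        return self._rows.get(key, (None, 0))

    def write(self, key, value, expected_version):
        """Write only if nobody else has written since expected_version was read."""
        _, current = self.read(key)
        if current != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self._rows[key] = (value, current + 1)

store = VersionedStore()
store.write("stock", 10, expected_version=0)

# Two writers read the same version, then both try to write.
_, seen_by_a = store.read("stock")
_, seen_by_b = store.read("stock")
store.write("stock", 9, expected_version=seen_by_a)   # first writer wins
try:
    store.write("stock", 8, expected_version=seen_by_b)
    second_write_rejected = False
except VersionConflict:
    second_write_rejected = True  # the loser must re-read and retry
```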
Consistency Models
Strong, eventual, causal, and monotonic. What "up to date" means in a distributed system and when each model is the right call.
Strong Consistency · Eventual · Causal · Monotonic
Network Partitions
What happens when nodes can't talk. Split brain, quorum decisions, and why partition handling defines your entire consistency strategy.
Split Brain · Quorum · Fencing · Partition Recovery
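A majority quorum is one common way to prevent split brain: only the side of a partition that can reach more than half the cluster may keep accepting writes. The sketch below assumes a fixed, known cluster size, and the function names are illustrative.

```python
def majority_quorum(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1

def has_quorum(reachable_nodes, cluster_size):
    """True if this side of a partition may continue to make decisions."""
    return reachable_nodes >= majority_quorum(cluster_size)

# A 5-node cluster splits 3 / 2: only the 3-node side keeps quorum.
big_side = has_quorum(3, 5)
small_side = has_quorum(2, 5)

# A 4-node cluster splits 2 / 2: neither side has quorum.
even_split = has_quorum(2, 4)
```

Because two disjoint groups can never both hold a strict majority, at most one side stays writable. The even split shows the cost: with no majority on either side the cluster stops making progress, which is one reason odd cluster sizes are common.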
CAP Theorem
Consistency vs availability when a partition happens. CP vs AP — and why every distributed system is forced to choose during a split.
CAP · CP Systems · AP Systems · Partition Tolerance
PACELC
Extends CAP to cover normal operation. Even without a partition, you still trade latency against consistency — PACELC names that tradeoff.
PACELC · Latency vs Consistency · PA/EL · PC/EC
Security
Authentication and authorization, JWT, encryption at rest and in transit. The security primitives that belong in every system design, not just the ones marked "sensitive."
JWT · OAuth · Encryption · TLS
State Machines
Modeling order flows, workflows, and status transitions as explicit states. The pattern that makes async systems auditable and debuggable.
State Transitions · DB Implementation · Audit Trail · Timeout Events
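An explicit state machine can be as small as a transition table plus a guard. The order states below are hypothetical examples, and the in-memory `audit_trail` list stands in for what would normally be a database table.

```python
from datetime import datetime, timezone

# Hypothetical order flow: each state maps to the states it may move to.
TRANSITIONS = {
    "pending": {"paid", "cancelled"},
    "paid": {"shipped", "refunded"},
    "shipped": {"delivered"},
    "delivered": set(),
    "cancelled": set(),
    "refunded": set(),
}

class InvalidTransition(Exception):
    pass

class Order:
    def __init__(self, order_id):
        self.order_id = order_id
        self.state = "pending"
        self.audit_trail = []  # (timestamp, from_state, to_state, reason)

    def transition(self, new_state, reason=""):
        """Move to new_state if the table allows it, recording the change."""
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state} is not allowed")
        self.audit_trail.append(
            (datetime.now(timezone.utc), self.state, new_state, reason)
        )
        self.state = new_state

order = Order("order-1")
order.transition("paid", reason="payment captured")
order.transition("shipped", reason="label printed")
```

Because every change goes through `transition`, illegal jumps such as `pending` to `delivered` are rejected, and the audit trail records how the order reached its current state.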
NFRs
Non-functional requirements: the hidden constraints that shape every architecture decision before you draw a single box.
Scalability NFRs · Availability NFRs · Conflicting NFRs · Design Decisions