SYSTEM DESIGN MASTERY COURSE · TRACK B: HIGH-LEVEL DESIGN · MODULE B1 · WEEK 11 (TRACK B BEGINS)
High-Level Design · Distributed Systems Foundations

HLD Fundamentals

CAP Theorem · Consistency Models · Availability Patterns · Load Balancing · Latency Numbers · Back-of-Envelope · The 7-Step Framework

Module B1 · 8 Topics · 4 Tasks · 7 Framework Steps · 12 Checklist Items
CAP Theorem
In a distributed system, you can guarantee at most 2 of 3 properties
C — Consistency
A — Availability
P — Partition Tolerance
CA — Traditional RDBMS (single node)
CP — HBase · ZooKeeper · MongoDB (strong consistency mode)
AP — Cassandra · DynamoDB · CouchDB · DNS
The Key Insight
Network partitions will happen. P is not optional in a real distributed system. The real choice is between C and A when a partition occurs.
CP — Consistent + Partition Tolerant
Rejects requests during a partition to guarantee data consistency. Used when: bank transfers, inventory reservation, ticket booking, distributed locks.
AP — Available + Partition Tolerant
Accepts reads/writes during partition — may return stale data. Used when: social media likes, user profiles, DNS, shopping carts, analytics.
The interview one-liner: "During a network partition, I can either reject requests to guarantee consistency (CP), or serve potentially stale data to stay available (AP). Since partitions are inevitable, the real design question is: for this use case, which failure is more acceptable?"
PACELC — The Extension
CAP only covers partitions. PACELC adds the non-partition case.
P → Partition: choose A or C  (same as CAP)
E → Else (no partition): choose Latency or Consistency

PACELC(P:A/C ; E:L/C)

Cassandra:   PA/EL  — available during partition; low latency normally
DynamoDB:    PA/EL  — same profile as Cassandra
HBase:       PC/EC  — consistent always; accepts higher latency
ZooKeeper:   PC/EC  — built for coordination, strong guarantees
MySQL:       PC/EC  — ACID, consistent always
CP vs AP — When to Choose
SCENARIO                                  | CHOOSE | REASON
Bank account balance check before debit   | CP     | Wrong balance → real financial harm
Facebook likes on a viral post            | AP     | Approximate count is fine; accuracy not critical
Hotel room reservation (last room)        | CP     | Double-booking is catastrophic
User profile picture update               | AP     | A slightly stale photo is acceptable
Stock trade order placement               | CP     | Price must be accurate; regulatory requirement
DNS lookups                               | AP     | A stale DNS record is better than no answer
Consistency Models
From strongest guarantees to weakest — each a deliberate trade-off
← STRONGER CONSISTENCY (higher latency)   Linearizable · Sequential · Causal · Read-Your-Writes · Eventual   (lower latency, higher availability) WEAKER →
Linearizable
Operations appear atomic in real-time wall-clock order. Strongest guarantee. Every observer sees the same history.
etcd, ZooKeeper — distributed locks, leader election
Sequential
All nodes agree on a single order of operations, but that order need not match real-time. Program order is preserved; the observed order may lag the wall clock.
Some multi-CPU memory models, academic systems
Causal
Causally related ops are seen in their causal order; concurrent ops may be ordered differently across nodes. Example: a reply always appears after its parent post.
DynamoDB streams, MongoDB causally consistent sessions (read concern "majority")
Read-Your-Writes
You always see your own writes, even if others don't yet. Route the user's reads to their write replica (or the primary) for a short window after a write, or use sticky sessions — see the sketch below.
Social networks — you see your own tweet immediately
Eventual
No new updates → all replicas converge. No WHEN guarantee. Reads may be stale. Writes never rejected.
Cassandra, DynamoDB, DNS propagation, S3
Common interview mistake: "Eventual consistency is bad." It's a deliberate trade-off. Facebook doesn't need strong consistency for Like counts — the 1% accuracy cost buys massive availability and throughput gains.
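One common way to implement read-your-writes on top of asynchronously replicated stores is to pin a user's reads to the primary for a short window after they write. A minimal sketch — the router shape and the 5-second window are illustrative assumptions, not any particular product's API; plain dicts stand in for database nodes:

import time

class ReadYourWritesRouter:
    """Pin a user's reads to the primary for a short window after they write,
    so they always see their own updates despite replication lag."""
    def __init__(self, primary: dict, replicas: list, window_s: float = 5.0):
        self.primary, self.replicas = primary, replicas   # dicts stand in for DB nodes
        self.window_s = window_s
        self.last_write = {}                              # user_id -> time of last write

    def write(self, user_id: str, key: str, value: str) -> None:
        self.last_write[user_id] = time.time()
        self.primary[key] = value                         # replicas catch up asynchronously

    def read(self, user_id: str, key: str):
        if time.time() - self.last_write.get(user_id, 0) < self.window_s:
            return self.primary.get(key)                  # recent writer reads the primary
        replica = self.replicas[hash(user_id) % len(self.replicas)]
        return replica.get(key)                           # everyone else may see stale data

router = ReadYourWritesRouter(primary={}, replicas=[{}, {}])
router.write("alice", "avatar", "new.png")
print(router.read("alice", "avatar"))                     # 'new.png' — sees her own write
print(router.read("bob", "avatar"))                       # None — replica hasn't caught up yet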
Consistency vs Latency Trade-off
Strong consistency (Paxos/Raft):
  Requires majority quorum before returning → adds 1+ network round trips
  Typical latency: 5–50ms extra per operation
   + Always correct     − Slower

Eventual consistency (async replication):
  Write returns immediately after local write → low latency
  Replicas catch up asynchronously
   + Fast, available    − Reads may be stale (replication lag)

Tunable consistency (Cassandra):
  Per-query consistency level: ONE, QUORUM, ALL
  QUORUM write + QUORUM read (R + W > N) → strong consistency
  ONE write + ONE read → eventual
  Trade-off per operation based on use case
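The rule behind "QUORUM + QUORUM → strong" is quorum overlap: with replication factor N, a read is guaranteed to intersect the latest acknowledged write whenever R + W > N. A tiny helper to sanity-check a configuration (names are illustrative):

def quorum_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N),
    so a read is guaranteed to see the latest acknowledged write."""
    return r + w > n

N = 3                                       # replication factor
print(quorum_overlap(N, r=2, w=2))          # QUORUM + QUORUM -> True  (strong)
print(quorum_overlap(N, r=1, w=1))          # ONE + ONE       -> False (eventual)
print(quorum_overlap(N, r=1, w=3))          # ALL writes let you read at ONE -> True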
Availability Patterns
Measuring and achieving the nines
99%      → 87.6 hrs downtime / year   (2 nines — not acceptable for production)
99.9%    → 8.7 hrs / year             (3 nines — typical for internal services)
99.99%   → 52.6 min / year            (4 nines — typical commercial SLA)
99.999%  → 5.3 min / year             (5 nines — telecom / financial grade)
Compounding availability: If your system has 3 services in sequence each at 99.9%, the end-to-end availability is 0.999³ = 99.7%. More services in the request path → lower total availability. Prefer parallel over sequential for resilience.
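The same arithmetic as a quick sketch — serial dependencies multiply availabilities, while redundant parallel copies multiply unavailabilities:

def serial(*availabilities: float) -> float:
    """End-to-end availability of services in sequence: product of the parts."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def redundant(a: float, copies: int) -> float:
    """Availability of N redundant copies: 1 - P(every copy is down at once)."""
    return 1 - (1 - a) ** copies

print(f"{serial(0.999, 0.999, 0.999):.4f}")   # 0.9970 -> three 99.9% services in a row
print(f"{redundant(0.999, 2):.6f}")           # 0.999999 -> two redundant 99.9% instances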
Active-Passive (Failover)
Normal:    [Client] ──→ [Active Node]      [Passive] (standby, synced)
Failover:  [Client] ──→ [Passive Node]     [Active]  (dead/recovering)

Variants:
  Hot standby:  Passive running + synced → failover in seconds
  Warm standby: Passive needs startup → minutes
  Cold standby: Passive needs provisioning → minutes to hours

Challenge: Split-brain
  Network partition → both nodes think they're active primary
  Both accept writes → divergent, irreconcilable state
  Prevention: Quorum (majority must agree) + Fencing tokens
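A fencing token is a monotonically increasing number issued with each lock lease; the storage layer rejects writes carrying an older token, so a deposed primary cannot corrupt state even if it still believes it is active. A minimal illustration (not a real lock service):

class FencedStore:
    """Shared storage that rejects writes carrying a stale fencing token."""
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:          # issued to a primary that was since deposed
            return False                        # reject: a newer lease exists
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStore()
store.write(token=1, key="balance", value="100")          # original primary (lease token 1)
store.write(token=2, key="balance", value="90")           # new primary after failover (token 2)
print(store.write(token=1, key="balance", value="500"))   # False — old primary is fenced off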
Active-Active (Load Sharing)
Normal: [Client] ──→ [Load Balancer] ──→ Node A (active, serving)
                                     ──→ Node B (active, serving)
                                     ──→ Node C (active, serving)

All nodes handle reads AND writes simultaneously.
Conflict resolution required:
  - Last-write-wins (timestamp)
  - CRDTs (Conflict-free Replicated Data Types) — see the sketch below
  - Application-level merge logic

Used in: Cassandra, DynamoDB, Akamai CDN, most NoSQL at scale
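Of the conflict-resolution options above, CRDTs are the one most worth being able to sketch. The simplest is a G-Counter (grow-only counter): each replica increments only its own slot and merge takes the element-wise max, so replicas converge regardless of merge order — a reasonable model for a Like counter. A minimal sketch:

class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge = element-wise max."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}                              # node_id -> local count

    def increment(self, n: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)                                        # 3 likes recorded on replica A
b.increment(2)                                        # 2 likes recorded concurrently on replica B
a.merge(b); b.merge(a)                                # anti-entropy exchange, in any order
print(a.value(), b.value())                           # 5 5 -> both replicas converge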
Load Balancing
Distributing traffic across a fleet · L4 vs L7 · six algorithms
FEATURE          | L4 — Transport Layer               | L7 — Application Layer
Works on         | TCP/UDP packets                    | HTTP/HTTPS requests
Routing basis    | IP address + port                  | URL path, headers, cookies, body
Speed            | Very fast (no content inspection)  | Slower (parses the full HTTP request)
SSL termination  | No — passes through encrypted      | Yes — decrypts once at the LB
Smart routing    | No — IP-based only                 | Yes — /api → API servers, /images → CDN
Examples         | AWS NLB, HAProxy (L4 mode)         | AWS ALB, NGINX, Envoy, Caddy
Load Balancing Algorithms
Round Robin
Requests distributed evenly in rotation: A → B → C → A → B → C. Simple, assumes equal server capacity.
USE WHEN: homogeneous fleet, short equal requests
Weighted Round Robin
Servers with higher weight get proportionally more requests. Weight 3:2:1 → A gets 3 per cycle, B gets 2, C gets 1.
USE WHEN: heterogeneous fleet (some servers bigger)
Least Connections
New request → server with fewest active connections. Better than round-robin when request duration varies significantly.
USE WHEN: long-lived connections, variable processing time
Least Response Time
Routes to the server with the lowest latency and fewest active connections. Requires the LB to track response times via health checks, which adds monitoring overhead.
USE WHEN: latency-sensitive, servers have variable performance
IP Hash (Sticky)
Hash(client IP) mod N → always same server. Guarantees same client always hits same server. Breaks if server count changes.
USE WHEN: avoiding external session store (prefer Redis instead)
Consistent Hashing
Map servers and requests to same hash ring. Adding/removing a server only remaps K/N keys (where K=keys, N=servers). Minimal disruption.
USE WHEN: distributed caches, CDN routing, sharded databases
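A sketch of the ring idea, contrasted with hash(key) mod N. It uses a single position per server for brevity; real systems place many virtual nodes per server to even out the load:

import bisect, hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hashing: servers and keys share one hash space; a key
    belongs to the first server clockwise from its hash position."""
    def __init__(self, servers):
        self.ring = sorted((h(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        pos = bisect.bisect(self.ring, (h(key),)) % len(self.ring)
        return self.ring[pos][1]

ring = HashRing(["s1", "s2", "s3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring = HashRing(["s1", "s2", "s3", "s4"])            # add one server
moved = sum(before[k] != ring.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")      # only keys claimed by s4 move;
                                                     # with hash(key) % N, ~75% would move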
Interview trap on sticky sessions: IP Hash is a workaround for stateful servers. The correct solution is to make servers stateless by externalising session state to Redis. Then any server can handle any request — no stickiness needed.
Latency & Throughput
Numbers every engineer should know by heart
OPERATION                   | LATENCY | NOTE
L1 cache reference          | 0.5 ns  | CPU register-speed
L2 cache reference          | 7 ns    | 14× slower than L1
RAM access                  | 100 ns  | Baseline for in-memory ops
SSD random read             | 0.1 ms  | 1,000× slower than RAM
Network within datacenter   | 0.5 ms  | Same-AZ latency
HDD random read             | 10 ms   | Mechanical seek time
Intra-region (cross-AZ)     | 1–5 ms  | Same region, different AZ
Cross-region (US → EU)      | ~100 ms | Speed of light across the Atlantic
Little's Law
L = λ × W

L  =  average items in system (queue depth)
λ  =  arrival rate (throughput, requests/sec)
W  =  average time in system (latency, seconds)

Example:
  Service handles 100 req/sec (λ = 100)
  Average latency is 50ms   (W = 0.05s)
  Avg concurrent requests:  L = 100 × 0.05 = 5 concurrent requests

Key insight:
  If latency grows (W↑) and arrival rate stays constant (λ=const),
  queue depth grows (L↑). Eventually queue overflows → system collapse.
  → Latency spikes are early warning signs of capacity problems.
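To make the key insight concrete, hold λ constant and let W degrade: L grows until it exceeds whatever concurrency the server can hold (the pool size of 50 below is an assumed figure, not from the course):

ARRIVAL_RATE = 100      # λ, requests/sec — held constant
POOL_SIZE = 50          # assumed max in-flight requests one server can hold

for latency_ms in (50, 200, 400, 600):
    w = latency_ms / 1000                    # W in seconds
    depth = ARRIVAL_RATE * w                 # Little's Law: L = λ × W
    state = "ok" if depth <= POOL_SIZE else "saturated — requests start queueing"
    print(f"W = {latency_ms:>3} ms  ->  L = {depth:>4.0f} in flight  ({state})")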
Latency vs Throughput trade-off: Processing requests in larger batches increases throughput but adds per-request latency (waiting to fill the batch). Streaming one-at-a-time minimises latency but reduces throughput. Choose based on SLA: batch for analytics pipelines, stream for user-facing APIs.
Back-of-Envelope Estimation
Show your math · round aggressively · use powers of 10
The 5-Step Estimation Framework
1. Clarify scale: DAU, requests per user per day, retention period
2. Peak QPS = (DAU × req/day) ÷ 86,400 × 2–3 (peak factor)
3. Storage = items/day × item_size × retention_days
4. Bandwidth = peak_QPS × avg_payload_size
5. Servers = peak_QPS ÷ capacity_per_server (typically 10K–100K rps/server)
Worked Example — Twitter Scale
Assumptions:
  300M DAU
  Reads: 100 tweets/user/day
  Writes: 2 tweets/user/day
  Avg tweet size: ~1 KB (text + metadata)

Read QPS:    300M × 100 ÷ 86,400 ≈ 350,000 reads/sec
             Peak (3×): ~1M reads/sec

Write QPS:   300M × 2 ÷ 86,400 ≈ 7,000 writes/sec
             Peak (3×): ~21,000 writes/sec

Storage (5 years):
  300M × 2 tweets/day × 365 × 5 × 1 KB
  = 300M × 3,650 × 1,000 bytes
  ≈ 1.1 PB (tweets only, excluding media)

Bandwidth:
  Reads:  1M req/s × 1 KB = 1 GB/sec
  Writes: 21K req/s × 1 KB ≈ 21 MB/sec
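The same arithmetic as a short script — handy for sanity-checking the mental math; the constants mirror the assumptions above:

DAU = 300e6
READS_PER_USER, WRITES_PER_USER = 100, 2
TWEET_BYTES = 1_000
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3

read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY
storage_5y = DAU * WRITES_PER_USER * 365 * 5 * TWEET_BYTES

print(f"read QPS  ≈ {read_qps:,.0f}  (peak ≈ {read_qps * PEAK_FACTOR:,.0f})")
print(f"write QPS ≈ {write_qps:,.0f}  (peak ≈ {write_qps * PEAK_FACTOR:,.0f})")
print(f"storage (5y, text only) ≈ {storage_5y / 1e15:.1f} PB")
print(f"read bandwidth  ≈ {read_qps * PEAK_FACTOR * TWEET_BYTES / 1e9:.1f} GB/s")
print(f"write bandwidth ≈ {write_qps * PEAK_FACTOR * TWEET_BYTES / 1e6:.0f} MB/s")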
Storage Cheat Sheet
Character (ASCII):   1 byte       Integer:    4 bytes     Long:      8 bytes
UUID:               16 bytes      Timestamp:  4 bytes
Image (compressed): 100 KB – 5 MB
HD video 1 min:     ~60 MB (H.264 compressed)
4K video 1 min:     ~375 MB

Units:   1 KB = 10³    1 MB = 10⁶    1 GB = 10⁹    1 TB = 10¹²    1 PB = 10¹⁵

Rule of 86,400:  1 req/sec → 86,400 req/day ≈ 100K req/day
Rule of 30M:     1 req/sec → ~2.5M req/month ≈ 30M req/year
The 7-Step HLD Interview Framework
Apply this to every system design question · 45 minutes total
01 — Requirements (5 min)
Functional: core features, use cases, users. Non-functional: scale, latency target, availability SLA, consistency needs, geo distribution. Don't assume — ask.
02 — Estimation (5 min)
DAU, peak QPS, storage/year, bandwidth. Show calculations. Round aggressively. Numbers drive every architecture decision that follows.
03 — High-Level Design (10 min)
Draw the big boxes: clients, API gateway, services, databases, caches, queues. Identify read vs write path. Breadth first — don't go deep yet.
04 — Deep Dive (15 min)
Pick 1–2 most interesting components. Database schema, critical API design, specific algorithm. The part they're actually testing you on.
05 — Bottlenecks (5 min)
Where is the hotspot? What breaks first at 10× scale? What trade-offs did you make and why?
06 — Failure Scenarios (2 min)
DB goes down? Cache is cold? DC failover? What is the graceful degradation story?
07 — Scale Evolution (optional)
What changes at 10× more load? Sharding strategy? CDN for static content? Read replicas?
Component Decision Guide
COMPONENT        | USE WHEN                                                                    | EXAMPLES
CDN              | Static content, globally read-heavy, media files                            | Cloudflare, Akamai, AWS CloudFront
Load Balancer    | Multiple backend instances, traffic distribution, SSL termination           | AWS ALB/NLB, NGINX, HAProxy
Cache (Redis)    | Hot reads, session storage, rate-limiting counters, leaderboards            | Redis, Memcached, DynamoDB DAX
Message Queue    | Async processing, decoupling services, event streaming, retry logic         | Kafka, RabbitMQ, AWS SQS
SQL Database     | ACID transactions, complex queries, structured data, joins                  | PostgreSQL, MySQL, AWS Aurora
NoSQL Database   | High throughput, flexible schema, horizontal scale, simple access patterns  | DynamoDB, Cassandra, MongoDB
Object Storage   | Large files, images, videos, backups, low cost                              | AWS S3, GCS, Azure Blob
Search Engine    | Full-text search, faceted filtering, fuzzy matching                         | Elasticsearch, OpenSearch, Algolia
Interview Tips — Common Mistakes
MISTAKE                               | CORRECTION
"CAP says choose 2 of 3"              | CAP says choose C or A when a partition occurs. P is not optional — it is always required.
"Eventual consistency is always bad"  | It's a deliberate trade-off. Facebook likes don't need strong consistency. Know when it's acceptable.
Jumping to DB choice first            | Start with requirements → scale → access patterns → then the DB choice follows naturally.
Not estimating scale                  | Every HLD starts with numbers. Scale determines architecture. Always estimate.
"Just add a cache"                    | Cache invalidation is one of the hardest problems. When do you invalidate? On write? TTL? Both?
Ignoring failure scenarios            | Interviewers want to see: what happens when X fails? What's the recovery path?
Task 01 · CAP Theorem — 8 Scenario Analysis · ~1 hr

For each scenario: identify CP or AP, justify your choice, and name the trade-off you're accepting.

  1. Online banking — check account balance before debit
  2. Facebook Like counter on a viral post
  3. Uber driver location updates (a new position every 4 seconds)
  4. Hotel room reservation (last room available)
  5. Amazon product reviews display
  6. Stock trading platform — order placement
  7. WhatsApp message delivery status (sent/delivered/read)
  8. E-commerce shopping cart (items added by user)

For each: state the exact failure mode if you choose wrong (e.g., "if I choose AP for banking, a user could overdraft").

Task 02 · Back-of-Envelope — WhatsApp + YouTube · ~1.5 hrs

WhatsApp: 2B users, 100M active daily, 100 messages/user/day, avg 100 bytes/message.
Calculate: peak QPS, storage/year, bandwidth (in + out).

YouTube: 2B users, 500 hours of video uploaded per minute, 1B views/day, avg view 10 min at 1 Mbps.
Calculate: upload storage/year, CDN bandwidth for views, approximate CDN cost (assume $0.01/GB).

For each: show all steps. Identify the biggest bottleneck revealed by your numbers.

Task 03 · Consistency Model Selection — 6 Scenarios · ~1 hr

Choose the appropriate consistency model and justify:

  1. Bank ledger — transfer between two accounts
  2. User profile picture update
  3. Social media comments — replies must appear after parent
  4. Shopping cart — items added/removed
  5. Distributed lock (exactly one service holds the lock)
  6. Netflix "continue watching" progress position

For each: name the exact model (Linearizable / Sequential / Causal / Read-Your-Writes / Eventual), the implementation mechanism, and the failure mode if you under-constrain.

Task 04 · HLD Framework — URL Shortener (Full Walkthrough) · ~2 hrs · full design

Apply all 7 steps to design TinyURL. No code — framework output only.

  • Scale: 300M URLs stored, 100:1 read:write ratio, 5-year retention
  • Latency target: redirect in <10ms (p99)
  • Availability: 99.99%

Required outputs:

  1. Peak QPS (read + write)
  2. Storage for 5 years
  3. High-level diagram (boxes + arrows, read path + write path)
  4. DB choice + justification (CAP + access pattern reasoning)
  5. Key design decision: how do you generate unique 7-char short codes?
  6. Biggest bottleneck + how to address it
  7. Failure scenario: what if the DB is unreachable during a redirect?
Module B1 Checklist · HLD Fundamentals · 12 items
Can explain CAP theorem without buzzwords — the partition choice framing
Know PACELC extension — how it covers non-partition latency/consistency
Distinguish: Linearizable vs Causal vs Eventual — when each is appropriate
Can calculate end-to-end availability from N services in sequence
Know Active-Passive vs Active-Active tradeoffs; split-brain prevention
Know all 6 load balancing algorithms + when to use each
L4 vs L7 load balancers — routing basis, SSL termination, smart routing
Memorized the latency numbers table (RAM vs SSD vs HDD vs network)
Can apply Little's Law to analyse queue depth from QPS + latency
Can do back-of-envelope for any system in <5 minutes
Know the 7-step HLD framework + time allocation for 45-min interview
✏️ All 4 tasks completed (CAP, estimation, consistency, URL shortener)
NEXT MODULE
B2 — Databases at Scale
Indexing strategies · SQL vs NoSQL trade-offs · Sharding (horizontal partitioning) · Replication (primary-replica, multi-primary) · ACID vs BASE · Read replicas · Connection pooling · Database selection guide