SYSTEM DESIGN MASTERY COURSE · TRACK B: HIGH-LEVEL DESIGN · MODULE B1 · WEEK 11 (TRACK B BEGINS)
High-Level Design · Distributed Systems Foundations

HLD Fundamentals

CAP Theorem · Consistency Models · Availability Patterns · Load Balancing · Latency Numbers · Back-of-Envelope · The 7-Step Framework

Module B1 · 8 Topics · 4 Tasks · 7 Framework Steps · 12 Checklist Items
CAP Theorem
In a distributed system, you can guarantee at most 2 of 3 properties
C — Consistency
A — Availability
P — Partition Tolerance
CA — Traditional RDBMS (single node)
CP — HBase · ZooKeeper · MongoDB (strong consistency mode)
AP — Cassandra · DynamoDB · CouchDB · DNS
The Key Insight
Network partitions will happen. P is not optional in a real distributed system. The real choice is between C and A when a partition occurs.
CP — Consistent + Partition Tolerant
Rejects requests during a partition to guarantee data consistency. Used when: bank transfers, inventory reservation, ticket booking, distributed locks.
AP — Available + Partition Tolerant
Accepts reads/writes during partition — may return stale data. Used when: social media likes, user profiles, DNS, shopping carts, analytics.
The interview one-liner: "During a network partition, I can either reject requests to guarantee consistency (CP), or serve potentially stale data to stay available (AP). Since partitions are inevitable, the real design question is: for this use case, which failure is more acceptable?"
PACELC — The Extension
CAP only covers partitions. PACELC adds the non-partition case.
P → Partition: choose A or C  (same as CAP)
E → Else (no partition): choose Latency or Consistency

PACELC(P:A/C ; E:L/C)

Cassandra:   PA/EL  — available during partition; low latency normally
DynamoDB:    PA/EL  — same profile as Cassandra
HBase:       PC/EC  — consistent always; accepts higher latency
ZooKeeper:   PC/EC  — built for coordination, strong guarantees
MySQL:       PC/EC  — ACID, consistent always
CP vs AP — When to Choose
SCENARIO                                  | CHOOSE | REASON
Bank account balance check before debit   | CP     | Wrong balance → real financial harm
Facebook likes on a viral post            | AP     | Approximate count is fine; accuracy not critical
Hotel room reservation (last room)        | CP     | Double-booking is catastrophic
User profile picture update               | AP     | A slightly stale photo is acceptable
Stock trade order placement               | CP     | Price must be accurate; regulatory requirement
DNS lookups                               | AP     | A stale DNS record is better than no answer
Consistency Models
From strongest guarantees to weakest — each a deliberate trade-off
← STRONGER CONSISTENCY (higher latency)   Linearizable · Sequential · Causal · Read-Your-Writes · Eventual   (lower latency, higher availability) WEAKER →
Linearizable
Operations appear atomic in real-time wall-clock order. Strongest guarantee. Every observer sees the same history.
etcd, ZooKeeper — distributed locks, leader election
Sequential
All nodes agree on a single order of operations, but that order need not match real-time. Program order is preserved; the observed order may lag the wall clock.
Some multi-CPU memory models, academic systems
Causal
Causally related ops are seen in their causal order; concurrent ops may be ordered differently across nodes. Example: a reply always appears after its parent post.
DynamoDB streams, MongoDB causally consistent sessions (read concern "majority")
Read-Your-Writes
You always see your own writes, even if others don't yet. Route the user's reads to their write replica (or the primary) for a short window after a write, or use sticky sessions — see the sketch below.
Social networks — you see your own tweet immediately
Eventual
No new updates → all replicas converge. No WHEN guarantee. Reads may be stale. Writes never rejected.
Cassandra, DynamoDB, DNS propagation, S3
Common interview mistake: "Eventual consistency is bad." It's a deliberate trade-off. Facebook doesn't need strong consistency for Like counts — the 1% accuracy cost buys massive availability and throughput gains.
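One common way to implement read-your-writes on top of asynchronously replicated stores is to pin a user's reads to the primary for a short window after they write. A minimal sketch — the router shape and the 5-second window are illustrative assumptions, not any particular product's API; plain dicts stand in for database nodes:

import time

class ReadYourWritesRouter:
    """Pin a user's reads to the primary for a short window after they write,
    so they always see their own updates despite replication lag."""
    def __init__(self, primary: dict, replicas: list, window_s: float = 5.0):
        self.primary, self.replicas = primary, replicas   # dicts stand in for DB nodes
        self.window_s = window_s
        self.last_write = {}                              # user_id -> time of last write

    def write(self, user_id: str, key: str, value: str) -> None:
        self.last_write[user_id] = time.time()
        self.primary[key] = value                         # replicas catch up asynchronously

    def read(self, user_id: str, key: str):
        if time.time() - self.last_write.get(user_id, 0) < self.window_s:
            return self.primary.get(key)                  # recent writer reads the primary
        replica = self.replicas[hash(user_id) % len(self.replicas)]
        return replica.get(key)                           # everyone else may see stale data

router = ReadYourWritesRouter(primary={}, replicas=[{}, {}])
router.write("alice", "avatar", "new.png")
print(router.read("alice", "avatar"))                     # 'new.png' — sees her own write
print(router.read("bob", "avatar"))                       # None — replica hasn't caught up yet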
Consistency vs Latency Trade-off
Strong consistency (Paxos/Raft):
  Requires majority quorum before returning → adds 1+ network round trips
  Typical latency: 5–50ms extra per operation
   + Always correct     − Slower

Eventual consistency (async replication):
  Write returns immediately after local write → low latency
  Replicas catch up asynchronously
   + Fast, available    − Reads may be stale (replication lag)

Tunable consistency (Cassandra):
  Per-query consistency level: ONE, QUORUM, ALL
  QUORUM write + QUORUM read (R + W > N) → strong consistency
  ONE write + ONE read → eventual
  Trade-off per operation based on use case
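The rule behind "QUORUM + QUORUM → strong" is quorum overlap: with replication factor N, a read is guaranteed to intersect the latest acknowledged write whenever R + W > N. A tiny helper to sanity-check a configuration (names are illustrative):

def quorum_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N),
    so a read is guaranteed to see the latest acknowledged write."""
    return r + w > n

N = 3                                       # replication factor
print(quorum_overlap(N, r=2, w=2))          # QUORUM + QUORUM -> True  (strong)
print(quorum_overlap(N, r=1, w=1))          # ONE + ONE       -> False (eventual)
print(quorum_overlap(N, r=1, w=3))          # ALL writes let you read at ONE -> True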
Availability Patterns
Measuring and achieving the nines
99%      → 87.6 hrs downtime / year   (2 nines — not acceptable for production)
99.9%    → 8.7 hrs / year             (3 nines — typical for internal services)
99.99%   → 52.6 min / year            (4 nines — typical commercial SLA)
99.999%  → 5.3 min / year             (5 nines — telecom / financial grade)
Compounding availability: If your system has 3 services in sequence each at 99.9%, the end-to-end availability is 0.999³ = 99.7%. More services in the request path → lower total availability. Prefer parallel over sequential for resilience.
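The same arithmetic as a quick sketch — serial dependencies multiply availabilities, while redundant parallel copies multiply unavailabilities:

def serial(*availabilities: float) -> float:
    """End-to-end availability of services in sequence: product of the parts."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def redundant(a: float, copies: int) -> float:
    """Availability of N redundant copies: 1 - P(every copy is down at once)."""
    return 1 - (1 - a) ** copies

print(f"{serial(0.999, 0.999, 0.999):.4f}")   # 0.9970 -> three 99.9% services in a row
print(f"{redundant(0.999, 2):.6f}")           # 0.999999 -> two redundant 99.9% instances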
Active-Passive (Failover)
Normal:    [Client] ──→ [Active Node]      [Passive] (standby, synced)
Failover:  [Client] ──→ [Passive Node]     [Active]  (dead/recovering)

Variants:
  Hot standby:  Passive running + synced → failover in seconds
  Warm standby: Passive needs startup → minutes
  Cold standby: Passive needs provisioning → minutes to hours

Challenge: Split-brain
  Network partition → both nodes think they're active primary
  Both accept writes → divergent, irreconcilable state
  Prevention: Quorum (majority must agree) + Fencing tokens
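A fencing token is a monotonically increasing number issued with each lock lease; the storage layer rejects writes carrying an older token, so a deposed primary cannot corrupt state even if it still believes it is active. A minimal illustration (not a real lock service):

class FencedStore:
    """Shared storage that rejects writes carrying a stale fencing token."""
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:          # issued to a primary that was since deposed
            return False                        # reject: a newer lease exists
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStore()
store.write(token=1, key="balance", value="100")          # original primary (lease token 1)
store.write(token=2, key="balance", value="90")           # new primary after failover (token 2)
print(store.write(token=1, key="balance", value="500"))   # False — old primary is fenced off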
Active-Active (Load Sharing)
Normal: [Client] ──→ [Load Balancer] ──→ Node A (active, serving)
                                     ──→ Node B (active, serving)
                                     ──→ Node C (active, serving)

All nodes handle reads AND writes simultaneously.
Conflict resolution required:
  - Last-write-wins (timestamp)
  - CRDTs (Conflict-free Replicated Data Types) — see the sketch below
  - Application-level merge logic

Used in: Cassandra, DynamoDB, Akamai CDN, most NoSQL at scale
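Of the conflict-resolution options above, CRDTs are the one most worth being able to sketch. The simplest is a G-Counter (grow-only counter): each replica increments only its own slot and merge takes the element-wise max, so replicas converge regardless of merge order — a reasonable model for a Like counter. A minimal sketch:

class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge = element-wise max."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}                              # node_id -> local count

    def increment(self, n: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)                                        # 3 likes recorded on replica A
b.increment(2)                                        # 2 likes recorded concurrently on replica B
a.merge(b); b.merge(a)                                # anti-entropy exchange, in any order
print(a.value(), b.value())                           # 5 5 -> both replicas converge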
Load Balancing
Distributing traffic across a fleet · L4 vs L7 · six algorithms
FEATURE          | L4 — Transport Layer               | L7 — Application Layer
Works on         | TCP/UDP packets                    | HTTP/HTTPS requests
Routing basis    | IP address + port                  | URL path, headers, cookies, body
Speed            | Very fast (no content inspection)  | Slower (parses the full HTTP request)
SSL termination  | No — passes through encrypted      | Yes — decrypts once at the LB
Smart routing    | No — IP-based only                 | Yes — /api → API servers, /images → CDN
Examples         | AWS NLB, HAProxy (L4 mode)         | AWS ALB, NGINX, Envoy, Caddy
Load Balancing Algorithms
Round Robin
Requests distributed evenly in rotation: A → B → C → A → B → C. Simple, assumes equal server capacity.
USE WHEN: homogeneous fleet, short equal requests
Weighted Round Robin
Servers with higher weight get proportionally more requests. Weight 3:2:1 → A gets 3 per cycle, B gets 2, C gets 1.
USE WHEN: heterogeneous fleet (some servers bigger)
Least Connections
New request → server with fewest active connections. Better than round-robin when request duration varies significantly.
USE WHEN: long-lived connections, variable processing time
Least Response Time
Routes to the server with the lowest latency and fewest active connections. Requires the LB to track response times via health checks, which adds monitoring overhead.
USE WHEN: latency-sensitive, servers have variable performance
IP Hash (Sticky)
Hash(client IP) mod N → always same server. Guarantees same client always hits same server. Breaks if server count changes.
USE WHEN: avoiding external session store (prefer Redis instead)
Consistent Hashing
Map servers and requests to same hash ring. Adding/removing a server only remaps K/N keys (where K=keys, N=servers). Minimal disruption.
USE WHEN: distributed caches, CDN routing, sharded databases
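A sketch of the ring idea, contrasted with hash(key) mod N. It uses a single position per server for brevity; real systems place many virtual nodes per server to even out the load:

import bisect, hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hashing: servers and keys share one hash space; a key
    belongs to the first server clockwise from its hash position."""
    def __init__(self, servers):
        self.ring = sorted((h(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        pos = bisect.bisect(self.ring, (h(key),)) % len(self.ring)
        return self.ring[pos][1]

ring = HashRing(["s1", "s2", "s3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring = HashRing(["s1", "s2", "s3", "s4"])            # add one server
moved = sum(before[k] != ring.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")      # only keys claimed by s4 move;
                                                     # with hash(key) % N, ~75% would move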
Interview trap on sticky sessions: IP Hash is a workaround for stateful servers. The correct solution is to make servers stateless by externalising session state to Redis. Then any server can handle any request — no stickiness needed.
Latency & Throughput
Numbers every engineer should know by heart
OPERATION                   | LATENCY | NOTE
L1 cache reference          | 0.5 ns  | CPU register-speed
L2 cache reference          | 7 ns    | 14× slower than L1
RAM access                  | 100 ns  | Baseline for in-memory ops
SSD random read             | 0.1 ms  | 1,000× slower than RAM
Network within datacenter   | 0.5 ms  | Same-AZ latency
HDD random read             | 10 ms   | Mechanical seek time
Intra-region (cross-AZ)     | 1–5 ms  | Same region, different AZ
Cross-region (US → EU)      | ~100 ms | Speed of light across the Atlantic
Little's Law
L = λ × W

L  =  average items in system (queue depth)
λ  =  arrival rate (throughput, requests/sec)
W  =  average time in system (latency, seconds)

Example:
  Service handles 100 req/sec (λ = 100)
  Average latency is 50ms   (W = 0.05s)
  Avg concurrent requests:  L = 100 × 0.05 = 5 concurrent requests

Key insight:
  If latency grows (W↑) and arrival rate stays constant (λ=const),
  queue depth grows (L↑). Eventually queue overflows → system collapse.
  → Latency spikes are early warning signs of capacity problems.
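To make the key insight concrete, hold λ constant and let W degrade: L grows until it exceeds whatever concurrency the server can hold (the pool size of 50 below is an assumed figure, not from the course):

ARRIVAL_RATE = 100      # λ, requests/sec — held constant
POOL_SIZE = 50          # assumed max in-flight requests one server can hold

for latency_ms in (50, 200, 400, 600):
    w = latency_ms / 1000                    # W in seconds
    depth = ARRIVAL_RATE * w                 # Little's Law: L = λ × W
    state = "ok" if depth <= POOL_SIZE else "saturated — requests start queueing"
    print(f"W = {latency_ms:>3} ms  ->  L = {depth:>4.0f} in flight  ({state})")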
Latency vs Throughput trade-off: Processing requests in larger batches increases throughput but adds per-request latency (waiting to fill the batch). Streaming one-at-a-time minimises latency but reduces throughput. Choose based on SLA: batch for analytics pipelines, stream for user-facing APIs.
Back-of-Envelope Estimation
Show your math · round aggressively · use powers of 10
The 5-Step Estimation Framework
1. Clarify scale: DAU, requests per user per day, retention period
2. Peak QPS = (DAU × req/day) ÷ 86,400 × 2–3 (peak factor)
3. Storage = items/day × item_size × retention_days
4. Bandwidth = peak_QPS × avg_payload_size
5. Servers = peak_QPS ÷ capacity_per_server (typically 10K–100K rps/server)
Worked Example — Twitter Scale
Assumptions:
  300M DAU
  Reads: 100 tweets/user/day
  Writes: 2 tweets/user/day
  Avg tweet size: ~1 KB (text + metadata)

Read QPS:    300M × 100 ÷ 86,400 ≈ 350,000 reads/sec
             Peak (3×): ~1M reads/sec

Write QPS:   300M × 2 ÷ 86,400 ≈ 7,000 writes/sec
             Peak (3×): ~21,000 writes/sec

Storage (5 years):
  300M × 2 tweets/day × 365 × 5 × 1 KB
  = 300M × 3,650 × 1,000 bytes
  ≈ 1.1 PB (tweets only, excluding media)

Bandwidth:
  Reads:  1M req/s × 1 KB = 1 GB/sec
  Writes: 21K req/s × 1 KB ≈ 21 MB/sec
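The same arithmetic as a short script — handy for sanity-checking the mental math; the constants mirror the assumptions above:

DAU = 300e6
READS_PER_USER, WRITES_PER_USER = 100, 2
TWEET_BYTES = 1_000
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3

read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY
storage_5y = DAU * WRITES_PER_USER * 365 * 5 * TWEET_BYTES

print(f"read QPS  ≈ {read_qps:,.0f}  (peak ≈ {read_qps * PEAK_FACTOR:,.0f})")
print(f"write QPS ≈ {write_qps:,.0f}  (peak ≈ {write_qps * PEAK_FACTOR:,.0f})")
print(f"storage (5y, text only) ≈ {storage_5y / 1e15:.1f} PB")
print(f"read bandwidth  ≈ {read_qps * PEAK_FACTOR * TWEET_BYTES / 1e9:.1f} GB/s")
print(f"write bandwidth ≈ {write_qps * PEAK_FACTOR * TWEET_BYTES / 1e6:.0f} MB/s")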
Storage Cheat Sheet
Character (ASCII):   1 byte       Integer:    4 bytes     Long:      8 bytes
UUID:               16 bytes      Timestamp:  4 bytes
Image (compressed): 100 KB – 5 MB
HD video 1 min:     ~60 MB (H.264 compressed)
4K video 1 min:     ~375 MB

Units:   1 KB = 10³    1 MB = 10⁶    1 GB = 10⁹    1 TB = 10¹²    1 PB = 10¹⁵

Rule of 86,400:  1 req/sec → 86,400 req/day ≈ 100K req/day
Rule of 30M:     1 req/sec → ~2.5M req/month ≈ 30M req/year
The 7-Step HLD Interview Framework
Apply this to every system design question · 45 minutes total
01 — Requirements (5 min)
Functional: core features, use cases, users. Non-functional: scale, latency target, availability SLA, consistency needs, geo distribution. Don't assume — ask.
02 — Estimation (5 min)
DAU, peak QPS, storage/year, bandwidth. Show calculations. Round aggressively. Numbers drive every architecture decision that follows.
03 — High-Level Design (10 min)
Draw the big boxes: clients, API gateway, services, databases, caches, queues. Identify read vs write path. Breadth first — don't go deep yet.
04 — Deep Dive (15 min)
Pick 1–2 most interesting components. Database schema, critical API design, specific algorithm. The part they're actually testing you on.
05 — Bottlenecks (5 min)
Where is the hotspot? What breaks first at 10× scale? What trade-offs did you make and why?
06 — Failure Scenarios (2 min)
DB goes down? Cache is cold? DC failover? What is the graceful degradation story?
07 — Scale Evolution (optional)
What changes at 10× more load? Sharding strategy? CDN for static content? Read replicas?
Component Decision Guide
COMPONENT        | USE WHEN                                                                    | EXAMPLES
CDN              | Static content, globally read-heavy, media files                            | Cloudflare, Akamai, AWS CloudFront
Load Balancer    | Multiple backend instances, traffic distribution, SSL termination           | AWS ALB/NLB, NGINX, HAProxy
Cache (Redis)    | Hot reads, session storage, rate-limiting counters, leaderboards            | Redis, Memcached, DynamoDB DAX
Message Queue    | Async processing, decoupling services, event streaming, retry logic         | Kafka, RabbitMQ, AWS SQS
SQL Database     | ACID transactions, complex queries, structured data, joins                  | PostgreSQL, MySQL, AWS Aurora
NoSQL Database   | High throughput, flexible schema, horizontal scale, simple access patterns  | DynamoDB, Cassandra, MongoDB
Object Storage   | Large files, images, videos, backups, low cost                              | AWS S3, GCS, Azure Blob
Search Engine    | Full-text search, faceted filtering, fuzzy matching                         | Elasticsearch, OpenSearch, Algolia
Interview Tips — Common Mistakes
MISTAKE                               | CORRECTION
"CAP says choose 2 of 3"              | CAP says choose C or A when a partition occurs. P is not optional — it is always required.
"Eventual consistency is always bad"  | It's a deliberate trade-off. Facebook likes don't need strong consistency. Know when it's acceptable.
Jumping to DB choice first            | Start with requirements → scale → access patterns → then the DB choice follows naturally.
Not estimating scale                  | Every HLD starts with numbers. Scale determines architecture. Always estimate.
"Just add a cache"                    | Cache invalidation is one of the hardest problems. When do you invalidate? On write? TTL? Both?
Ignoring failure scenarios            | Interviewers want to see: what happens when X fails? What's the recovery path?
Task 01 · CAP Theorem — 8 Scenario Analysis · ~1 hr

For each scenario: identify CP or AP, justify your choice, and name the trade-off you're accepting.

  1. Online banking — check account balance before debit
  2. Facebook Like counter on a viral post
  3. Uber driver location updates (a new position every 4 seconds)
  4. Hotel room reservation (last room available)
  5. Amazon product reviews display
  6. Stock trading platform — order placement
  7. WhatsApp message delivery status (sent/delivered/read)
  8. E-commerce shopping cart (items added by user)

For each: state the exact failure mode if you choose wrong (e.g., "if I choose AP for banking, a user could overdraft").

Task 02 · Back-of-Envelope — WhatsApp + YouTube · ~1.5 hrs

WhatsApp: 2B users, 100M active daily, 100 messages/user/day, avg 100 bytes/message.
Calculate: peak QPS, storage/year, bandwidth (in + out).

YouTube: 2B users, 500 hours of video uploaded per minute, 1B views/day, avg view 10 min at 1 Mbps.
Calculate: upload storage/year, CDN bandwidth for views, approximate CDN cost (assume $0.01/GB).

For each: show all steps. Identify the biggest bottleneck revealed by your numbers.

Task 03 · Consistency Model Selection — 6 Scenarios · ~1 hr

Choose the appropriate consistency model and justify:

  1. Bank ledger — transfer between two accounts
  2. User profile picture update
  3. Social media comments — replies must appear after parent
  4. Shopping cart — items added/removed
  5. Distributed lock (exactly one service holds the lock)
  6. Netflix "continue watching" progress position

For each: name the exact model (Linearizable / Sequential / Causal / Read-Your-Writes / Eventual), the implementation mechanism, and the failure mode if you under-constrain.

Task 04 · HLD Framework — URL Shortener (Full Walkthrough) · ~2 hrs · full design

Apply all 7 steps to design TinyURL. No code — framework output only.

  • Scale: 300M URLs stored, 100:1 read:write ratio, 5-year retention
  • Latency target: redirect in <10ms (p99)
  • Availability: 99.99%

Required outputs:

  1. Peak QPS (read + write)
  2. Storage for 5 years
  3. High-level diagram (boxes + arrows, read path + write path)
  4. DB choice + justification (CAP + access pattern reasoning)
  5. Key design decision: how do you generate unique 7-char short codes?
  6. Biggest bottleneck + how to address it
  7. Failure scenario: what if the DB is unreachable during a redirect?
Module B1 Checklist · HLD Fundamentals · 12 items
Can explain CAP theorem without buzzwords — the partition choice framing
Know PACELC extension — how it covers non-partition latency/consistency
Distinguish: Linearizable vs Causal vs Eventual — when each is appropriate
Can calculate end-to-end availability from N services in sequence
Know Active-Passive vs Active-Active tradeoffs; split-brain prevention
Know all 6 load balancing algorithms + when to use each
L4 vs L7 load balancers — routing basis, SSL termination, smart routing
Memorized the latency numbers table (RAM vs SSD vs HDD vs network)
Can apply Little's Law to analyse queue depth from QPS + latency
Can do back-of-envelope for any system in <5 minutes
Know the 7-step HLD framework + time allocation for 45-min interview
✏️ All 4 tasks completed (CAP, estimation, consistency, URL shortener)
NEXT MODULE
B2 — Databases at Scale
Indexing strategies · SQL vs NoSQL trade-offs · Sharding (horizontal partitioning) · Replication (primary-replica, multi-primary) · ACID vs BASE · Read replicas · Connection pooling · Database selection guide