| CAP CLASS | TRADE-OFF | EXAMPLES |
|---|---|---|
| CA | No partition tolerance (single node only) | Traditional RDBMS (single node) |
| CP | Consistency over availability | HBase, ZooKeeper, MongoDB (strong) |
| AP | Availability over consistency | Cassandra, DynamoDB, CouchDB, DNS |
PACELC(P: A/C ; E: L/C):
- P → Partition: choose A or C (same as CAP)
- E → Else (no partition): choose Latency or Consistency

| SYSTEM | PROFILE | RATIONALE |
|---|---|---|
| Cassandra | PA/EL | Available during partition; low latency normally |
| DynamoDB | PA/EL | Same profile as Cassandra |
| HBase | PC/EC | Consistent always; accepts higher latency |
| ZooKeeper | PC/EC | Built for coordination, strong guarantees |
| MySQL | PC/EC | ACID, consistent always |
| SCENARIO | CHOOSE | REASON |
|---|---|---|
| Bank account balance check before debit | CP | Wrong balance → real financial harm |
| Facebook likes on viral post | AP | Approximate count is fine; accuracy not critical |
| Hotel room reservation (last room) | CP | Double-booking is catastrophic |
| User profile picture update | AP | Slightly stale photo is acceptable |
| Stock trade order placement | CP | Price must be accurate; regulatory requirement |
| DNS lookups | AP | Stale DNS record better than no answer |
Strong consistency (Paxos/Raft):
- Requires majority quorum before returning → adds 1+ network round trips
- Typical latency: 5–50 ms extra per operation
- ✓ Correct always ✗ Slower

Eventual consistency (async replication):
- Write returns immediately after local write → low latency
- Replicas catch up asynchronously
- ✓ Fast, available ✗ Reads may be stale (replication lag)

Tunable consistency (Cassandra):
- Per-query consistency level: ONE, QUORUM, ALL
- QUORUM write + QUORUM read → strong consistency
- ONE write + ONE read → eventual
- Trade-off chosen per operation based on use case
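A minimal sketch of per-query consistency levels using the DataStax Python driver, assuming a reachable local node and a pre-existing `demo.users` table (both hypothetical):

```python
# Sketch: per-query consistency levels with the DataStax Cassandra driver.
# Assumes a local node and an existing keyspace/table (demo.users).
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")

# QUORUM write + QUORUM read: W + R > N, so reads see the latest write.
write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (42, "alice"))

# ONE write + ONE read: fastest option, but a read may hit a stale replica.
read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, (42,)).one()
```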
Normal: [Client] ──→ [Active Node] [Passive] (standby, synced)
Failover: [Client] ──→ [Passive Node → promoted] [Active] (dead/recovering)
Variants:
Hot standby: Passive running + synced → failover in seconds
Warm standby: Passive needs startup → minutes
Cold standby: Passive needs provisioning → minutes to hours
Challenge: Split-brain
Network partition → both nodes think they're active primary
Both accept writes → divergent, irreconcilable state
Prevention: Quorum (majority must agree) + Fencing tokens
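A minimal sketch of fencing tokens (class and method names are hypothetical): the lock service hands out a monotonically increasing token with each grant, and the storage layer rejects any write carrying a token older than the highest it has seen, so a deposed primary that wakes up after a partition can no longer corrupt state:

```python
# Sketch: fencing tokens against split-brain. All names are illustrative.
class FencedStore:
    """Storage that rejects writes from deposed leaders (stale tokens)."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            # A newer leader has already written: this caller was fenced off.
            return False
        self.highest_token = token
        self.data[key] = value
        return True


class LockService:
    """Each lock grant carries a strictly increasing fencing token."""

    def __init__(self):
        self.token = 0

    def acquire(self) -> int:
        self.token += 1
        return self.token


locks, store = LockService(), FencedStore()
old = locks.acquire()  # token 1 -> original primary
new = locks.acquire()  # token 2 -> standby promoted after failover
assert store.write(new, "balance", "100")      # accepted
assert not store.write(old, "balance", "99")   # stale token rejected
```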
Normal: [Client] ──→ [Load Balancer] ──→ Node A (active, serving)
──→ Node B (active, serving)
──→ Node C (active, serving)
All nodes handle reads AND writes simultaneously.
Conflict resolution required:
- Last-write-wins (timestamp)
- CRDTs (Conflict-free Replicated Data Types)
- Application-level merge logic
Used in: Cassandra, DynamoDB, Akamai CDN, most NoSQL at scale
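As one concrete conflict-resolution approach, here is a minimal sketch of a G-Counter (grow-only counter) CRDT, the kind of structure behind distributed like counters; class and method names are illustrative:

```python
# Sketch: G-Counter CRDT -- each replica increments only its own slot,
# and merge takes the per-replica max, so merges are commutative,
# associative, and idempotent (no conflicts, no coordination needed).
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept likes concurrently, then sync in either order.
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value == b.value == 5  # replicas converge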
| FEATURE | L4 — Transport Layer | L7 — Application Layer |
|---|---|---|
| Works on | TCP/UDP packets | HTTP/HTTPS requests |
| Routing basis | IP address + port | URL path, headers, cookies, body |
| Speed | Very fast (no content inspection) | Slower (parses full HTTP request) |
| SSL termination | No — passes through encrypted | Yes — decrypts once at LB |
| Smart routing | No — IP-based only | Yes — /api → API servers, /images → CDN |
| Examples | AWS NLB, HAProxy (L4 mode) | AWS ALB, NGINX, Envoy, Caddy |
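To make the "smart routing" row concrete, a minimal sketch of L7 path-based routing logic (upstream pool names are hypothetical). Real L7 balancers express this in config (e.g. NGINX `location` blocks); the point is that the decision needs the HTTP path, which an L4 balancer never sees:

```python
# Sketch: L7 routing inspects the HTTP path; pool names are illustrative.
UPSTREAMS = {
    "/api/": ["api-1:8080", "api-2:8080"],
    "/images/": ["cdn-edge-1:80"],
}
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]

def route(path: str) -> list[str]:
    """Pick an upstream pool by longest-known path prefix, else default."""
    for prefix, pool in UPSTREAMS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

assert route("/api/users/42") == ["api-1:8080", "api-2:8080"]
assert route("/index.html") == DEFAULT_POOL
```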
| OPERATION | LATENCY | NOTE |
|---|---|---|
| L1 cache reference | 0.5 ns | CPU register speed |
| L2 cache reference | 7 ns | 14× slower than L1 |
| RAM access | 100 ns | Baseline for in-memory ops |
| SSD random read | 0.1 ms | 1,000× slower than RAM |
| Network within datacenter | 0.5 ms | Same-AZ round trip |
| Intra-region (cross-AZ) | 1–5 ms | Same region, different AZ |
| HDD random read | 10 ms | Mechanical seek time |
| Cross-region (US → EU) | ~100 ms | Speed of light across the Atlantic |
L = λ × W
- L = average items in system (queue depth)
- λ = arrival rate (throughput, requests/sec)
- W = average time in system (latency, seconds)

Example:
- Service handles 100 req/sec (λ = 100)
- Average latency is 50 ms (W = 0.05 s)
- Avg concurrent requests: L = 100 × 0.05 = 5

Key insight: if latency grows (W↑) while arrival rate stays constant (λ = const), queue depth grows (L↑). Eventually the queue overflows → system collapse.
→ Latency spikes are early warning signs of capacity problems.
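The same arithmetic as a tiny helper, a sketch showing why a latency regression alone can exhaust a worker pool:

```python
# Sketch: Little's Law (L = lambda * W) as a capacity sanity check.
def concurrent_requests(arrival_rate_per_sec: float, latency_sec: float) -> float:
    """Average number of in-flight requests the service must hold."""
    return arrival_rate_per_sec * latency_sec

# 100 req/sec at 50 ms latency -> 5 concurrent requests on average.
assert concurrent_requests(100, 0.05) == 5.0

# Same traffic, latency degrades to 2 s -> 200 in-flight requests,
# which overwhelms a worker pool sized for ~10.
print(concurrent_requests(100, 2.0))  # 200.0
```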
Assumptions:
- 300M DAU
- Reads: 100 tweets/user/day
- Writes: 2 tweets/user/day
- Avg tweet size: ~1 KB (text + metadata)

Read QPS: 300M × 100 ÷ 86,400 ≈ 350,000 reads/sec → Peak (3×): ~1M reads/sec
Write QPS: 300M × 2 ÷ 86,400 ≈ 7,000 writes/sec → Peak (3×): ~21,000 writes/sec

Storage (5 years): 300M × 2 tweets/day × 365 × 5 × 1 KB = 300M × 3,650 × 1,000 bytes ≈ 1.1 PB (tweets only, excluding media)

Bandwidth:
- Reads: 1M req/sec × 1 KB = 1 GB/sec
- Writes: 21K req/sec × 1 KB ≈ 21 MB/sec
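The same estimate as a small script, so the arithmetic can be re-run with different assumptions (all inputs are the stated assumptions above):

```python
# Sketch: back-of-envelope QPS/storage/bandwidth from the stated assumptions.
DAU = 300_000_000
READS_PER_USER, WRITES_PER_USER = 100, 2
TWEET_BYTES = 1_000          # ~1 KB text + metadata
PEAK_FACTOR = 3
SECONDS_PER_DAY = 86_400

read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY    # ~347K
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY  # ~6.9K
peak_reads, peak_writes = read_qps * PEAK_FACTOR, write_qps * PEAK_FACTOR

storage_5y = DAU * WRITES_PER_USER * 365 * 5 * TWEET_BYTES  # bytes
read_bw = peak_reads * TWEET_BYTES    # bytes/sec out
write_bw = peak_writes * TWEET_BYTES  # bytes/sec in

print(f"read QPS ~{read_qps:,.0f} (peak ~{peak_reads:,.0f})")
print(f"write QPS ~{write_qps:,.0f} (peak ~{peak_writes:,.0f})")
print(f"5-year storage ~{storage_5y / 1e15:.2f} PB")
print(f"peak read bandwidth ~{read_bw / 1e9:.2f} GB/sec")
```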
Common sizes:
- Character (ASCII): 1 byte
- Integer: 4 bytes
- Long: 8 bytes
- UUID: 16 bytes
- Timestamp: 4 bytes
- Image (compressed): 100 KB – 5 MB
- HD video, 1 min: ~60 MB (H.264 compressed)
- 4K video, 1 min: ~375 MB

Units: 1 KB = 10³ · 1 MB = 10⁶ · 1 GB = 10⁹ · 1 TB = 10¹² · 1 PB = 10¹⁵ bytes

Rule of 86,400: 1 req/sec → 86,400 req/day ≈ 100K req/day
Rule of 30M: 1 req/sec → ~2.5M req/month ≈ 30M req/year
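The two conversion rules as one-liners (a quick sketch):

```python
# Sketch: the "rule of 86,400" and "rule of 30M" conversions.
def per_day(req_per_sec: float) -> float:
    return req_per_sec * 86_400        # ~100K/day per 1 req/sec

def per_year(req_per_sec: float) -> float:
    return req_per_sec * 86_400 * 365  # ~30M/year per 1 req/sec

assert per_day(1) == 86_400
assert round(per_year(1) / 1e6, 1) == 31.5  # i.e. roughly 30M/year
```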
| COMPONENT | USE WHEN | EXAMPLES |
|---|---|---|
| CDN | Static content, globally read-heavy, media files | Cloudflare, Akamai, AWS CloudFront |
| Load Balancer | Multiple backend instances, traffic distribution, SSL termination | AWS ALB/NLB, NGINX, HAProxy |
| Cache (Redis) | Hot reads, session storage, rate limiting counters, leaderboards | Redis, Memcached, DynamoDB DAX |
| Message Queue | Async processing, decouple services, event streaming, retry logic | Kafka, RabbitMQ, AWS SQS |
| SQL Database | ACID transactions, complex queries, structured data, joins | PostgreSQL, MySQL, AWS Aurora |
| NoSQL Database | High throughput, flexible schema, horizontal scale, simple access patterns | DynamoDB, Cassandra, MongoDB |
| Object Storage | Large files, images, videos, backups, low cost | AWS S3, GCS, Azure Blob |
| Search Engine | Full-text search, faceted filtering, fuzzy matching | Elasticsearch, OpenSearch, Algolia |
| MISTAKE | CORRECTION |
|---|---|
| "CAP says choose 2 of 3" | CAP says choose C or A when a partition occurs. P is not optional — always required. |
| "Eventual consistency is always bad" | Deliberate trade-off. Facebook likes don't need strong consistency. Know when it's acceptable. |
| Jumping to DB choice first | Start with requirements → scale → access patterns → then DB choice follows naturally. |
| Not estimating scale | Every HLD starts with numbers. Scale determines architecture. Always estimate. |
| "Just add a cache" | Cache invalidation is one of the hardest problems. When do you invalidate? On write? TTL? Both? |
| Ignoring failure scenarios | Interviewers want to see: what happens when X fails? What's the recovery path? |
For each scenario: identify CP or AP, justify your choice, and name the trade-off you're accepting.
- Online banking — check account balance before debit
- Facebook Like counter on a viral post
- Uber driver location updates (sent every 4 seconds)
- Hotel room reservation (last room available)
- Amazon product reviews display
- Stock trading platform — order placement
- WhatsApp message delivery status (sent/delivered/read)
- E-commerce shopping cart (items added by user)
For each: state the exact failure mode if you choose wrong (e.g., "if I choose AP for banking, a user could overdraft").
WhatsApp: 2B users, 100M active daily, 100 messages/user/day, avg 100 bytes/message.
Calculate: peak QPS, storage/year, bandwidth (in + out).
YouTube: 2B users, 500 hours of video uploaded per minute, 1B views/day, avg view 10 min at 1 Mbps.
Calculate: upload storage/year, CDN bandwidth for views, approximate CDN cost (assume $0.01/GB).
For each: show all steps. Identify the biggest bottleneck revealed by your numbers.
Choose the appropriate consistency model and justify:
- Bank ledger — transfer between two accounts
- User profile picture update
- Social media comments — replies must appear after parent
- Shopping cart — items added/removed
- Distributed lock (exactly one service holds the lock)
- Netflix "continue watching" progress position
For each: name the exact model (Linearizable / Sequential / Causal / Read-Your-Writes / Eventual), the implementation mechanism, and the failure mode if you under-constrain.
Apply all 7 steps to design TinyURL. No code — framework output only.
- Scale: 300M URLs stored, 100:1 read:write ratio, 5-year retention
- Latency target: redirect in <10ms (p99)
- Availability: 99.99%
Required outputs:
- Peak QPS (read + write)
- Storage for 5 years
- High-level diagram (boxes + arrows, read path + write path)
- DB choice + justification (CAP + access pattern reasoning)
- Key design decision: how do you generate unique 7-char short codes?
- Biggest bottleneck + how to address it
- Failure scenario: what if the DB is unreachable during a redirect?