Requirements
Scope the system — what we build, what we don't
Functional
Upload video (raw file → processed variants)
Stream video (adaptive bitrate, no buffering)
Search by title, description, tags
View count (real-time approximate)
Like / dislike / comment
Subscribe + notifications on new uploads
OUT OF SCOPE: live streaming, ads, ML recommendations
Non-Functional
2B users/month, 800M DAU
500 hrs video uploaded every minute
1 billion hours watched daily
Video start: p99 < 2 seconds
Upload processing: < 5 min for 10-min video
Read:write ≈ 200:1
Durability: videos never lost (multi-replica)
The key constraint: Read:write is 200:1. Every architecture decision optimises for low-latency video delivery, not upload speed. The entire CDN and HLS strategy exists to serve this ratio.
Capacity Estimation
Numbers that anchor every infrastructure decision
| METRIC | VALUE | CALCULATION |
|---|---|---|
| Upload rate | 8.3 hrs/sec | 500 hrs/min ÷ 60 |
| Raw upload bandwidth | ~16.6 GB/sec | 8.3 hrs/sec × 2 GB/hr (1080p raw) |
| Processed variants per video | 6 resolutions | 360p, 480p, 720p, 1080p, 1440p, 4K |
| Processed storage per uploaded hr | ~3 GB | 6 × 500 MB avg per variant per hour |
| New video storage/day | ~2.2 PB/day | 500 hrs/min × 1,440 min/day × 3 GB |
| New video storage/year | ~800 PB/year | 2.2 PB/day × 365 |
| Total YouTube storage | exabyte-scale | 15+ years of accumulated uploads, long tail tiered to cold storage |
| Watch throughput | ~11.6K hrs/sec | 1B hrs/day ÷ 86,400 ≈ 42M average concurrent streams |
| CDN bandwidth required | ~58 Tbps | ~42M concurrent streams × ~1.4 Mbps avg bitrate (mixed resolutions) |
Say aloud: "58 Tbps of CDN bandwidth. This is why YouTube has 1,000+ edge PoPs globally — no single datacenter could serve this. The CDN IS the product."
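A quick back-of-envelope check of the table, as a minimal Python sketch. The inputs (500 hrs/min uploaded, 3 GB of processed output per source hour, 1B hrs watched/day, ~1.4 Mbps effective average bitrate) are the assumptions stated above, not measured figures.

# Back-of-envelope capacity check; all constants are the assumptions from the table above.
UPLOAD_HRS_PER_MIN = 500        # hours of video uploaded per minute
PROCESSED_GB_PER_HR = 3         # 6 variants x ~500 MB per hour of source
WATCH_HRS_PER_DAY = 1e9         # 1 billion hours watched daily
AVG_BITRATE_MBPS = 1.4          # assumed mixed-resolution average bitrate

upload_hrs_per_sec = UPLOAD_HRS_PER_MIN / 60                                    # ~8.3 hrs/sec
storage_per_day_tb = UPLOAD_HRS_PER_MIN * 60 * 24 * PROCESSED_GB_PER_HR / 1000  # ~2,160 TB
concurrent_streams = WATCH_HRS_PER_DAY / 24                                     # ~41.7M streams
cdn_tbps = concurrent_streams * AVG_BITRATE_MBPS / 1e6                          # ~58 Tbps

print(f"upload rate:        {upload_hrs_per_sec:.1f} hrs/sec")
print(f"new storage/day:    {storage_per_day_tb / 1000:.1f} PB")
print(f"concurrent streams: {concurrent_streams / 1e6:.1f}M")
print(f"CDN bandwidth:      {cdn_tbps:.0f} Tbps")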
High-Level Architecture
Upload path (write-heavy) vs Stream path (read-heavy)
// UPLOAD PATH
1. [Client] chunked upload → [Upload Service] → raw file in [GCS/S3]
2. [Upload Service] publishes "video-uploaded" to [Kafka] (sketch below)
3. [Kafka] consumed by [Transcoding Workers] — K8s autoscaled pool
4. [Transcoding] parallel workers per resolution + temporal segmentation → .ts segments + .m3u8 manifest
5. [Processed Files] pushed to [GCS/S3], pre-warmed to [CDN edge nodes]
6. [Metadata DB] status updated: "processing" → "published"
// STREAM PATH (p99 < 2s start time)
1. [Client] GET /watch?v=abc123 → [Video Service] returns master .m3u8 URL (sketch below)
2. [Client] fetches master.m3u8 from the [CDN Edge PoP] nearest to the user (~5ms)
3. [Player] measures bandwidth → selects quality → fetches that quality's variant .m3u8
4. [Player] fetches the first 3 segments (.ts files) from the CDN → starts playback
5. [Player] continuously fetches the next segments; switches quality as bandwidth changes
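A sketch of stream-path step 1: the Video Service resolving a videoId to its CDN-hosted master playlist URL. The in-memory metadata store and the CDN hostname are placeholders for this example only.

# Stream-path step 1: resolve videoId -> master playlist URL served from the CDN.
# The metadata store and CDN hostname below are placeholders.
from dataclasses import dataclass

CDN_BASE = "https://cdn.example-video.com/v"   # placeholder hostname

@dataclass
class VideoMeta:
    video_id: str
    status: str          # "processing" | "published"
    title: str

# Stand-in for the Metadata DB lookup
METADATA = {"abc123": VideoMeta("abc123", "published", "How to make pasta")}

def get_watch_payload(video_id: str) -> dict:
    meta = METADATA.get(video_id)
    if meta is None or meta.status != "published":
        return {"error": "not_available"}
    # The player fetches this URL directly from the nearest CDN edge PoP.
    return {
        "videoId": meta.video_id,
        "title": meta.title,
        "masterPlaylistUrl": f"{CDN_BASE}/{meta.video_id}/master.m3u8",
    }

print(get_watch_payload("abc123"))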
Upload Pipeline
Chunked resumable upload — resilient to network failures
Resumable upload protocol (HTTP)
// Step 1: Client initiates upload session
POST /upload/initiate
Body: {"filename": "video.mp4", "size": 2147483648, "sha256": "abc123..."}
Response: {"uploadId": "up_xyz", "chunkSize": 5242880}   ← 5 MB chunks

// Step 2: Client checks for dedup (same SHA-256 already exists)
POST /upload/check?hash=abc123
Response: {"exists": false}   ← proceed with upload
// If exists: {"exists": true, "videoId": "abc123"} → DONE, no upload needed!

// Step 3: Upload chunks (can be parallelised)
PUT /upload/up_xyz/chunk/0   Body: [bytes 0–5MB]      → 200 OK
PUT /upload/up_xyz/chunk/1   Body: [bytes 5MB–10MB]   → 200 OK
PUT /upload/up_xyz/chunk/2   Body: [bytes 10MB–15MB]  → 500 (network drop)

// Step 4: Resume from last successful chunk
GET /upload/up_xyz/status → {"lastChunk": 1}
PUT /upload/up_xyz/chunk/2   Body: [bytes 10MB–15MB]  → 200 OK   ← retry

// Step 5: Finalize — triggers transcoding pipeline
POST /upload/up_xyz/complete → {"videoId": "abc456", "status": "processing"}
Deduplication win: By checking SHA-256 before upload, re-uploaded content (same video uploaded twice by different users) is caught immediately. Saves both upload bandwidth and transcoding compute. YouTube uses this for copyright detection too.
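A client-side sketch of the protocol and dedup check above, assuming the /upload/* endpoints behave exactly as shown. The base URL is a placeholder and `requests` is used for HTTP; a production client would stream the file instead of reading it fully into memory.

# Client-side sketch of the resumable upload flow, assuming the endpoints above.
import hashlib
import requests

API = "https://upload.example-video.com"   # placeholder base URL

def upload(path: str) -> str:
    data = open(path, "rb").read()
    sha256 = hashlib.sha256(data).hexdigest()

    # Dedup check: identical bytes already uploaded by anyone → done instantly.
    dup = requests.post(f"{API}/upload/check", params={"hash": sha256}).json()
    if dup["exists"]:
        return dup["videoId"]

    session = requests.post(
        f"{API}/upload/initiate",
        json={"filename": path, "size": len(data), "sha256": sha256},
    ).json()
    upload_id, chunk_size = session["uploadId"], session["chunkSize"]

    # Resume support: ask which chunk was last acknowledged, continue from there.
    last = requests.get(f"{API}/upload/{upload_id}/status").json().get("lastChunk", -1)
    num_chunks = -(-len(data) // chunk_size)            # ceiling division
    for i in range(last + 1, num_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        requests.put(f"{API}/upload/{upload_id}/chunk/{i}", data=chunk).raise_for_status()

    return requests.post(f"{API}/upload/{upload_id}/complete").json()["videoId"]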
Transcoding Pipeline
Converting raw upload to streamable HLS segments — the most compute-intensive step
Parallelism Strategy
Temporal parallelism — split video into 1-min segments, transcode in parallel (ARCHITECTURE)
// Naive: one worker transcodes the full 60-minute video
// 60-min video at 1x realtime = 60 min transcoding time ← too slow

// YouTube's approach: temporal parallelism
// 1. Split raw video into 1-minute segments
60-minute video → 60 × 1-minute segments

// 2. Dispatch each segment to a separate worker (60 workers)
Worker 01: segment_01 → transcode to all 6 quality levels
Worker 02: segment_02 → transcode to all 6 quality levels
Worker 03: segment_03 → transcode to all 6 quality levels
...
Worker 60: segment_60 → transcode to all 6 quality levels

// 3. All workers run simultaneously → done in ~1 minute (60× speedup)
// 4. Concatenate segments → complete HLS playlist per quality level

// Per-resolution breakdown (for each segment):
Worker A: 360p  (fast — ~5 sec/segment) ← first available, serve to poor connections immediately
Worker B: 480p  (~8 sec/segment)
Worker C: 720p  (~15 sec/segment)
Worker D: 1080p (~25 sec/segment)
Worker E: 1440p (~40 sec/segment)
Worker F: 4K    (~60 sec/segment)
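A fan-out/fan-in sketch of the temporal parallelism above, using a process pool. The actual ffmpeg invocation is stubbed out; function and path names are illustrative.

# Fan-out / fan-in sketch: split the source into 1-minute segments, transcode every
# (segment, quality) pair in parallel, then stitch ordered per-quality playlists.
from concurrent.futures import ProcessPoolExecutor

QUALITIES = ["360p", "480p", "720p", "1080p", "1440p", "4K"]

def transcode_segment(job):
    segment_index, quality = job
    # Real worker: run ffmpeg on this 1-minute slice and emit 2s .ts chunks (elided).
    return quality, segment_index, f"processed/{quality}/seg_{segment_index:03d}.ts"

def transcode_video(duration_minutes: int = 60):
    jobs = [(i, q) for i in range(duration_minutes) for q in QUALITIES]
    playlists = {q: [] for q in QUALITIES}
    with ProcessPoolExecutor() as pool:
        for quality, idx, path in pool.map(transcode_segment, jobs):
            playlists[quality].append((idx, path))
    # Fan-in: order segments by index so each quality playlist plays back in sequence.
    return {q: [p for _, p in sorted(segs)] for q, segs in playlists.items()}

if __name__ == "__main__":
    result = transcode_video(3)      # small example: a 3-minute video
    print(result["360p"])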
Codec Trade-offs
H.264 (AVC)
UNIVERSAL COMPAT
Supported by every browser, device, and smart TV. Larger files than VP9/AV1.
Use: default fallback
All non-Chrome clients
Smart TVs, older devices
VP9
30-50% SMALLER THAN H.264
Google's codec. Excellent quality at lower bitrate. Native Chrome support (YouTube's primary client).
Use: Chrome browsers
YouTube default on web
Android devices
AV1
NEXT-GEN — 50% SMALLER
Best quality:size ratio. CPU-intensive to decode (hardware accel needed). Future default.
Use: flagship devices
4K/8K content
Low-bandwidth markets
In practice: YouTube encodes both H.264 AND VP9 for every video. The client signals which codecs it can decode (on the web, via the player's capability check, e.g. MediaSource.isTypeSupported), and the server lists VP9 variants for Chrome/Android and H.264 for everything else. This roughly doubles storage cost but dramatically reduces CDN bandwidth (VP9 saves 30-50% per stream).
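A small server-side sketch of that codec negotiation. The codec tags (avc1, vp09, av01) are the standard identifiers; the capability payload and the hardware-decode flag are assumptions for illustration, not a real YouTube API.

# Sketch: choose which encoded variant family to serve, given codecs the client
# reports it can decode. Input shape is an assumption for this example.
def pick_codec(supported_codecs: set[str], has_av1_hw_decode: bool = False) -> str:
    if "av01" in supported_codecs and has_av1_hw_decode:
        return "av01"      # AV1: best compression, needs hardware decode
    if "vp09" in supported_codecs:
        return "vp09"      # VP9: 30-50% smaller than H.264 (Chrome, Android)
    return "avc1"          # H.264: universal fallback

print(pick_codec({"avc1", "vp09"}))                    # → vp09 (typical Chrome)
print(pick_codec({"avc1"}))                            # → avc1 (older smart TV)
print(pick_codec({"avc1", "vp09", "av01"}, True))      # → av01 (flagship device)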
HLS Adaptive Bitrate Streaming
How the player automatically adjusts quality without buffering
// HLS FILE STRUCTURE — one video, multiple quality levels
master.m3u8   → lists all quality variants with bandwidth info
  360p.m3u8   → seg_001.ts, seg_002.ts, seg_003.ts, ... (2s each)
  720p.m3u8   → seg_001.ts, seg_002.ts, seg_003.ts, ...
  1080p.m3u8  → seg_001.ts, seg_002.ts, seg_003.ts, ...
master.m3u8 — what the player downloads first (HLS MANIFEST)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
https://cdn.youtube.com/v/abc123/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.youtube.com/v/abc123/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.youtube.com/v/abc123/1080p.m3u8

# Player logic (every 2 seconds):
#   measured bandwidth > 5 Mbps     → switch to 1080p
#   measured bandwidth 2.5–5 Mbps   → switch to 720p
#   measured bandwidth < 500 Kbps   → switch to 360p
# Switches happen at segment boundaries → no buffering
Why segments? Each .ts file is 2–10 seconds of independent video. The player can switch quality between segments — it doesn't need to re-buffer the current segment. This is what enables seamless quality switching as your network changes.
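A player-side sketch of that selection logic: parse the BANDWIDTH attributes from the example master playlist above and pick the highest variant that fits the measured throughput with some headroom. The parsing is deliberately simplified; a real player uses a full HLS library, and the 0.8 headroom factor is an assumption.

# ABR sketch: pick the highest variant whose declared bandwidth fits measured throughput.
import re

MASTER_M3U8 = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
https://cdn.youtube.com/v/abc123/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.youtube.com/v/abc123/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.youtube.com/v/abc123/1080p.m3u8"""

def parse_variants(manifest: str):
    lines = manifest.splitlines()
    variants = []
    for i, line in enumerate(lines):
        m = re.search(r"BANDWIDTH=(\d+)", line)
        if m:
            variants.append((int(m.group(1)), lines[i + 1]))   # (bits/sec, variant URL)
    return sorted(variants)

def choose_variant(measured_bps: float, variants, headroom: float = 0.8) -> str:
    # Highest variant within 80% of measured throughput; lowest variant on very slow links.
    fitting = [v for v in variants if v[0] <= measured_bps * headroom]
    return (fitting[-1] if fitting else variants[0])[1]

variants = parse_variants(MASTER_M3U8)
print(choose_variant(8_000_000, variants))    # fast link → 1080p playlist
print(choose_variant(1_200_000, variants))    # slow link → 360p playlist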
3-Tier CDN Architecture
58 Tbps served globally — the CDN IS the product
Edge PoPs
1,000+ nodes globally. Within 20ms of every major city. Hot content only (top 10% of videos = 90% of views). 10–100 TB storage each.
~5–20ms
~80% hit
Regional Cache
~100 nodes, the intermediate tier between edge and origin. Holds warm content missed at the edge. ~1 PB storage per node. Backed by higher-capacity hardware.
~30–60ms
~15% hit
Origin (GCS/S3)
Source of truth. Only ~5% of traffic reaches here. Multi-region replication. Exabyte-scale object storage.
~100–200ms
~5% miss
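A read-through sketch of the three tiers: check the edge PoP, then the regional cache, then fall back to origin, filling each tier on the way back so the next nearby viewer hits closer to home. Caches are plain dicts here purely for illustration.

# Read-through sketch of the 3-tier CDN lookup chain (dicts stand in for node caches).
edge_cache: dict[str, bytes] = {}       # per-PoP, hot content only
regional_cache: dict[str, bytes] = {}   # per-region, warm content
ORIGIN: dict[str, bytes] = {            # GCS/S3 source of truth (stub)
    "v/abc123/1080p/seg_001.ts": b"<segment bytes>",
}

def fetch_segment(key: str) -> bytes:
    if key in edge_cache:                 # ~80% of requests stop here (~5-20 ms)
        return edge_cache[key]
    if key in regional_cache:             # ~15% served here (~30-60 ms)
        data = regional_cache[key]
    else:                                 # ~5% miss all the way to origin (~100-200 ms)
        data = ORIGIN[key]
        regional_cache[key] = data
    edge_cache[key] = data                # fill the edge for the next nearby viewer
    return data

fetch_segment("v/abc123/1080p/seg_001.ts")   # cold: origin → regional → edge
fetch_segment("v/abc123/1080p/seg_001.ts")   # warm: edge hit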
CDN Pre-Warming Strategy
Push popular content to edges before users request it (STRATEGY)
// New video uploaded by a channel with 10M subscribers:
// Don't wait for cache misses — proactively push to edges

// Tier 1: Subscriber-count-based pre-warm
if (channel.subscriberCount > 1_000_000) {
  cdn.prefetch(videoUrl, tier: "edge", regions: "all");   // push to all edges
} else if (channel.subscriberCount > 100_000) {
  cdn.prefetch(videoUrl, tier: "regional");               // push to regional only
}

// Tier 2: Virality-based dynamic warm (triggered by view velocity)
if (viewVelocity > 10_000 /* views/minute */) {
  cdn.prefetch(videoUrl, tier: "edge", regions: "all");
}

// Pre-warm only the first 5 segments (first 10–30 seconds)
// Reason: most users watch the start; remaining segments warmed on demand
// "Seek-ahead": when the user is at segment N, pre-fetch N+1 to N+5
View Counter Design
11,600 views/sec globally — naive DB update doesn't scale
Redis INCR
FAST, APPROXIMATE
Atomic counter per video. Background job syncs to MySQL every 30s. Loses up to 30s of counts on Redis crash.
INCR view_count:{videoId}
Background: every 30s sync to DB
✓ Fast ✗ 30s loss risk
Kafka + ClickHouse
ACCURATE, SCALABLE
Each view → Kafka event. ClickHouse consumer aggregates. Accurate, scalable, ~30-60s latency. Foundation for analytics.
Kafka: view_events (videoId, userId, ts)
ClickHouse: COUNT(*) per videoId
✓ Accurate ✗ 30-60s lag
Sharded Counters
FOR VIRAL VIDEOS
Shard count across N Redis keys. INCR random shard. Read = SUM all shards. Removes hot key problem for viral content.
INCR view:{videoId}:shard_{rand(N)}
READ: SUM view:{videoId}:shard_0..N
✓ No hotkey ✗ N reads per count
Production view counter — combining all three approaches (ARCHITECTURE)
// On every view event:
// 1. INCR a random Redis shard (instant, non-blocking)
INCR view:{videoId}:shard_{random(10)}

// 2. Publish to Kafka (async, doesn't block the response)
kafka.publish("view-events", {videoId, userId, ip, timestamp, country})

// View count displayed to the user:
GET /api/views/{videoId} → SUM of MGET view:{videoId}:shard_0 ... view:{videoId}:shard_9   ← Redis (fast)

// Analytics dashboard (historical, per-country, per-hour):
SELECT COUNT(*) FROM view_events WHERE video_id = X AND timestamp > T
→ ClickHouse query ← accurate, supports complex aggregations

// Spam prevention:
// Check: SET view_dedup:{videoId}:{ip}:{hour} 1 NX EX 3600
// If the key already exists → don't count this view (same IP, same hour)
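A sketch of the combined design with redis-py. Key names follow the block above; the Kafka publish is stubbed out to keep the example self-contained, and the shard count of 10 is an assumption.

# Combined view counter sketch: sharded INCR + SET NX dedup + summed read.
import random
import time
import redis

r = redis.Redis(host="localhost", port=6379)
NUM_SHARDS = 10

def record_view(video_id: str, user_id: str, ip: str) -> bool:
    hour_bucket = int(time.time() // 3600)
    # Spam prevention: one counted view per IP per hour (SET ... NX EX 3600).
    if not r.set(f"view_dedup:{video_id}:{ip}:{hour_bucket}", 1, nx=True, ex=3600):
        return False
    # Sharded counter: INCR a random shard to avoid a single hot key on viral videos.
    r.incr(f"view:{video_id}:shard_{random.randrange(NUM_SHARDS)}")
    # Durable path: also publish to Kafka for ClickHouse analytics (stubbed here).
    # kafka.publish("view-events", {...})
    return True

def get_view_count(video_id: str) -> int:
    keys = [f"view:{video_id}:shard_{i}" for i in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)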
Storage Strategy & Search
Petabyte object storage + Elasticsearch for discovery
GCS/S3 object structure + lifecycle policies (OBJECT STORAGE)
// Storage layout:
raw/{videoId}/original.mp4            ← deleted after transcoding (save cost)
processed/{videoId}/master.m3u8       ← HLS master playlist
processed/{videoId}/360p/seg_001.ts   ← 2s video segments
processed/{videoId}/1080p/seg_001.ts
thumbnails/{videoId}/thumb_1.jpg      ← multiple choices for the uploader

// Lifecycle policies (automated tiering by popularity):
Hot     (>100 views/month):   Standard storage + CDN — fast and expensive
Warm    (10–100 views/month): Nearline — slightly slower, 50% cheaper
Cold    (<10 views/month):    Coldline — retrieval delay, 80% cheaper
Archive (<1 view/month):      Archive — hours to retrieve, 95% cheaper

// ~80% of all YouTube videos have fewer than 1K total views ever
// Tiering the long tail to cold storage saves enormous cost

// Replication:
// GCS multi-region: automatic 3x replication across AZs
// Cross-region: popular videos replicated to US, EU, APAC buckets
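A sketch of a periodic tiering job for the popularity-based lifecycle above. Native bucket lifecycle rules key off object age rather than views, so this assumes a background job fed by monthly view counts (e.g. from ClickHouse); set_storage_class() is a placeholder, not a specific SDK call.

# Periodic tiering job: map last month's views to a storage class and rewrite objects.
def target_tier(views_last_month: int) -> str:
    if views_last_month > 100:
        return "STANDARD"      # hot: CDN-backed, fast and expensive
    if views_last_month >= 10:
        return "NEARLINE"      # warm: slightly slower, ~50% cheaper
    if views_last_month >= 1:
        return "COLDLINE"      # cold: retrieval delay, ~80% cheaper
    return "ARCHIVE"           # long tail: hours to retrieve, ~95% cheaper

def retier_videos(monthly_views: dict[str, int]) -> None:
    for video_id, views in monthly_views.items():
        set_storage_class(f"processed/{video_id}/", target_tier(views))

def set_storage_class(prefix: str, storage_class: str) -> None:
    # Placeholder: a real job would call the object-storage API / rewrite objects here.
    print(f"{prefix} -> {storage_class}")

retier_videos({"abc123": 1_500_000, "old_vlog_2011": 0})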
Search Architecture
Elasticsearch for video discovery (SEARCH)
// Elasticsearch index: "videos"
{
  "videoId": "abc123",
  "title": "How to make pasta",             ← full-text search
  "description": "Step by step guide...",   ← full-text search
  "tags": ["cooking", "pasta", "recipe"],   ← exact match
  "transcript": "Today we're making...",    ← auto-generated captions
  "viewCount": 1500000,                     ← boost popular results
  "uploadDate": "2024-01-15",               ← recency signal
  "channelSubscribers": 5000000             ← authority signal
}

// Keep ES in sync with MySQL:
// MySQL change → Debezium CDC → Kafka topic "db-changes" → ES consumer
// Async — ES may lag MySQL by seconds, acceptable for search freshness

// Autocomplete: ES "search_as_you_type" field type on title
// Returns suggestions after 2 characters
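A sketch of a query against the "videos" index above, shaped for the Elasticsearch 8.x Python client. The field weights and the log1p view-count boost are illustrative scoring choices, not YouTube's actual ranking.

# Search sketch: full-text relevance on title/description/transcript, multiplied by a
# dampened popularity signal via function_score.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_videos(text: str, size: int = 10):
    query = {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": text,
                    # Title matches count most; transcript lets spoken content rank too.
                    "fields": ["title^3", "tags^2", "description", "transcript"],
                }
            },
            # Popularity signal: log-dampened view count multiplied into the text score.
            "functions": [
                {"field_value_factor": {"field": "viewCount", "modifier": "log1p", "factor": 0.1}}
            ],
            "boost_mode": "multiply",
        }
    }
    resp = es.search(index="videos", query=query, size=size)
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]

print(search_videos("how to make pasta"))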
Task 01: Transcoding Pipeline Design
- How many transcoding workers are needed for 8.3 hrs/sec input at 1× realtime? Show the math.
- Design the Kafka topic + partition key for the transcoding job queue (ordering? parallelism?)
- A 4-hour video is uploaded. Temporal parallelism splits it into 240 segments. How do you coordinate the concatenation step?
- A transcoding worker crashes halfway through segment 42 of 240. How do you resume without re-processing completed segments?
- Priority queue: paid YouTube Premium creators should process before free-tier. Design the priority mechanism.
Task 02: CDN Caching Strategy
- New video from Mr. Beast (230M subscribers) — what happens in the first 10 minutes?
- Steady popular video: 500K views/day for 5 years. Where is it cached and what tier?
- Long-tail video: 2 views/month for 5 years. Should it even be on CDN?
- 50M people watch a live music event simultaneously. How does CDN handle the thundering herd?
- Cache invalidation: creator updates thumbnail after 1M views. How do you purge it from 1,000+ edge nodes?
Task 03: View Counter at Scale
Design a view counter with these exact constraints:
- Viral video: 10M views in 1 hour (2,778 views/sec on a single video)
- Display count with <30 second lag
- Durability: must not lose counts (Redis crash shouldn't lose 30s of views)
- Spam prevention: same IP in same hour should only count once
- Analytics: views-per-hour breakdown for the last 30 days (for creator analytics)
For the viral video case: how many Redis nodes are needed? What's the sharding strategy?
★ Capstone: Full YouTube Design — 45-min Simulation
Apply the 7-step framework. Time yourself to 45 minutes. Cover:
- Requirements + estimations: upload rate, storage growth, CDN bandwidth, transcoding throughput
- Full architecture diagram: upload path + stream path
- Chunked upload protocol + deduplication
- Transcoding: temporal parallelism math, codec choices
- HLS: manifest structure, adaptive bitrate switching
- CDN: 3-tier strategy + pre-warming
- View counter: sharded Redis + Kafka + ClickHouse
- Failure modes: transcoding worker crash, CDN node failure, GCS region outage
MODULE B8 · YOUTUBE
Requirements: upload, stream, search, view count, subscribe
Estimation: ~2.2 PB/day new video, 58 Tbps CDN bandwidth
Chunked resumable upload: uploadId, 5 MB chunks, resume from last ACK
Pre-upload SHA-256 deduplication
Transcoding: temporal parallelism (1-min segments × N workers = 60× speedup)
HLS: master.m3u8 → quality variant .m3u8 → 2s .ts segments
Adaptive bitrate: player measures bandwidth every 2s, switches at segment boundaries
Codec trade-offs: H.264 (universal), VP9 (30-50% smaller, Chrome), AV1 (future)
3-tier CDN: edge PoPs (80% hit) → regional (15%) → GCS origin (5%)
CDN pre-warming: subscriber count + view velocity triggers edge prefetch
View counter: sharded Redis INCR + Kafka events + ClickHouse analytics
Spam prevention: SETNX dedup key per IP per hour
Storage lifecycle: hot → nearline → coldline → archive by view velocity
Search: Elasticsearch with title/description/transcript + CDC sync from MySQL
✏️ Tasks 1–3: transcoding pipeline, CDN strategy, view counter
✏️ Task 4 (capstone): full YouTube — 45-min interview simulation
// NEXT MODULE
B9 — Design a Rate Limiter
Token bucket · Leaky bucket · Fixed window · Sliding window log
Sliding window counter · Distributed rate limiting · Redis Lua scripts
Where to apply limits · API gateway integration · Edge rate limiting