Requirements
Scope the system — what we build, what we don't
Functional
Upload video (raw file → processed variants)
Stream video (adaptive bitrate, no buffering)
Search by title, description, tags
View count (real-time approximate)
Like / dislike / comment
Subscribe + notifications on new uploads
OUT OF SCOPE: live streaming, ads, ML recommendations
Non-Functional
2B users/month, 800M DAU
500 hrs video uploaded every minute
1 billion hours watched daily
Video start: p99 < 2 seconds
Upload processing: < 5 min for 10-min video
Read:write ≈ 200:1
Durability: videos never lost (multi-replica)
The key constraint: Read:write is 200:1. Every architecture decision optimises for low-latency video delivery, not upload speed. The entire CDN and HLS strategy exists to serve this ratio.
Capacity Estimation
Numbers that anchor every infrastructure decision
| METRIC | VALUE | CALCULATION |
|---|---|---|
| Upload rate | 8.3 hrs/sec | 500 hrs/min ÷ 60 |
| Raw upload bandwidth | ~16.6 GB/sec | 8.3 hrs/sec × 2 GB/hr (1080p raw) |
| Processed variants per video | 6 resolutions | 360p, 480p, 720p, 1080p, 1440p, 4K |
| Processed storage per uploaded hr | ~3 GB | 6 × 500 MB avg per variant per hour |
| New video storage/day | ~2.2 PB/day | 500 hrs/min × 1,440 min/day × 3 GB |
| New video storage/year | ~800 PB/year | 2.2 PB/day × 365 |
| Total YouTube storage | exabyte-scale | 15+ years of accumulated uploads, long tail tiered to cold storage |
| Watch throughput | ~11.6K hrs/sec | 1B hrs/day ÷ 86,400 ≈ 42M average concurrent streams |
| CDN bandwidth required | ~58 Tbps | ~42M concurrent streams × ~1.4 Mbps avg bitrate (mixed resolutions) |
Say aloud: "58 Tbps of CDN bandwidth. This is why YouTube has 1,000+ edge PoPs globally — no single datacenter could serve this. The CDN IS the product."
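A quick back-of-envelope check of the table, as a minimal Python sketch. The inputs (500 hrs/min uploaded, 3 GB of processed output per source hour, 1B hrs watched/day, ~1.4 Mbps effective average bitrate) are the assumptions stated above, not measured figures.

# Back-of-envelope capacity check; all constants are the assumptions from the table above.
UPLOAD_HRS_PER_MIN = 500        # hours of video uploaded per minute
PROCESSED_GB_PER_HR = 3         # 6 variants x ~500 MB per hour of source
WATCH_HRS_PER_DAY = 1e9         # 1 billion hours watched daily
AVG_BITRATE_MBPS = 1.4          # assumed mixed-resolution average bitrate

upload_hrs_per_sec = UPLOAD_HRS_PER_MIN / 60                                    # ~8.3 hrs/sec
storage_per_day_tb = UPLOAD_HRS_PER_MIN * 60 * 24 * PROCESSED_GB_PER_HR / 1000  # ~2,160 TB
concurrent_streams = WATCH_HRS_PER_DAY / 24                                     # ~41.7M streams
cdn_tbps = concurrent_streams * AVG_BITRATE_MBPS / 1e6                          # ~58 Tbps

print(f"upload rate:        {upload_hrs_per_sec:.1f} hrs/sec")
print(f"new storage/day:    {storage_per_day_tb / 1000:.1f} PB")
print(f"concurrent streams: {concurrent_streams / 1e6:.1f}M")
print(f"CDN bandwidth:      {cdn_tbps:.0f} Tbps")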
High-Level Architecture
Upload path (write-heavy) vs Stream path (read-heavy)
// UPLOAD PATH
1. [Client] chunked upload → [Upload Service] → raw file in [GCS/S3]
2. [Upload Service] publishes "video-uploaded" to [Kafka] (sketch below)
3. [Kafka] consumed by [Transcoding Workers] — K8s autoscaled pool
4. [Transcoding] parallel workers per resolution + temporal segmentation → .ts segments + .m3u8 manifest
5. [Processed Files] pushed to [GCS/S3], pre-warmed to [CDN edge nodes]
6. [Metadata DB] status updated: "processing" → "published"
// STREAM PATH (p99 < 2s start time)
1. [Client] GET /watch?v=abc123 → [Video Service] returns master .m3u8 URL (sketch below)
2. [Client] fetches master.m3u8 from the [CDN Edge PoP] nearest to the user (~5ms)
3. [Player] measures bandwidth → selects quality → fetches that quality's variant .m3u8
4. [Player] fetches the first 3 segments (.ts files) from the CDN → starts playback
5. [Player] continuously fetches the next segments; switches quality as bandwidth changes
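A sketch of stream-path step 1: the Video Service resolving a videoId to its CDN-hosted master playlist URL. The in-memory metadata store and the CDN hostname are placeholders for this example only.

# Stream-path step 1: resolve videoId -> master playlist URL served from the CDN.
# The metadata store and CDN hostname below are placeholders.
from dataclasses import dataclass

CDN_BASE = "https://cdn.example-video.com/v"   # placeholder hostname

@dataclass
class VideoMeta:
    video_id: str
    status: str          # "processing" | "published"
    title: str

# Stand-in for the Metadata DB lookup
METADATA = {"abc123": VideoMeta("abc123", "published", "How to make pasta")}

def get_watch_payload(video_id: str) -> dict:
    meta = METADATA.get(video_id)
    if meta is None or meta.status != "published":
        return {"error": "not_available"}
    # The player fetches this URL directly from the nearest CDN edge PoP.
    return {
        "videoId": meta.video_id,
        "title": meta.title,
        "masterPlaylistUrl": f"{CDN_BASE}/{meta.video_id}/master.m3u8",
    }

print(get_watch_payload("abc123"))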
Upload Pipeline
Chunked resumable upload — resilient to network failures
Resumable upload protocol (HTTP)
// Step 1: Client initiates upload session
POST /upload/initiate
Body: {"filename": "video.mp4", "size": 2147483648, "sha256": "abc123..."}
Response: {"uploadId": "up_xyz", "chunkSize": 5242880}   ← 5 MB chunks

// Step 2: Client checks for dedup (same SHA-256 already exists)
POST /upload/check?hash=abc123
Response: {"exists": false}   ← proceed with upload
// If exists: {"exists": true, "videoId": "abc123"} → DONE, no upload needed!

// Step 3: Upload chunks (can be parallelised)
PUT /upload/up_xyz/chunk/0   Body: [bytes 0–5MB]      → 200 OK
PUT /upload/up_xyz/chunk/1   Body: [bytes 5MB–10MB]   → 200 OK
PUT /upload/up_xyz/chunk/2   Body: [bytes 10MB–15MB]  → 500 (network drop)

// Step 4: Resume from last successful chunk
GET /upload/up_xyz/status → {"lastChunk": 1}
PUT /upload/up_xyz/chunk/2   Body: [bytes 10MB–15MB]  → 200 OK   ← retry

// Step 5: Finalize — triggers transcoding pipeline
POST /upload/up_xyz/complete → {"videoId": "abc456", "status": "processing"}
Deduplication win: By checking SHA-256 before upload, re-uploaded content (same video uploaded twice by different users) is caught immediately. Saves both upload bandwidth and transcoding compute. YouTube uses this for copyright detection too.
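A client-side sketch of the protocol and dedup check above, assuming the /upload/* endpoints behave exactly as shown. The base URL is a placeholder and `requests` is used for HTTP; a production client would stream the file instead of reading it fully into memory.

# Client-side sketch of the resumable upload flow, assuming the endpoints above.
import hashlib
import requests

API = "https://upload.example-video.com"   # placeholder base URL

def upload(path: str) -> str:
    data = open(path, "rb").read()
    sha256 = hashlib.sha256(data).hexdigest()

    # Dedup check: identical bytes already uploaded by anyone → done instantly.
    dup = requests.post(f"{API}/upload/check", params={"hash": sha256}).json()
    if dup["exists"]:
        return dup["videoId"]

    session = requests.post(
        f"{API}/upload/initiate",
        json={"filename": path, "size": len(data), "sha256": sha256},
    ).json()
    upload_id, chunk_size = session["uploadId"], session["chunkSize"]

    # Resume support: ask which chunk was last acknowledged, continue from there.
    last = requests.get(f"{API}/upload/{upload_id}/status").json().get("lastChunk", -1)
    num_chunks = -(-len(data) // chunk_size)            # ceiling division
    for i in range(last + 1, num_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        requests.put(f"{API}/upload/{upload_id}/chunk/{i}", data=chunk).raise_for_status()

    return requests.post(f"{API}/upload/{upload_id}/complete").json()["videoId"]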
Transcoding Pipeline
Converting raw upload to streamable HLS segments — the most compute-intensive step
Parallelism Strategy
Temporal parallelism — split video into 1-min segments, transcode in parallel (ARCHITECTURE)
// Naive: one worker transcodes the full 60-minute video
// 60-min video at 1x realtime = 60 min transcoding time ← too slow

// YouTube's approach: temporal parallelism
// 1. Split raw video into 1-minute segments
60-minute video → 60 × 1-minute segments

// 2. Dispatch each segment to a separate worker (60 workers)
Worker 01: segment_01 → transcode to all 6 quality levels
Worker 02: segment_02 → transcode to all 6 quality levels
Worker 03: segment_03 → transcode to all 6 quality levels
...
Worker 60: segment_60 → transcode to all 6 quality levels

// 3. All workers run simultaneously → done in ~1 minute (60× speedup)
// 4. Concatenate segments → complete HLS playlist per quality level

// Per-resolution breakdown (for each segment):
Worker A: 360p  (fast — ~5 sec/segment) ← first available, serve to poor connections immediately
Worker B: 480p  (~8 sec/segment)
Worker C: 720p  (~15 sec/segment)
Worker D: 1080p (~25 sec/segment)
Worker E: 1440p (~40 sec/segment)
Worker F: 4K    (~60 sec/segment)
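A fan-out/fan-in sketch of the temporal parallelism above, using a process pool. The actual ffmpeg invocation is stubbed out; function and path names are illustrative.

# Fan-out / fan-in sketch: split the source into 1-minute segments, transcode every
# (segment, quality) pair in parallel, then stitch ordered per-quality playlists.
from concurrent.futures import ProcessPoolExecutor

QUALITIES = ["360p", "480p", "720p", "1080p", "1440p", "4K"]

def transcode_segment(job):
    segment_index, quality = job
    # Real worker: run ffmpeg on this 1-minute slice and emit 2s .ts chunks (elided).
    return quality, segment_index, f"processed/{quality}/seg_{segment_index:03d}.ts"

def transcode_video(duration_minutes: int = 60):
    jobs = [(i, q) for i in range(duration_minutes) for q in QUALITIES]
    playlists = {q: [] for q in QUALITIES}
    with ProcessPoolExecutor() as pool:
        for quality, idx, path in pool.map(transcode_segment, jobs):
            playlists[quality].append((idx, path))
    # Fan-in: order segments by index so each quality playlist plays back in sequence.
    return {q: [p for _, p in sorted(segs)] for q, segs in playlists.items()}

if __name__ == "__main__":
    result = transcode_video(3)      # small example: a 3-minute video
    print(result["360p"])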
Codec Trade-offs
H.264 (AVC)
UNIVERSAL COMPAT
Supported by every browser, device, and smart TV. Larger files than VP9/AV1.
Use: default fallback
All non-Chrome clients
Smart TVs, older devices
VP9
30-50% SMALLER THAN H.264
Google's codec. Excellent quality at lower bitrate. Native Chrome support (YouTube's primary client).
Use: Chrome browsers
YouTube default on web
Android devices
AV1
NEXT-GEN — 50% SMALLER
Best quality:size ratio. CPU-intensive to decode (hardware accel needed). Future default.
Use: flagship devices
4K/8K content
Low-bandwidth markets
In practice: YouTube encodes both H.264 AND VP9 for every video. The client signals which codecs it can decode (on the web, via the player's capability check, e.g. MediaSource.isTypeSupported), and the server lists VP9 variants for Chrome/Android and H.264 for everything else. This roughly doubles storage cost but dramatically reduces CDN bandwidth (VP9 saves 30-50% per stream).
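A small server-side sketch of that codec negotiation. The codec tags (avc1, vp09, av01) are the standard identifiers; the capability payload and the hardware-decode flag are assumptions for illustration, not a real YouTube API.

# Sketch: choose which encoded variant family to serve, given codecs the client
# reports it can decode. Input shape is an assumption for this example.
def pick_codec(supported_codecs: set[str], has_av1_hw_decode: bool = False) -> str:
    if "av01" in supported_codecs and has_av1_hw_decode:
        return "av01"      # AV1: best compression, needs hardware decode
    if "vp09" in supported_codecs:
        return "vp09"      # VP9: 30-50% smaller than H.264 (Chrome, Android)
    return "avc1"          # H.264: universal fallback

print(pick_codec({"avc1", "vp09"}))                    # → vp09 (typical Chrome)
print(pick_codec({"avc1"}))                            # → avc1 (older smart TV)
print(pick_codec({"avc1", "vp09", "av01"}, True))      # → av01 (flagship device)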
HLS Adaptive Bitrate Streaming
How the player automatically adjusts quality without buffering
// HLS FILE STRUCTURE — one video, multiple quality levels
master.m3u8   → lists all quality variants with bandwidth info
  360p.m3u8   → seg_001.ts, seg_002.ts, seg_003.ts, ... (2s each)
  720p.m3u8   → seg_001.ts, seg_002.ts, seg_003.ts, ...
  1080p.m3u8  → seg_001.ts, seg_002.ts, seg_003.ts, ...
master.m3u8 — what the player downloads first (HLS MANIFEST)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
https://cdn.youtube.com/v/abc123/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.youtube.com/v/abc123/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.youtube.com/v/abc123/1080p.m3u8

# Player logic (every 2 seconds):
#   measured bandwidth > 5 Mbps     → switch to 1080p
#   measured bandwidth 2.5–5 Mbps   → switch to 720p
#   measured bandwidth < 500 Kbps   → switch to 360p
# Switches happen at segment boundaries → no buffering
Why segments? Each .ts file is 2–10 seconds of independent video. The player can switch quality between segments — it doesn't need to re-buffer the current segment. This is what enables seamless quality switching as your network changes.
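A player-side sketch of that selection logic: parse the BANDWIDTH attributes from the example master playlist above and pick the highest variant that fits the measured throughput with some headroom. The parsing is deliberately simplified; a real player uses a full HLS library, and the 0.8 headroom factor is an assumption.

# ABR sketch: pick the highest variant whose declared bandwidth fits measured throughput.
import re

MASTER_M3U8 = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
https://cdn.youtube.com/v/abc123/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
https://cdn.youtube.com/v/abc123/720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
https://cdn.youtube.com/v/abc123/1080p.m3u8"""

def parse_variants(manifest: str):
    lines = manifest.splitlines()
    variants = []
    for i, line in enumerate(lines):
        m = re.search(r"BANDWIDTH=(\d+)", line)
        if m:
            variants.append((int(m.group(1)), lines[i + 1]))   # (bits/sec, variant URL)
    return sorted(variants)

def choose_variant(measured_bps: float, variants, headroom: float = 0.8) -> str:
    # Highest variant within 80% of measured throughput; lowest variant on very slow links.
    fitting = [v for v in variants if v[0] <= measured_bps * headroom]
    return (fitting[-1] if fitting else variants[0])[1]

variants = parse_variants(MASTER_M3U8)
print(choose_variant(8_000_000, variants))    # fast link → 1080p playlist
print(choose_variant(1_200_000, variants))    # slow link → 360p playlist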
3-Tier CDN Architecture
58 Tbps served globally — the CDN IS the product
Edge PoPs
1,000+ nodes globally. Within 20ms of every major city. Hot content only (top 10% of videos = 90% of views). 10–100 TB storage each.
~5–20ms
~80% hit
Regional Cache
~100 nodes, the intermediate tier between edge and origin. Holds warm content missed at the edge. ~1 PB storage per node. Backed by higher-capacity hardware.
~30–60ms
~15% hit
Origin (GCS/S3)
Source of truth. Only ~5% of traffic reaches here. Multi-region replication. Exabyte-scale object storage.
~100–200ms
~5% miss
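A read-through sketch of the three tiers: check the edge PoP, then the regional cache, then fall back to origin, filling each tier on the way back so the next nearby viewer hits closer to home. Caches are plain dicts here purely for illustration.

# Read-through sketch of the 3-tier CDN lookup chain (dicts stand in for node caches).
edge_cache: dict[str, bytes] = {}       # per-PoP, hot content only
regional_cache: dict[str, bytes] = {}   # per-region, warm content
ORIGIN: dict[str, bytes] = {            # GCS/S3 source of truth (stub)
    "v/abc123/1080p/seg_001.ts": b"<segment bytes>",
}

def fetch_segment(key: str) -> bytes:
    if key in edge_cache:                 # ~80% of requests stop here (~5-20 ms)
        return edge_cache[key]
    if key in regional_cache:             # ~15% served here (~30-60 ms)
        data = regional_cache[key]
    else:                                 # ~5% miss all the way to origin (~100-200 ms)
        data = ORIGIN[key]
        regional_cache[key] = data
    edge_cache[key] = data                # fill the edge for the next nearby viewer
    return data

fetch_segment("v/abc123/1080p/seg_001.ts")   # cold: origin → regional → edge
fetch_segment("v/abc123/1080p/seg_001.ts")   # warm: edge hit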
CDN Pre-Warming Strategy
Push popular content to edges before users request it (STRATEGY)
// New video uploaded by a channel with 10M subscribers:
// Don't wait for cache misses — proactively push to edges

// Tier 1: Subscriber-count-based pre-warm
if (channel.subscriberCount > 1_000_000) {
  cdn.prefetch(videoUrl, tier: "edge", regions: "all");   // push to all edges
} else if (channel.subscriberCount > 100_000) {
  cdn.prefetch(videoUrl, tier: "regional");               // push to regional only
}

// Tier 2: Virality-based dynamic warm (triggered by view velocity)
if (viewVelocity > 10_000 /* views/minute */) {
  cdn.prefetch(videoUrl, tier: "edge", regions: "all");
}

// Pre-warm only the first 5 segments (first 10–30 seconds)
// Reason: most users watch the start; remaining segments warmed on demand
// "Seek-ahead": when the user is at segment N, pre-fetch N+1 to N+5
View Counter Design
11,600 views/sec globally — naive DB update doesn't scale
Redis INCR
FAST, APPROXIMATE
Atomic counter per video. Background job syncs to MySQL every 30s. Loses up to 30s of counts on Redis crash.
INCR view_count:{videoId}
Background: every 30s sync to DB
✓ Fast ✗ 30s loss risk
Kafka + ClickHouse
ACCURATE, SCALABLE
Each view → Kafka event. ClickHouse consumer aggregates. Accurate, scalable, ~30-60s latency. Foundation for analytics.
Kafka: view_events (videoId, userId, ts)
ClickHouse: COUNT(*) per videoId
✓ Accurate ✗ 30-60s lag
Sharded Counters
FOR VIRAL VIDEOS
Shard count across N Redis keys. INCR random shard. Read = SUM all shards. Removes hot key problem for viral content.
INCR view:{videoId}:shard_{rand(N)}
READ: SUM view:{videoId}:shard_0..N
✓ No hotkey ✗ N reads per count
Production view counter — combining all three approaches (ARCHITECTURE)
// On every view event:
// 1. INCR a random Redis shard (instant, non-blocking)
INCR view:{videoId}:shard_{random(10)}

// 2. Publish to Kafka (async, doesn't block the response)
kafka.publish("view-events", {videoId, userId, ip, timestamp, country})

// View count displayed to the user:
GET /api/views/{videoId} → SUM of MGET view:{videoId}:shard_0 ... view:{videoId}:shard_9   ← Redis (fast)

// Analytics dashboard (historical, per-country, per-hour):
SELECT COUNT(*) FROM view_events WHERE video_id = X AND timestamp > T
→ ClickHouse query ← accurate, supports complex aggregations

// Spam prevention:
// Check: SET view_dedup:{videoId}:{ip}:{hour} 1 NX EX 3600
// If the key already exists → don't count this view (same IP, same hour)
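A sketch of the combined design with redis-py. Key names follow the block above; the Kafka publish is stubbed out to keep the example self-contained, and the shard count of 10 is an assumption.

# Combined view counter sketch: sharded INCR + SET NX dedup + summed read.
import random
import time
import redis

r = redis.Redis(host="localhost", port=6379)
NUM_SHARDS = 10

def record_view(video_id: str, user_id: str, ip: str) -> bool:
    hour_bucket = int(time.time() // 3600)
    # Spam prevention: one counted view per IP per hour (SET ... NX EX 3600).
    if not r.set(f"view_dedup:{video_id}:{ip}:{hour_bucket}", 1, nx=True, ex=3600):
        return False
    # Sharded counter: INCR a random shard to avoid a single hot key on viral videos.
    r.incr(f"view:{video_id}:shard_{random.randrange(NUM_SHARDS)}")
    # Durable path: also publish to Kafka for ClickHouse analytics (stubbed here).
    # kafka.publish("view-events", {...})
    return True

def get_view_count(video_id: str) -> int:
    keys = [f"view:{video_id}:shard_{i}" for i in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)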
Storage Strategy & Search
Petabyte object storage + Elasticsearch for discovery
GCS/S3 object structure + lifecycle policies (OBJECT STORAGE)
// Storage layout:
raw/{videoId}/original.mp4            ← deleted after transcoding (save cost)
processed/{videoId}/master.m3u8       ← HLS master playlist
processed/{videoId}/360p/seg_001.ts   ← 2s video segments
processed/{videoId}/1080p/seg_001.ts
thumbnails/{videoId}/thumb_1.jpg      ← multiple choices for the uploader

// Lifecycle policies (automated tiering by popularity):
Hot     (>100 views/month):   Standard storage + CDN — fast and expensive
Warm    (10–100 views/month): Nearline — slightly slower, 50% cheaper
Cold    (<10 views/month):    Coldline — retrieval delay, 80% cheaper
Archive (<1 view/month):      Archive — hours to retrieve, 95% cheaper

// ~80% of all YouTube videos have fewer than 1K total views ever
// Tiering the long tail to cold storage saves enormous cost

// Replication:
// GCS multi-region: automatic 3x replication across AZs
// Cross-region: popular videos replicated to US, EU, APAC buckets
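A sketch of a periodic tiering job for the popularity-based lifecycle above. Native bucket lifecycle rules key off object age rather than views, so this assumes a background job fed by monthly view counts (e.g. from ClickHouse); set_storage_class() is a placeholder, not a specific SDK call.

# Periodic tiering job: map last month's views to a storage class and rewrite objects.
def target_tier(views_last_month: int) -> str:
    if views_last_month > 100:
        return "STANDARD"      # hot: CDN-backed, fast and expensive
    if views_last_month >= 10:
        return "NEARLINE"      # warm: slightly slower, ~50% cheaper
    if views_last_month >= 1:
        return "COLDLINE"      # cold: retrieval delay, ~80% cheaper
    return "ARCHIVE"           # long tail: hours to retrieve, ~95% cheaper

def retier_videos(monthly_views: dict[str, int]) -> None:
    for video_id, views in monthly_views.items():
        set_storage_class(f"processed/{video_id}/", target_tier(views))

def set_storage_class(prefix: str, storage_class: str) -> None:
    # Placeholder: a real job would call the object-storage API / rewrite objects here.
    print(f"{prefix} -> {storage_class}")

retier_videos({"abc123": 1_500_000, "old_vlog_2011": 0})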
Search Architecture
Elasticsearch for video discovery (SEARCH)
// Elasticsearch index: "videos"
{
  "videoId": "abc123",
  "title": "How to make pasta",             ← full-text search
  "description": "Step by step guide...",   ← full-text search
  "tags": ["cooking", "pasta", "recipe"],   ← exact match
  "transcript": "Today we're making...",    ← auto-generated captions
  "viewCount": 1500000,                     ← boost popular results
  "uploadDate": "2024-01-15",               ← recency signal
  "channelSubscribers": 5000000             ← authority signal
}

// Keep ES in sync with MySQL:
// MySQL change → Debezium CDC → Kafka topic "db-changes" → ES consumer
// Async — ES may lag MySQL by seconds, acceptable for search freshness

// Autocomplete: ES "search_as_you_type" field type on title
// Returns suggestions after 2 characters
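A sketch of a query against the "videos" index above, shaped for the Elasticsearch 8.x Python client. The field weights and the log1p view-count boost are illustrative scoring choices, not YouTube's actual ranking.

# Search sketch: full-text relevance on title/description/transcript, multiplied by a
# dampened popularity signal via function_score.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_videos(text: str, size: int = 10):
    query = {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": text,
                    # Title matches count most; transcript lets spoken content rank too.
                    "fields": ["title^3", "tags^2", "description", "transcript"],
                }
            },
            # Popularity signal: log-dampened view count multiplied into the text score.
            "functions": [
                {"field_value_factor": {"field": "viewCount", "modifier": "log1p", "factor": 0.1}}
            ],
            "boost_mode": "multiply",
        }
    }
    resp = es.search(index="videos", query=query, size=size)
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]

print(search_videos("how to make pasta"))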
Task 01: Transcoding Pipeline Design
- How many transcoding workers are needed for 8.3 hrs/sec input at 1× realtime? Show the math.
- Design the Kafka topic + partition key for the transcoding job queue (ordering? parallelism?)
- A 4-hour video is uploaded. Temporal parallelism splits it into 240 segments. How do you coordinate the concatenation step?
- A transcoding worker crashes halfway through segment 42 of 240. How do you resume without re-processing completed segments?
- Priority queue: paid YouTube Premium creators should process before free-tier. Design the priority mechanism.
Task 02: CDN Caching Strategy
- New video from Mr. Beast (230M subscribers) — what happens in the first 10 minutes?
- Steady popular video: 500K views/day for 5 years. Where is it cached and what tier?
- Long-tail video: 2 views/month for 5 years. Should it even be on CDN?
- 50M people watch a live music event simultaneously. How does CDN handle the thundering herd?
- Cache invalidation: creator updates thumbnail after 1M views. How do you purge it from 1,000+ edge nodes?
Task 03: View Counter at Scale
Design a view counter with these exact constraints:
- Viral video: 10M views in 1 hour (2,778 views/sec on a single video)
- Display count with <30 second lag
- Durability: must not lose counts (Redis crash shouldn't lose 30s of views)
- Spam prevention: same IP in same hour should only count once
- Analytics: views-per-hour breakdown for the last 30 days (for creator analytics)
For the viral video case: how many Redis nodes are needed? What's the sharding strategy?
★ Capstone: Full YouTube Design — 45-min Simulation
Apply the 7-step framework. Time yourself to 45 minutes. Cover:
- Requirements + estimations: upload rate, storage growth, CDN bandwidth, transcoding throughput
- Full architecture diagram: upload path + stream path
- Chunked upload protocol + deduplication
- Transcoding: temporal parallelism math, codec choices
- HLS: manifest structure, adaptive bitrate switching
- CDN: 3-tier strategy + pre-warming
- View counter: sharded Redis + Kafka + ClickHouse
- Failure modes: transcoding worker crash, CDN node failure, GCS region outage
MODULE B8 · YOUTUBE
Requirements: upload, stream, search, view count, subscribe
Estimation: ~2.2 PB/day new video, 58 Tbps CDN bandwidth
Chunked resumable upload: uploadId, 5 MB chunks, resume from last ACK
Pre-upload SHA-256 deduplication
Transcoding: temporal parallelism (1-min segments × N workers = 60× speedup)
HLS: master.m3u8 → quality variant .m3u8 → 2s .ts segments
Adaptive bitrate: player measures bandwidth every 2s, switches at segment boundaries
Codec trade-offs: H.264 (universal), VP9 (30-50% smaller, Chrome), AV1 (future)
3-tier CDN: edge PoPs (80% hit) → regional (15%) → GCS origin (5%)
CDN pre-warming: subscriber count + view velocity triggers edge prefetch
View counter: sharded Redis INCR + Kafka events + ClickHouse analytics
Spam prevention: SETNX dedup key per IP per hour
Storage lifecycle: hot → nearline → coldline → archive by view velocity
Search: Elasticsearch with title/description/transcript + CDC sync from MySQL
✏️ Tasks 1–3: transcoding pipeline, CDN strategy, view counter
✏️ Task 4 (capstone): full YouTube — 45-min interview simulation
// NEXT MODULE
B9 — Design a Rate Limiter
Token bucket · Leaky bucket · Fixed window · Sliding window log
Sliding window counter · Distributed rate limiting · Redis Lua scripts
Where to apply limits · API gateway integration · Edge rate limiting