Twitter at Scale
The numbers that drive every architectural decision
| METRIC | VALUE | IMPLICATION |
|---|---|---|
| Total users | 300M | Social graph: 300M × avg 400 follows = 120B follow edges |
| DAU | 100M | Only 33% active daily — fanout to inactive users is wasteful |
| Tweets/day | 500M | 5,800/sec avg, 15,000/sec peak |
| Timeline reads/day | 28B | 320K reads/sec avg → 800K/sec peak |
| Read:Write ratio | ~50:1 | System is overwhelmingly read-heavy → cache everything |
| Avg followers | 200 | Fan-out write cost: 5,800 × 200 = 1.16M Redis writes/sec |
| Max followers | 100M+ | Celebrities break fan-out-on-write completely |
| Storage/tweet | ~1 KB | 500M/day × 1KB = 500 GB/day text only |
The hard constraint: A celebrity with 100M followers tweets once → fan-out on write = 100M Redis writes for a single tweet. Even at the steady-state cluster rate of ~1M timeline writes/sec, that one tweet takes on the order of 100 seconds to fan out, and several high-follower accounts tweeting around the same time multiplies the backlog. This single fact forces the hybrid approach.
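A quick back-of-envelope check (a small Python sketch using the rounded numbers from the table; the exact figures are illustrative) makes the write asymmetry concrete:

# Rounded figures from the table above; purely illustrative arithmetic.
TWEETS_PER_DAY = 500_000_000
TIMELINE_READS_PER_DAY = 28_000_000_000
AVG_FOLLOWERS = 200
SECONDS_PER_DAY = 86_400

tweet_writes_per_sec = TWEETS_PER_DAY / SECONDS_PER_DAY              # ~5,800/sec
timeline_reads_per_sec = TIMELINE_READS_PER_DAY / SECONDS_PER_DAY    # ~324,000/sec
fanout_writes_per_sec = tweet_writes_per_sec * AVG_FOLLOWERS         # ~1.16M Redis writes/sec

celebrity_followers = 100_000_000   # one tweet from a 100M-follower account
print(f"steady-state fan-out load: {fanout_writes_per_sec:,.0f} Redis writes/sec")
print(f"single celebrity tweet:    {celebrity_followers:,} timeline writes queued at once")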
The Core Problem: Home Timeline
Why the naive approach fails — and what "fan-out" means
Why you can't just query at read time · MATH
// User U follows 500 people. Naive read-time approach:
SELECT tweet_id FROM tweets
WHERE user_id IN (followee_1, followee_2, ... followee_500)
ORDER BY created_at DESC LIMIT 200;

// Cost: 500 DB queries (or 1 massive IN query) per timeline load
// 320,000 timeline reads/sec × 500 queries = 160,000,000 queries/sec
// A well-tuned MySQL handles ~100,000 QPS → need 1,600 DB nodes
// Merge latency: 500 streams × network roundtrip → 200–500ms → SLA violation

// Conclusion: pure read-time fan-out is IMPOSSIBLE at Twitter scale
// We must pre-compute (at least partially) the home timeline
Fan-out = distributing one event (a new tweet) to N destinations (follower timelines). The question is: do you fan-out at write time (push) or at read time (pull)?
Fan-Out on Write vs Fan-Out on Read
The fundamental trade-off — write amplification vs read amplification
Fan-Out on Write
PUSH MODEL — pre-compute timeline at write time
User A tweets → fanout service pushes tweet ID into every follower's Redis timeline immediately. Timeline read = instant O(1) sorted set lookup.
✓ Timeline read: O(1) Redis → instant
✓ Scales reads to millions/sec easily
✓ No per-user read complexity
✗ Write amplification: 1 tweet × 100M followers
✗ Celebrities break this model entirely
✗ Wastes writes for inactive users
✗ Redis storage: 300M timelines
Fan-Out on Read
PULL MODEL — compute timeline at read time
No pre-computation. At read time, fetch recent tweets from all followees, merge by timestamp. Works perfectly for celebrities — their tweet is one DB write.
✓ No write amplification — 1 tweet = 1 write
✓ Celebrities work fine
✓ No wasted storage for inactive users
✗ 500 DB queries per timeline read
✗ 320K reads/sec × 500 = 160M QPS → impossible
✗ Merge latency: 200–500ms
✗ Hot followee tables under load
Interview insight: Neither model works alone at Twitter's scale. The question "fan-out on write or read?" is a trap — the correct answer is always "it depends on follower count, and we use a hybrid."
Hybrid Approach ★
Fan-out on write for normal users · Fan-out on read for celebrities
// HYBRID RULE — threshold-based routing
followers < 10,000  →  Fan-out on WRITE — push tweet ID to all follower timelines in Redis immediately
followers ≥ 10,000  →  Fan-out on READ — tweet stored in DB only; injected at timeline read time
Timeline Read — Hybrid Merge
How the timeline service assembles the feed · PSEUDOCODE
function getHomeTimeline(userId, limit=200):
    // 1. Pre-computed portion (fan-out-on-write tweets)
    precomputed = redis.ZREVRANGE("timeline:" + userId, 0, limit * 2, WITHSCORES)
    // O(log N) — fast, covers all normal users the person follows

    // 2. Celebrity injection (fan-out-on-read portion)
    celebrities = socialGraph.getCelebrityFollowees(userId)
    // Typically <50 celebrities per user (manageable)
    celeb_tweets = []
    for celeb in celebrities:
        recent = tweetCache.getRecentTweets(celeb.userId, n=50)
        celeb_tweets.extend(recent)
    // 50 celebrities × 50 tweets = 2,500 tweet fetches (cached in Redis)

    // 3. Merge by timestamp + deduplicate retweets
    merged = mergeSortedByTimestamp(precomputed, celeb_tweets)
    return merged[:limit]

// Total latency: ~5–20ms (all Redis operations)
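The mergeSortedByTimestamp step is the only non-trivial piece. Here is a minimal Python sketch of it, assuming both inputs are (timestamp, tweet_id) pairs already sorted newest-first; the function name comes from the pseudocode above, while the heap-merge and seen-set details are assumptions:

import heapq

def merge_sorted_by_timestamp(precomputed, celeb_tweets):
    # Both inputs: lists of (timestamp_ms, tweet_id) pairs, sorted newest-first.
    # heapq.merge with reverse=True keeps the combined stream newest-first without
    # re-sorting; the seen-set drops a tweet that shows up in both sources.
    seen = set()
    merged = []
    for ts, tweet_id in heapq.merge(precomputed, celeb_tweets,
                                    key=lambda pair: pair[0], reverse=True):
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        merged.append((ts, tweet_id))
    return merged

# Example: two already-sorted streams merged into one newest-first timeline.
normal = [(1700000500000, "tweet_abc"), (1700000400000, "tweet_def")]
celebs = [(1700000450000, "tweet_xyz")]
print(merge_sorted_by_timestamp(normal, celebs))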
Fanout Worker Service
What happens when @normalUser (800 followers) tweets · FLOW
POST /tweet
  → [Tweet Service]
      → INSERT INTO tweets (tweet_id=snowflake(), user_id, content, ...)   ✓ durable
      → HSET tweet:{id} userId content likeCount ...                        ✓ cached
      → Kafka.publish("tweet-created", {tweetId, userId, followerCount})

[Fanout Worker] consumes from Kafka:
    if followerCount < 10_000:
        followers = socialGraph.getFollowers(userId)   // read from graph DB
        for followerId in followers:
            redis.ZADD("timeline:"+followerId, timestamp, tweetId)
            redis.ZREMRANGEBYRANK("timeline:"+followerId, 0, -1001)   // keep top 1000
    else:
        // Celebrity: just ensure tweet is in tweet cache, no fanout
        // Timeline reads will inject this lazily
        redis.HSET("tweet:"+tweetId, ...)
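In a real worker the per-follower ZADD loop would be pipelined rather than issued one command at a time. A minimal redis-py sketch under that assumption (the batch size, key names, and single-node client are illustrative):

import redis

r = redis.Redis()                 # illustrative single-node client; production uses a cluster client
TIMELINE_CAP = 1000               # matches the ZREMRANGEBYRANK trim above
BATCH = 500                       # assumed pipeline batch size

def fan_out(tweet_id: str, timestamp_ms: int, follower_ids: list[int]) -> None:
    # Push one tweet ID into many follower timelines, BATCH followers per roundtrip.
    for start in range(0, len(follower_ids), BATCH):
        pipe = r.pipeline(transaction=False)
        for follower_id in follower_ids[start:start + BATCH]:
            key = f"timeline:{follower_id}"
            pipe.zadd(key, {tweet_id: timestamp_ms})
            pipe.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))   # keep newest 1000
        pipe.execute()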
The elegance: Normal users (99.9% of accounts) get instant fan-out with manageable write cost. Celebrities get lazy injection with zero write amplification. The merge at read time costs ~50 celebrity fetches — all from Redis — adding only 1–2ms to timeline load.
Data Model
Tweets table · Follows table · Redis timeline cache
TABLE: tweets — sharded by user_id · MySQL
| COLUMN | TYPE | NOTES |
|---|---|---|
| tweet_id | BIGINT | Snowflake ID — encodes timestamp, no separate created_at needed |
| user_id | BIGINT | Author — shard key |
| content | VARCHAR(280) | Tweet text |
| media_ids | JSON | Array of S3 keys for images/video |
| reply_to_id | BIGINT | NULL if original tweet; FK to parent tweet |
| retweet_of | BIGINT | NULL if original; FK to retweeted tweet |
| like_count | BIGINT | Approximate — updated async via counter service |

INDEX (user_id, tweet_id DESC) — user timeline query
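The table notes that tweet_id is a Snowflake ID that encodes its own timestamp. A small Python sketch of the commonly documented layout (41-bit millisecond timestamp since Twitter's custom epoch, 10-bit machine ID, 12-bit sequence; treat the exact epoch constant as an assumption) shows why no created_at column is needed and why sorting by tweet_id sorts by time:

TWITTER_EPOCH_MS = 1288834974657   # commonly cited Snowflake custom epoch; an assumption here

def snowflake_parts(tweet_id: int) -> dict:
    # Layout: 41-bit millisecond timestamp | 10-bit machine ID | 12-bit sequence.
    return {
        "timestamp_ms": (tweet_id >> 22) + TWITTER_EPOCH_MS,
        "machine_id":   (tweet_id >> 12) & 0x3FF,
        "sequence":      tweet_id        & 0xFFF,
    }

# Because the timestamp occupies the high bits, ORDER BY tweet_id DESC is equivalent
# to ordering by creation time, which is what the (user_id, tweet_id DESC) index exploits.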
TABLE: follows — social graph service · MySQL / Graph DB
| COLUMN | TYPE | NOTES |
|---|---|---|
| follower_id | BIGINT | Who is following (part of composite PK) |
| followee_id | BIGINT | Who is being followed |
| created_at | TIMESTAMP | When the follow happened |

PRIMARY KEY (follower_id, followee_id)
INDEX (followee_id) — "who follows @ladygaga?" for fanout
Redis timeline cache design · REDIS
// Key pattern: timeline:{userId}
// Type: Sorted Set — score = tweet timestamp (epoch ms)
// Members: tweet_id strings

redis.ZADD("timeline:123", 1700000500000, "tweet_abc")
redis.ZADD("timeline:123", 1700000400000, "tweet_def")

// Read top 200:
tweetIds = redis.ZREVRANGE("timeline:123", 0, 199)   // O(log N + 200)

// Batch hydrate (pipeline, single roundtrip):
tweets = redis.PIPELINE { tweetIds.map(id => HGETALL("tweet:"+id)) }

// Trim timeline to 1000 entries (memory bound):
redis.ZREMRANGEBYRANK("timeline:123", 0, -1001)   // keep newest 1000

// Memory: 300M users × 1000 IDs × 8 bytes = 2.4 TB → Redis cluster
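For reference, the same read path written against redis-py; a sketch assuming a single-node client and the key names above (a production deployment would use the cluster client):

import redis

r = redis.Redis(decode_responses=True)   # illustrative single-node client

def read_timeline(user_id: int, limit: int = 200) -> list[dict]:
    # 1. Newest tweet IDs from the sorted set.
    tweet_ids = r.zrevrange(f"timeline:{user_id}", 0, limit - 1)
    # 2. Hydrate all tweet hashes in one pipelined roundtrip.
    pipe = r.pipeline(transaction=False)
    for tweet_id in tweet_ids:
        pipe.hgetall(f"tweet:{tweet_id}")
    tweets = pipe.execute()
    # 3. Drop IDs whose hash was evicted from the cache (empty dicts).
    return [t for t in tweets if t]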
Full Architecture
Six services — each with a distinct responsibility
// TWITTER SYSTEM ARCHITECTURE

CLIENT:   Mobile / Web  ──→  CDN Edge (static assets)  ──→  Load Balancer (L7 / HTTPS)

SERVICES: Tweet Service (post / user timeline)
          Timeline Service (home feed assembly)
          Fanout Service (push to Redis timelines)
          Social Graph Svc (follow / unfollow)
          Search Service (Elasticsearch)
          Media Service (upload / transcode)

ASYNC:    Kafka topics: tweet-created · tweet-liked · user-followed
          ──→ Fanout Workers (consume tweet-created, push to timelines)
            + Counter Workers (batch update like/retweet counts)
            + Search Indexer (Elasticsearch write pipeline)

STORAGE:  Redis Cluster (timelines, tweet cache, approximate counts)
          MySQL, sharded by user_id (tweets, users)
          Graph DB (follow edges: 300M × 400 = 120B)
          Elasticsearch (full-text search, hashtag index)
          S3 + CDN (photos, videos, HLS transcoded)
Likes, Retweets & Follower Counts
Async counter aggregation — why you can't do synchronous increments at scale
Why synchronous UPDATE like_count fails · MATH
// A viral tweet receives 5M likes in 10 minutes
// = 8,333 likes/sec at peak
// Naive: UPDATE tweets SET like_count = like_count + 1 WHERE tweet_id = ?

// Problem: 8,333 concurrent UPDATE ops on SAME row = row-level lock contention
// MySQL handles ~10K single-row updates/sec → this saturates the primary
// And we have thousands of tweets being liked simultaneously

// Solution: decouple write from increment
User likes tweet
  → INSERT likes (user, tweet, ts)      // immediate — dedup check
  + INCR Redis like:{tweetId}           // instant display
  + Kafka publish "tweet-liked"         // async
  → Counter Worker — batch UPDATE every 30 sec
Key insight: Users are shown the Redis count (approximate, updated in real-time via INCR). The DB count lags by up to 30 seconds. This is acceptable — Twitter shows "1.2M" not "1,234,567". The likes table is the source of truth for "did I like this?", not the count column.
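A minimal Python sketch of both halves of this path, with the MySQL insert/update and the Kafka producer stubbed out as hypothetical helpers (names and 30-second cadence follow the flow above):

import json
import redis

r = redis.Redis(decode_responses=True)

# Hypothetical stubs standing in for the MySQL layer and the Kafka producer.
def insert_like_row(user_id: int, tweet_id: str) -> None: ...
def update_like_count(tweet_id: str, count: int) -> None: ...
def publish(topic: str, payload: str) -> None: ...

def on_like(user_id: int, tweet_id: str) -> None:
    insert_like_row(user_id, tweet_id)                 # source of truth; (user, tweet) PK dedups
    r.incr(f"like:{tweet_id}")                         # approximate count shown immediately
    publish("tweet-liked", json.dumps({"userId": user_id, "tweetId": tweet_id}))

def flush_counters(dirty_tweet_ids: set[str]) -> None:
    # Counter worker: every ~30s fold the Redis counters back into the tweets table.
    for tweet_id in dirty_tweet_ids:
        count = int(r.get(f"like:{tweet_id}") or 0)
        update_like_count(tweet_id, count)             # issued as a batched UPDATE in practice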
Media Storage
S3 + CDN + transcoding pipeline — serving petabytes of images and video
Image and video pipeline · FLOW
// IMAGE UPLOAD:
Client → CDN upload endpoint → S3 (raw bucket)
Tweet stores:  media_ids: ["s3://tweets-raw/2024/01/img_abc.jpg"]
CDN serves:    https://pbs.twimg.com/media/img_abc.jpg
// CDN cache hit rate: 99%+ for viral content
// Without CDN: 100M impressions × 500KB = 50TB bandwidth from S3 → $$$

// VIDEO UPLOAD (async transcoding):
Client → S3 raw → Lambda trigger → Transcoding worker
Transcodes to HLS (HTTP Live Streaming) at multiple bitrates: 240p, 480p, 720p, 1080p
Output → S3 transcoded bucket → CDN
// HLS: browser fetches small segments (2–10 sec), adapts bitrate to bandwidth
Trending Topics
Sliding window hashtag counting · STREAM PROCESSING
// Kafka stream: every tweet → extract hashtags
Tweet:   "Just watched #Oppenheimer, amazing! #movies"
Extract: ["#Oppenheimer", "#movies"]

// Flink/Storm: count per hashtag in 1-hour sliding window
// Min-heap: maintain top-30 hashtags globally + per-region

// Store in Redis sorted set:
redis.ZINCRBY("trending:global", 1, "#Oppenheimer")
redis.ZINCRBY("trending:US", 1, "#Oppenheimer")

// Read top 10:
redis.ZREVRANGE("trending:global", 0, 9, WITHSCORES)

// Refresh: recalculate every 5 minutes
// Geographic trending: separate sorted set per region
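The Flink/Storm job itself is out of scope here, but the Redis side of a sliding window can be sketched with time-bucketed sorted sets. In this Python sketch the bucket size, key names, and union-per-read approach are assumptions, not the production design:

import re
import time
import redis

r = redis.Redis(decode_responses=True)
BUCKET_SECONDS = 300      # assumed 5-minute buckets
WINDOW_BUCKETS = 12       # 12 × 5 min = 1-hour sliding window

def record_hashtags(tweet_text: str, region: str = "global") -> None:
    # Count each hashtag in the current 5-minute bucket for this region.
    bucket = int(time.time()) // BUCKET_SECONDS
    for tag in re.findall(r"#\w+", tweet_text):
        r.zincrby(f"trending:{region}:{bucket}", 1, tag)

def top_trending(region: str = "global", n: int = 10) -> list[tuple[str, float]]:
    # Union the last hour of buckets, then read the top-n hashtags with scores.
    bucket = int(time.time()) // BUCKET_SECONDS
    keys = [f"trending:{region}:{bucket - i}" for i in range(WINDOW_BUCKETS)]
    r.zunionstore(f"trending:{region}:window", keys)
    return r.zrevrange(f"trending:{region}:window", 0, n - 1, withscores=True)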
01 · Fan-Out Cost Analysis
Calculate the write cost for each scenario:
- @ladygaga (100M followers) tweets once. Pure fan-out on write: how many Redis writes? At what rate (assuming 10s to fanout)?
- 100K users each follow @ladygaga. Timeline load for each. Pure fan-out on read: how many extra tweet fetches? Total QPS added?
- Hybrid model (10K follower threshold): what % of Twitter accounts qualify as "celebrity"? What % of total write cost does this save?
- Redis storage: 300M active users × 1000 timeline entries × 8 bytes. Compare to fan-out-on-read (no timeline storage).
02 · Extended Data Model
Extend the schema and describe the fanout/read changes for:
- Twitter Lists — user creates a curated list of accounts; list has its own timeline
- Tweet threads — reply chain, show conversation in order
- Quote tweets — retweet with added comment (references original)
- Pinned tweet — always shown first on user profile page
03 · Failure Scenario Design
- Fanout service crashes mid-fanout: 10M of 100M followers got the tweet, then crash. How do you ensure eventual consistency?
- Redis timeline cache completely wiped (OOM, cluster failure). 300M users with empty timelines hit the timeline service simultaneously.
- Social graph DB is down. Fanout service can't resolve followers. What happens to new tweets posted during the outage?
- Elasticsearch is unavailable. Users submit search queries. Design graceful degradation.
★ Full 45-Minute Design Simulation
Set a timer. On paper or whiteboard, design Twitter's home timeline from scratch using the 7-step framework. Must include:
- Requirements + NFRs with all key numbers
- Capacity estimation (tweets/day, QPS, storage)
- Full architecture diagram (all 6 services + storage layers)
- Fan-out model decision with hybrid threshold justification
- DB schema (tweets + follows + Redis timeline cache design)
- At least 2 edge cases (celebrity problem + one other)
- Scale evolution: what changes if Twitter 10× in size?
After completing, review against this module's content. What did you miss? What would you add?
MODULE B6 · TWITTER FEED
Twitter scale: 300M users, 500M tweets/day, 320K timeline reads/sec
Fan-out on write: how it works, write amplification problem, celebrity failure mode
Fan-out on read: how it works, 160M QPS impossibility, merge latency
Hybrid model: threshold (10K followers), celebrity inject at read time, merge logic
Can draw full architecture: 6 services + Kafka + Redis + MySQL + Graph DB + S3
Redis timeline cache: sorted set, score = timestamp, ZREVRANGE, trim to 1000
DB schema: tweets (sharded by user_id) + follows (indexed on followee_id)
Like/retweet counts: async Kafka → counter worker → Redis INCR + periodic DB flush
Media: S3 + CDN + HLS transcoding pipeline for video
Trending: sliding window, Flink, Redis sorted set, per-region
Search: Elasticsearch, async indexing via Kafka, hydrate tweet IDs from cache
✏️ Task 1: fan-out cost analysis with actual numbers
✏️ Task 3: failure scenarios — fanout crash, Redis wipe, graph DB down
✏️ Task 4: 45-min timed full design simulation completed
// NEXT MODULE
B7 — Design WhatsApp
Real-time messaging · WebSocket connection management
Message delivery guarantees (sent / delivered / read)
Group chats · Media sharing · End-to-end encryption overview
Presence (online/offline) · Message ordering · Offline queue