SYSTEM DESIGN MASTERY · TRACK B · MODULE B7 · WEEK 17 · REAL-TIME MESSAGING · WEBSOCKETS · E2E ENCRYPTION
Case Study No. 3 · Real-Time Messaging · Delivery Receipts

Design WhatsApp

WEBSOCKETS · CASSANDRA · PRESENCE SYSTEM · GROUP MESSAGING · MEDIA PIPELINE · S3 + CDN
At a glance: 2B users · 100B msg/day · ~1K Chat Servers
Topics: WebSocket · Cassandra · Session Store · Delivery Receipts · Presence · Group Fan-Out · S3 Media
Requirements
Establish scope — then everything follows from the constraints
Functional
1-on-1 messaging (text, media, emoji)
Group messaging (up to 1,024 members)
Delivery receipts (sent / delivered / read)
Online presence + "last seen"
Media sharing (images, video, audio)

OUT OF SCOPE: calls, disappearing msgs, payments
Non-Functional
2B users, 100M DAU
100B messages/day → 1.16M msg/sec
Group fan-out: 1 → up to 1,024 recipients
Delivery latency p99 < 500ms
Presence propagation p99 < 1 second
Availability: 99.99%
Durability: zero message loss
The core insight: 100M concurrent users, each holding a persistent WebSocket connection, at ~100K connections per server ≈ 1,000 Chat Servers. Each server is stateful — it owns those connections. Routing a message means finding the exact server the recipient is connected to. That is the central routing problem.
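To make that mapping concrete, here is a minimal sketch of the Session Store, assuming Jedis as the Redis client; the session:{userId} key format matches the routing step and the recap later on this page, while the class, method, and host names are illustrative only.

Session Store mapping (illustrative sketch) · JAVA
import redis.clients.jedis.JedisPooled;

public class SessionStore {
    private static final long TTL_SECONDS = 86_400;                 // refreshed while the socket lives
    private final JedisPooled redis = new JedisPooled("redis-host", 6379);  // host is a placeholder

    // Chat Server calls this when a user's WebSocket is accepted
    public void register(long userId, String serverAddr) {
        redis.setex("session:" + userId, TTL_SECONDS, serverAddr);  // e.g. "chat-server-C:8080"
    }

    // Router calls this to find which Chat Server owns the recipient's connection
    public String lookup(long userId) {
        return redis.get("session:" + userId);                      // null → user offline
    }

    // Chat Server calls this on a clean disconnect
    public void unregister(long userId) {
        redis.del("session:" + userId);
    }
}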
Why WebSockets?
Comparing polling vs long-polling vs WebSockets
Short Polling
CLIENT ASKS EVERY 1 SECOND
100M users × 1 req/sec = 100M req/sec. Server overloaded. Most responses are empty. 1-second worst-case latency. Unacceptable.
Long Polling
HOLD CONNECTION UNTIL MESSAGE
Better than polling. Still stateless. Proxy timeouts force reconnects. Each reconnect requires re-authentication. Reconnect storms on server restart.
WebSockets ★
PERSISTENT BIDIRECTIONAL TCP
One connection per user. Server pushes instantly. Sub-10ms delivery. Heartbeat ping/pong keeps alive through NAT. Full-duplex — both sides send simultaneously.
WebSocket lifecycle · PROTOCOL
// 1. HTTP Upgrade handshake
GET /ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

// Server responds:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

// 2. TCP connection stays open — frames flow bidirectionally
Client → Server: {"type":"message","to":456,"content":"Hey!"}
Server → Client: {"type":"message","from":123,"content":"What's up?"}
Server → Client: {"type":"ack","msgId":7890,"status":"delivered"}

// 3. Heartbeat every 30s — keeps connection alive through NAT
Server → Client: PING
Client → Server: PONG

// 4. On disconnect: client reconnects, fetches offline messages via REST
GET /messages/offline?since={last_message_id}
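On the server side, the same lifecycle maps onto annotated endpoint callbacks. Below is a hedged sketch using the Jakarta WebSocket API; the pieces named in the comments (Session Store, inbox, Kafka topic) are covered in the following sections and are not implemented here.

Chat Server endpoint lifecycle (illustrative sketch) · JAVA
import java.io.IOException;
import java.nio.ByteBuffer;
import jakarta.websocket.CloseReason;
import jakarta.websocket.OnClose;
import jakarta.websocket.OnMessage;
import jakarta.websocket.OnOpen;
import jakarta.websocket.PongMessage;
import jakarta.websocket.Session;
import jakarta.websocket.server.ServerEndpoint;

@ServerEndpoint("/ws")
public class ChatEndpoint {

    @OnOpen
    public void onOpen(Session ws) throws IOException {
        // After the 101 upgrade: register this connection (user → this server in the Session Store),
        // then start the heartbeat so NAT/proxy table entries stay warm.
        ws.getBasicRemote().sendPing(ByteBuffer.allocate(0));
    }

    @OnMessage
    public void onText(String frame, Session ws) {
        // {"type":"message","to":456,"content":"Hey!"} → persist, ACK the sender, publish to Kafka
    }

    @OnMessage
    public void onPong(PongMessage pong, Session ws) {
        // Client answered the ping: still reachable; schedule the next ping in ~30s
    }

    @OnClose
    public void onClose(Session ws, CloseReason reason) {
        // Drop the in-memory connection, delete session:{userId}, record last_seen;
        // messages sent while offline land in the inbox and are fetched on reconnect.
    }
}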
Message Send / Receive Path
End-to-end: Alice sends "Hey!" → Bob receives in <500ms
// ALICE SENDS MESSAGE TO BOB — FULL PATH
1. [Alice] ──WS──→ [Chat Server A]                        Alice's persistent connection
2. [Server A] writes to [Cassandra: messages table]       durable; message_id = Snowflake
3. [Server A] ──WS──→ [Alice]                             ACK: message saved ✓ (single grey tick)
4. [Server A] publishes to [Kafka: "messages"]            async routing
5. [Router] GET session:{bobId} → [Redis Session Store]   returns "chat-server-C:8080"
6. [Router] HTTP POST → [Chat Server C]                   forward message to Bob's server
7. [Server C] ──WS──→ [Bob]                               message delivered ✓✓ (double grey)
8. [Bob] opens chat → sends READ receipt → Server C → Server A → Alice   ✓✓ blue
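A compact sketch of what Chat Server A does in steps 2–4, assuming a Kafka producer for routing; MessageStore, IdGenerator and WsConnection are stand-ins for application code, not library APIs.

Chat Server send path, steps 2–4 (illustrative sketch) · JAVA
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SendPathHandler {

    interface MessageStore { void save(long conversationId, long messageId, long senderId, String content); }
    interface IdGenerator  { long nextSnowflakeId(); }
    interface WsConnection { void send(String frame); }

    private final MessageStore store;
    private final IdGenerator ids;
    private final KafkaProducer<String, String> producer;

    public SendPathHandler(MessageStore store, IdGenerator ids, Properties kafkaProps) {
        this.store = store;
        this.ids = ids;
        this.producer = new KafkaProducer<>(kafkaProps);   // serializers configured by the caller
    }

    public void onClientMessage(long conversationId, long senderId, long recipientId,
                                String content, WsConnection senderSocket) {
        long messageId = ids.nextSnowflakeId();                        // time-ordered message_id
        store.save(conversationId, messageId, senderId, content);      // step 2: durable write to Cassandra

        senderSocket.send("{\"type\":\"ack\",\"msgId\":" + messageId   // step 3: single grey tick
                + ",\"status\":\"sent\"}");

        // step 4: async routing; a Router consumer looks up session:{recipientId} and forwards
        producer.send(new ProducerRecord<>("messages",
                Long.toString(recipientId),
                "{\"msgId\":" + messageId + ",\"to\":" + recipientId + "}"));
    }
}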
Offline Message Delivery
Bob is offline when message sent — then reconnects · FLOW
// Bob is offline: message stored in Cassandra (already done — Step 2 above)
// Also: store message_id in inbox:{bobId} sorted set (score = member = Snowflake id, so it stays time-ordered)
ZADD inbox:bob {message_id} {message_id}

// Bob reconnects (WebSocket upgrade):
// 1. REST call to fetch missed messages
GET /messages/offline?userId=bob&since={lastReadMessageId}

// 2. Server queries Cassandra for all conversations Bob participates in
// 3. Returns all messages with message_id > lastReadMessageId
// 4. Push to Bob's new WebSocket connection
// 5. Update last_read_message_id = latest received
// 6. Mark each message as delivered → route receipts back to senders
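One possible shape for that reconnect handler, assuming Jedis for the inbox:{userId} sorted set; MessageStore, WsConnection and ReceiptService are hypothetical helpers standing in for the Cassandra read, the new socket, and receipt routing.

Offline delivery on reconnect (illustrative sketch) · JAVA
import redis.clients.jedis.JedisPooled;

public class OfflineDelivery {

    interface MessageStore   { String loadAsJson(long messageId); }   // Cassandra read (assumed)
    interface WsConnection   { void send(String frame); }             // Bob's new socket (assumed)
    interface ReceiptService { void markDelivered(long messageId, long recipientId); }

    private final JedisPooled redis = new JedisPooled("redis-host", 6379);  // host is a placeholder
    private final MessageStore store;
    private final ReceiptService receipts;

    public OfflineDelivery(MessageStore store, ReceiptService receipts) {
        this.store = store;
        this.receipts = receipts;
    }

    public void onReconnect(long userId, long lastReadMessageId, WsConnection socket) {
        String inboxKey = "inbox:" + userId;

        // Steps 1–3: everything queued after the client's cursor (scores are Snowflake message_ids)
        for (String id : redis.zrangeByScore(inboxKey, "(" + lastReadMessageId, "+inf")) {
            long messageId = Long.parseLong(id);
            socket.send(store.loadAsJson(messageId));    // step 4: push over the new WebSocket
            receipts.markDelivered(messageId, userId);   // step 6: route the delivered receipt back
        }

        // Step 5: the cursor has advanced past everything delivered, so drain the inbox
        redis.zremrangeByScore(inboxKey, "-inf", "+inf");
    }
}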
Message Delivery Receipts
Three-state system — the detail that separates good answers from great ones
✓   SENT — single grey tick.
    Message saved to Chat Server & Cassandra. Server ACKs to sender immediately.
    Guarantee: the message will not be lost.

✓✓  DELIVERED — double grey tick.
    Message reached the recipient's device. Recipient's client ACKs via WebSocket.
    Server updates the status and notifies the sender.

✓✓  READ — double blue tick.
    Recipient opened the conversation. Client sends a READ event via WebSocket,
    routed back to the sender. Privacy: read receipts can be disabled.
Group message receipt tracking · SQL / CASSANDRA
-- Option A: counters on the message row
ALTER TABLE messages ADD delivered_count INT;
ALTER TABLE messages ADD read_count INT;      -- CQL has no DEFAULT; treat NULL as 0 in app code
-- ✓✓ shown when delivered_count = group_size
-- ✓✓ blue when read_count = group_size

-- Option B: per-recipient receipts table (more granular, scales better)
CREATE TABLE message_receipts (
    message_id    BIGINT,
    recipient_id  BIGINT,
    status        VARCHAR,  -- 'delivered' | 'read'
    updated_at    TIMESTAMP,
    PRIMARY KEY (message_id, recipient_id)
);
-- Query: SELECT COUNT(*) FROM message_receipts
--        WHERE message_id = X AND status = 'read' ALLOW FILTERING
--        (single partition for message_id; the status filter needs ALLOW FILTERING or client-side counting)
-- Compare to group size → determine if all have read
Failure case: what if the delivered ACK is lost in transit? The sender never sees ✓✓ even though the message was delivered. The recipient should re-send the ACK on its next heartbeat or reconnect. The delivered-status update is idempotent — applying it twice is harmless (see the sketch below).
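One way to see why the re-sent ACK is harmless: with the Option B table, the receipt write is a plain upsert keyed by (message_id, recipient_id), so replaying it leaves the row unchanged. A sketch with the DataStax Java driver; the "wa" keyspace name is an assumption.

Idempotent receipt upsert (illustrative sketch) · JAVA
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class ReceiptHandler {

    private final CqlSession session = CqlSession.builder().withKeyspace("wa").build();  // keyspace assumed

    private final PreparedStatement upsert = session.prepare(
        "INSERT INTO message_receipts (message_id, recipient_id, status, updated_at) "
      + "VALUES (?, ?, ?, toTimestamp(now()))");

    // Cassandra INSERT is an upsert: calling this twice for the same (message, recipient)
    // pair simply rewrites the same row, so duplicate ACKs are harmless.
    public void onDeliveredAck(long messageId, long recipientId) {
        session.execute(upsert.bind(messageId, recipientId, "delivered"));
        // A production version would also avoid downgrading an existing 'read' back to 'delivered'.
    }

    public void onReadReceipt(long messageId, long recipientId) {
        session.execute(upsert.bind(messageId, recipientId, "read"));
    }
}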
Presence System
100M concurrent users sending heartbeats every 30s = 3.3M writes/sec
// PRESENCE LIFECYCLE — ALICE'S SESSION
T=0:00   CONNECT      SET presence:alice "online" EX 45   |  Chat Server registers the session
T=0:30   HEARTBEAT    EXPIRE presence:alice 45            |  refreshes the TTL — stays "online"
T=1:00   HEARTBEAT    EXPIRE presence:alice 45            |  continuous 30s cadence
T=1:05   DISCONNECT   no explicit DEL                     |  last refresh was T=1:00, so the key expires 40s later
T=1:45   EXPIRED      GET presence:alice → NULL           |  "Last seen 1:05" (disconnect time recorded in the DB)
Presence read + subscriber-based push · REDIS
// Write: every 30s heartbeat via WebSocket
SETEX presence:{userId} 45 "online"   // 45s TTL > 30s heartbeat interval

// Read: check if user is online
String val = redis.get("presence:" + userId);
if (val != null) return "Online";
else return "Last seen at " + db.getLastSeen(userId);

// Scaling the writes:
// 100M active users × 1 write/30s = 3.3M writes/sec
// Redis Cluster: shard by hash(userId) across 10+ nodes
// Each node handles ~330K writes/sec → achievable

// Subscriber-based presence notifications (avoids fan-out to all contacts):
// Bob opens chat with Alice → subscribe to presence:{aliceId}
// Alice comes online → notify only active subscribers (open chat windows)
// NOT: notify all 300 of Alice's contacts (expensive, most don't care)
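A sketch of that subscriber mechanism, assuming Jedis pub/sub for cross-server broadcast and an in-memory map of open chat windows per Chat Server; WsSession and the "presence-events" channel name are illustrative. Only local watchers get a push, so Alice's other contacts cost nothing.

Subscriber-based presence notifications (illustrative sketch) · JAVA
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPubSub;

public class PresenceNotifier {

    interface WsSession { void send(String frame); }                  // assumed socket wrapper

    private final Map<Long, Set<WsSession>> watchers = new ConcurrentHashMap<>();
    private final JedisPool pool = new JedisPool("redis-host", 6379); // host is a placeholder

    // Bob opens the chat with Alice → his session starts watching Alice's presence
    public void watch(long targetUserId, WsSession subscriber) {
        watchers.computeIfAbsent(targetUserId, id -> ConcurrentHashMap.newKeySet()).add(subscriber);
    }

    public void unwatch(long targetUserId, WsSession subscriber) {
        Set<WsSession> set = watchers.get(targetUserId);
        if (set != null) set.remove(subscriber);
    }

    // Called on Alice's Chat Server when it sees her connect or disconnect
    public void publishChange(long userId, String state) {            // state: "online" | "offline"
        try (Jedis jedis = pool.getResource()) {
            jedis.publish("presence-events", userId + ":" + state);
        }
    }

    // Each Chat Server runs this once on a background thread; only local watchers get a push
    public void listen() {
        try (Jedis jedis = pool.getResource()) {
            jedis.subscribe(new JedisPubSub() {
                @Override public void onMessage(String channel, String payload) {
                    String[] parts = payload.split(":");
                    Set<WsSession> subs = watchers.get(Long.parseLong(parts[0]));
                    if (subs == null) return;                          // nobody here has that chat open
                    for (WsSession s : subs) {
                        s.send("{\"type\":\"presence\",\"userId\":" + parts[0]
                             + ",\"state\":\"" + parts[1] + "\"}");
                    }
                }
            }, "presence-events");
        }
    }
}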
Group Messaging
1 sender → up to 1,024 recipients — fan-out at delivery time
Store Once, Route Many
Group message stored ONE TIME in Cassandra (conversation_id = group_id). Each member's inbox stores only the message_id (pointer). No data duplication.
messages table: 1 row
group_member inboxes: 1,024 message_id pointers
Storage: 1× message + 1,024× 8-byte pointers
Fan-Out at Delivery
Fan-out service (Kafka consumer) looks up online group members, finds their Chat Servers via Session Store, routes message to each. Offline members get message in inbox on reconnect.
Online (500 of 1,024): WS push immediately
Offline (524 of 1,024): inbox entry → REST on reconnect
Throughput: 1,000 active groups × 1 msg/sec each × ~512 online members ≈ 512K pushes/sec
Group message fan-out service · JAVA
public void handleGroupMessage(GroupMessageEvent e) {
    // 1. Message already stored in Cassandra by Chat Server
    UUID groupId = e.groupId;
    long messageId = e.messageId;

    // 2. Fetch all group members (paginated from Cassandra/Redis)
    List<Long> members = groupService.getMembers(groupId);

    for (long memberId : members) {
        if (memberId == e.senderId) continue;  // skip sender

        String serverAddr = redis.get("session:" + memberId);

        if (serverAddr != null) {
            // Online: route to their Chat Server
            chatRouter.deliver(serverAddr, memberId, messageId);
        } else {
            // Offline: store in inbox for later delivery
            redis.zadd("inbox:" + memberId, messageId, Long.toString(messageId));  // score = member = message_id
        }
    }
}
Data Models
Messages (Cassandra), Social Graph, Sessions (Redis)
TABLE: messages — Cassandra · PRIMARY KEY (conversation_id, message_id DESC)
COLUMN            TYPE      NOTES
conversation_id   UUID      Partition key — all messages in a chat live on the same node
message_id        BIGINT    Clustering key DESC — Snowflake ID, newest first, embeds timestamp
sender_id         BIGINT    Who sent it
content           TEXT      Encrypted text (E2E: only decryptable on device)
media_url         TEXT      S3/CDN URL, NULL for text messages
message_type      VARCHAR   'text' | 'image' | 'video' | 'audio' | 'document'
status            VARCHAR   'sent' | 'delivered' | 'read' — updated by receipt events
QUERY "last 50 msgs": SELECT * FROM messages WHERE conversation_id = X LIMIT 50 → single partition ✓
ORDERING: message_id DESC → newest first, no sort needed ✓
PAGINATION: WHERE message_id < {cursor} LIMIT 50 → keyset pagination ✓
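Those access patterns translate almost directly into prepared statements. A sketch with the DataStax Java driver; the "wa" keyspace name is an assumption, and both queries read a single partition.

Chat history read + keyset pagination (illustrative sketch) · JAVA
import java.util.UUID;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class ChatHistoryDao {

    private final CqlSession session = CqlSession.builder().withKeyspace("wa").build();  // keyspace assumed

    private final PreparedStatement firstPage = session.prepare(
        "SELECT * FROM messages WHERE conversation_id = ? LIMIT 50");

    private final PreparedStatement nextPage = session.prepare(
        "SELECT * FROM messages WHERE conversation_id = ? AND message_id < ? LIMIT 50");

    // Newest 50 messages; clustering order DESC means no sort is needed
    public ResultSet latest(UUID conversationId) {
        return session.execute(firstPage.bind(conversationId));
    }

    // Keyset pagination: pass the smallest message_id of the previous page as the cursor
    public ResultSet olderThan(UUID conversationId, long cursorMessageId) {
        return session.execute(nextPage.bind(conversationId, cursorMessageId));
    }
}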
Media Upload Protocol
Client uploads directly to S3 — Chat Server never touches media bytes · FLOW
// Step 1: Client requests pre-signed upload URL
POST /media/upload/presign
Response: {uploadUrl: "https://s3.../media/uuid.jpg?X-Amz-Signature=...", mediaId: "uuid"}

// Step 2: Client uploads directly to S3 (NOT through Chat Server)
PUT https://s3.amazonaws.com/wa-media/uuid.jpg
Content-Type: image/jpeg
Body: [encrypted image bytes]

// Step 3: Client sends message with media reference
WS: {type:"message", to:456, mediaId:"uuid", mediaType:"image", thumbnail:"base64..."}

// Step 4: Recipient downloads from CDN (not Chat Server)
GET https://cdn.wa.me/media/uuid.jpg  ← edge-cached, fast

// Benefits:
// Chat servers handle only ~200 byte WS frames (never MB of media)
// S3 + CDN handle bandwidth independently
// E2E encryption: client encrypts before upload, only recipient can decrypt
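Step 1 above is typically a few lines with the AWS SDK v2 presigner. A hedged sketch of the presign endpoint's core; the "wa-media" bucket follows the example URLs above, and the HTTP wiring around it is omitted.

Pre-signed upload URL generation (illustrative sketch) · JAVA
import java.net.URL;
import java.time.Duration;
import java.util.UUID;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.PutObjectPresignRequest;

public class MediaPresigner {

    private final S3Presigner presigner = S3Presigner.create();   // default credentials/region

    public PresignResult presignUpload(String contentType) {
        String mediaId = UUID.randomUUID().toString();

        PutObjectRequest put = PutObjectRequest.builder()
                .bucket("wa-media")                                // bucket name from the example above
                .key("media/" + mediaId)
                .contentType(contentType)
                .build();

        PutObjectPresignRequest presignRequest = PutObjectPresignRequest.builder()
                .signatureDuration(Duration.ofMinutes(10))         // short-lived: client must upload soon
                .putObjectRequest(put)
                .build();

        URL uploadUrl = presigner.presignPutObject(presignRequest).url();
        return new PresignResult(mediaId, uploadUrl.toString());
    }

    public record PresignResult(String mediaId, String uploadUrl) { }
}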
Scale & Estimation
Numbers that anchor every architectural decision
COMPONENT                          VALUE           CALCULATION
Chat Servers                       ~1,000          100M concurrent users ÷ 100K WS connections per server
Message throughput                 1.16M msg/sec   100B msg/day ÷ 86,400 sec
Text storage/day                   10 TB/day       100B msg/day × ~100 bytes
With Cassandra replication (3×)    30 TB/day       10 TB × 3 replicas
Cassandra nodes (5 yr)             ~500 nodes      30 TB/day × 365 × 5 ≈ 55 PB ÷ 100 TB per node
Session Store (Redis)              5 GB            100M sessions × 50 bytes = 5 GB — fits on one node
Presence writes/sec                3.3M/sec        100M users ÷ 30s heartbeat interval
Presence Redis nodes               10+ nodes       3.3M ops/sec ÷ ~300K ops per node
Kafka throughput                   ~1.2 GB/sec     1.16M msg/sec × ~1 KB avg (≈3.5 GB/sec with 3× replication)
Key numbers to say aloud: "1,000 Chat Servers for 100M concurrent WebSocket connections." · "Cassandra (conversation_id, message_id DESC) — single partition read for chat history." · "Presence writes at 3.3M/sec require a Redis Cluster, not a single node." · "Media never touches Chat Servers — S3 pre-signed URL + CDN."
01
WebSocket Connection Management
~1.5 hrs
  1. Alice opens WhatsApp. How does the app choose which Chat Server to connect to? (hint: load balancer with sticky sessions? consistent hashing?)
  2. Chat Server 47 crashes. 100K users lose their connections. What happens step-by-step? How long until they're reconnected?
  3. Bob's phone loses network for 60 seconds. What is queued where? Walk through the exact delivery sequence when he reconnects.
  4. WhatsApp Web: same account open on phone AND laptop. Design the multi-device connection model. How does a message reach both devices?
02
Delivery Receipt State Machine
~1 hr
  1. Draw the state machine for message status: valid states, valid transitions, triggering events
  2. Group of 500 members: when exactly do ✓✓ (delivered) and ✓✓ blue (read) show? All 500? First? Majority?
  3. Failure: message delivered to Bob, but "delivered" ACK lost in transit. How does the system eventually become consistent?
  4. Alice is offline when Bob's read receipt arrives. Where is it stored? When does Alice see the blue tick?
03
Presence System at 100M Users
~1.5 hrs

Design a presence system with these constraints:

  • Presence heartbeat every 30s from each online user
  • "Last seen" accurate to within 1 minute
  • Privacy: some users hide last seen entirely
  • Must handle 3.3M presence writes/sec
  • When Alice comes online, notify Bob (who has a chat open with Alice)

Design the Redis schema, write path, read path, and subscriber notification mechanism.

04
Full WhatsApp Design — 45-min Simulation
~3 hrs

Apply all 7 framework steps. Time to 45 minutes:

  1. Requirements + estimations: Chat Server count, Cassandra nodes, Redis cluster, Kafka throughput
  2. Full architecture diagram: all components, data flows
  3. Deep dive: WebSocket routing (Session Store lookup, Chat Server statefulness)
  4. Deep dive: group messaging fan-out at 1,024 members
  5. Deep dive: presence at 100M users (heartbeat + Redis TTL + subscriber push)
  6. Failure modes: Chat Server crash, Redis down, Cassandra shard failure
  7. Media pipeline: S3 pre-signed URL → CDN → E2E encryption mention
MODULE B7 · WHATSAPP
WebSockets vs polling: why WS, lifecycle, heartbeat, reconnect
Chat Server statefulness: owns connections, Session Store maps user→server
Full send path: Alice → Server A → Cassandra → Kafka → Router → Server C → Bob
Session Store: SET session:{userId} serverAddr EX 86400 on connect
Cassandra schema: (conversation_id, message_id DESC) — single partition chat history
Offline delivery: inbox sorted set + REST fetch on reconnect with cursor
3-state receipts: ✓ sent, ✓✓ delivered, ✓✓ blue read — state machine
Group receipts: message_receipts table with per-recipient status
Presence: SETEX 45s TTL + heartbeat 30s + subscribe-based notifications
Presence scale: 3.3M writes/sec → Redis Cluster (10+ nodes)
Group messaging: store once, route many, inbox for offline members
Media: S3 pre-signed URL (client uploads directly) + CDN delivery
Scale numbers: 1K Chat Servers, 30 TB/day, 3.3M presence writes/sec
✏️ Tasks 1–3: WS management, receipt state machine, presence design
✏️ Task 4 (capstone): full WhatsApp — 45-min interview simulation
// NEXT MODULE
B8 — Design YouTube
Video upload pipeline · Transcoding (HLS adaptive bitrate)
CDN video delivery · View counter · Recommendation engine overview
Search indexing · Comment system · Storage at petabyte scale