SYSTEM DESIGN MASTERY · TRACK B · MODULE B10 · WEEK 20 CONSISTENT HASHING · VIRTUAL NODES · SERVICE DISCOVERY
Distributed Systems Fundamentals · Hash Ring · Registry

HASH RING · VIRTUAL NODES · CONSUL · ZOOKEEPER
CLIENT-SIDE vs SERVER-SIDE · GOSSIP · HEALTH CHECKS
At a glance: ~1/N keys remapped on node change · ring size 2³² · 150 vnodes/node
Contents: The Modulo Hashing Problem · Hash Ring · Virtual Nodes · Consul · ZooKeeper · Gossip Protocol · Health Checks
The Modulo Hashing Problem
Why hash(key) % N breaks when N changes
Modulo Hashing — BROKEN
3 nodes: shard = hash(key) % 3
"user:123" → hash=456 → 456 % 3 = 0 → Node 0
"user:456" → hash=788 → 788 % 3 = 2 → Node 2
"user:789" → hash=111 → 111 % 3 = 0 → Node 0

Add 4th node (N=3 → N=4):
788 % 4 = 0 → WRONG (was Node 2)
111 % 4 = 3 → WRONG (was Node 0)
456 % 4 = 0 → unchanged (lucky)

~75% of all keys remapped instantly.
75% cache miss rate → thundering herd.
Consistent Hashing — SOLVED
Ring: positions [0, 2³²)
Node A → pos 12 · Node B → pos 45 · Node C → pos 78

"user:123" → pos=20 → clockwise → Node B (45)
"user:456" → pos=60 → clockwise → Node C (78)

Add Node D at position 55:
Only keys in range (45, 55] remapped → ~1/(N+1) = 25%.

25% vs 75% — 3× fewer cache misses
The key number: Adding 1 node to an N-node cluster remaps ~1/(N+1) of all keys. For a 12-node cluster adding a 13th: only 7.7% of keys move. With modulo: 92.3% would move.
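These percentages are easy to check empirically. A minimal sketch (random ints stand in for real key hashes; node counts match the 12→13 example above):

Modulo remap check — JAVA
import java.util.Random;

public class ModuloRemapCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int keys = 100_000, moved = 0;
        for (int i = 0; i < keys; i++) {
            int h = rnd.nextInt(Integer.MAX_VALUE);  // stand-in for hash(key)
            if (h % 12 != h % 13) moved++;           // shard changed on resize
        }
        System.out.printf("remapped: %.1f%%%n", 100.0 * moved / keys);
        // Prints ≈ 92.3% — only ~1/13 of keys keep their shard under modulo
    }
}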
The Hash Ring
Nodes and keys share the same circular address space [0, 2³²)
[Ring diagram: circle spanning [0, 2³²) · Node A pos 12 · Node B pos 45 · Node C pos 78 · k:20→B · k:60→C · k:90 wraps→A · candidate Node D at pos 55]
// RING LEGEND
Physical nodes (A, B, C) placed by hash of node name
k:20 → clockwise → node at 45 = Node B
k:60 → clockwise → node at 78 = Node C
k:90 → clockwise → wraps to 12 = Node A
New Node D at pos 55: only keys in (45, 55] move.
Node A owns: (78, 12] (wraps around 0)
Node B owns: (12, 45]
Node C owns: (45, 78]
Virtual Nodes
K positions per physical node → uniform load distribution
1 Vnode/Node
High variance — random positions lead to unequal arc sizes
10 Vnodes/Node
Better — converging toward equal, still some variance
150 Vnodes/Node ★
Near-uniform — a common production choice. Std deviation <5%
Virtual node benefits beyond load balancing — KEY INSIGHT
// BENEFIT 1: Even load (K=150 vnodes → each node ≈ 1/N share)
// BENEFIT 2: Adding node redistributes from ALL existing nodes evenly
// BENEFIT 3: Heterogeneous capacity
PowerfulNode (32 cores): 300 vnodes  → handles 2× data
StandardNode (16 cores): 150 vnodes  → handles 1× data
WeakNode     ( 8 cores):  75 vnodes  → handles 0.5× data
// Cassandra: 256 vnodes default. Redis Cluster: 16,384 hash slots.
Java Implementation
TreeMap provides O(log N) clockwise walk via ceilingEntry()
ConsistentHashRing — JAVA
import java.util.Map;
import java.util.TreeMap;
import com.google.common.hash.Hashing;                  // Guava
import static java.nio.charset.StandardCharsets.UTF_8;

public class ConsistentHashRing {
    // Sorted map: ring position → physical node name
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int VNODES = 150;

    private int hash(String key) {
        return Hashing.murmur3_32().hashString(key, UTF_8).asInt();
    }

    // Place VNODES virtual positions per physical node
    public void addNode(String node) {
        for (int v = 0; v < VNODES; v++) ring.put(hash(node + "#" + v), node);
    }

    public void removeNode(String node) {
        for (int v = 0; v < VNODES; v++) ring.remove(hash(node + "#" + v));
    }

    // Clockwise walk: first vnode at or after the key's position,
    // wrapping to the ring's first entry past the top
    public String getNode(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("No nodes");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
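For the heterogeneous-capacity case from the virtual-nodes section, a weighted overload of addNode is a natural extension. This is a hypothetical addition, not part of the class above:

Weighted vnodes — JAVA
// Hypothetical: weight 2.0 → 300 vnodes → ~2× the key share
public void addNode(String node, double weight) {
    int vnodes = (int) Math.round(VNODES * weight);
    for (int v = 0; v < vnodes; v++) ring.put(hash(node + "#" + v), node);
}
// removeNode needs the same weight (or a stored per-node vnode count)
// so it can delete every entry it added.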
Real-World Use Cases
Where consistent hashing solves the node change problem
SYSTEM           | HOW IT USES CONSISTENT HASHING                                          | SPECIFICS
Redis Cluster    | 16,384 hash slots across nodes. CRC16(key) % 16,384 → slot → node.     | Hash tags: {user_id} forces related keys to same slot.
Apache Cassandra | 256 vnodes/node. Token range owns ring slice. Reads/writes go to token owner + replicas. | RF=3 → 3 consecutive ring nodes own each key.
Amazon DynamoDB  | Consistent hashing for partition routing. Auto-scaling splits ranges.  | Minimises rebalancing on partition add.
CDN (Akamai)     | Consistent hashing within a PoP across cache servers.                  | Same URL → same cache server → higher hit rate.
Sticky LB        | Same user → same backend server → in-memory session cache works.       | Server failure: only that server's users remapped.
Service Discovery Patterns
How Service A finds Service B in a dynamic microservices environment
Client-Side Discovery
CLIENT QUERIES REGISTRY DIRECTLY
Client queries registry → gets instance list → load balances (round-robin/random) → connects directly.
✓ No extra network hop
✓ Client controls LB strategy
✗ LB logic in every client
✗ Multiple language SDKs needed
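A minimal client-side discovery sketch. The RegistryClient interface here is hypothetical — in practice it would wrap Consul, etcd, or Eureka — and load balancing is plain round-robin:

Client-side discovery — JAVA
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry lookup; returns e.g. ["10.0.1.23:8080", ...]
interface RegistryClient {
    List<String> healthyInstances(String serviceName);
}

class ClientSideDiscovery {
    private final RegistryClient registry;
    private final AtomicInteger counter = new AtomicInteger();

    ClientSideDiscovery(RegistryClient registry) { this.registry = registry; }

    // Round-robin across whatever instances are healthy right now
    String pickInstance(String service) {
        List<String> instances = registry.healthyInstances(service);
        if (instances.isEmpty()) throw new IllegalStateException("no healthy instances");
        return instances.get(Math.floorMod(counter.getAndIncrement(), instances.size()));
    }
}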
Server-Side Discovery ★
GATEWAY HANDLES ROUTING
Client calls gateway → gateway queries registry → gateway routes to healthy instance. Clients need no discovery logic.
✓ Simple clients
✓ Central LB policy
✓ Works across languages
✗ Extra hop through gateway
✗ Gateway is critical path
DNS-Based Discovery
SERVICE = DNS A RECORD
Registry publishes IPs as DNS A records. Kubernetes CoreDNS handles this automatically.
✓ Standard protocol
✓ Kubernetes native
✗ DNS TTL caching = stale IPs
✗ No health awareness in DNS
Kubernetes DNS discovery — KUBERNETES
# Format: {service}.{namespace}.svc.cluster.local
curl http://payment-service.default.svc.cluster.local:8080/charge
# CoreDNS → ClusterIP → kube-proxy → healthy pod
# Pod fails → Endpoints controller removes it → no traffic routed
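From a JVM client, DNS-based discovery is plain name resolution. A sketch (the service name matches the curl example above); note the JVM's own DNS cache adds to the stale-IP problem unless networkaddress.cache.ttl is lowered:

DNS lookup — JAVA
import java.net.InetAddress;

public class DnsDiscovery {
    public static void main(String[] args) throws Exception {
        // CoreDNS answers with the ClusterIP (or pod IPs for a headless Service)
        InetAddress[] addrs = InetAddress.getAllByName(
                "payment-service.default.svc.cluster.local");
        for (InetAddress a : addrs) System.out.println(a.getHostAddress());
    }
}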
Consul & ZooKeeper
Production service registries — choose by consistency needs
Consul — register + discover healthy instances — CONSUL HTTP API
PUT /v1/agent/service/register
{ "Name": "payment-service", "ID": "payment-1",
  "Address": "10.0.1.23", "Port": 8080,
  "Check": { "HTTP": "http://10.0.1.23:8080/health", "Interval": "10s" } }

GET /v1/health/service/payment-service?passing=true
// → returns healthy instances only

GET /v1/health/service/payment-service?passing=true&index=50&wait=30s
// → long-poll: blocks until change or timeout → client auto-refreshes
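Calling the health query above from Java with the JDK's built-in HTTP client (Java 11+). A sketch assuming a Consul agent on localhost:8500:

Consul lookup from Java — JAVA
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsulLookup {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Same query as above: only instances whose checks are passing
        HttpRequest req = HttpRequest.newBuilder(URI.create(
            "http://localhost:8500/v1/health/service/payment-service?passing=true"))
            .GET().build();
        HttpResponse<String> resp =
            client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());  // JSON array of healthy instances
    }
}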
ZooKeeper — ephemeral znodes — ZOOKEEPER
create /services/payment/instance-1  "10.0.1.23:8080"  [EPHEMERAL]
// Ephemeral = auto-deleted on session expire (service crash)
getChildren /services/payment [WATCH]
// → client notified when instance created/deleted
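The same registration in Java with the official ZooKeeper client. A sketch: the connection string is illustrative, and the parent path /services/payment must already exist:

Ephemeral registration — JAVA
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkRegister {
    public static void main(String[] args) throws Exception {
        // 10s session timeout; connection events ignored for brevity
        ZooKeeper zk = new ZooKeeper("zk1:2181", 10_000, event -> {});

        // EPHEMERAL: deleted automatically when this session expires (crash)
        zk.create("/services/payment/instance-1",
                  "10.0.1.23:8080".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.EPHEMERAL);

        // Watch fires once on the next membership change;
        // clients re-list and re-register the watch
        zk.getChildren("/services/payment", true)
          .forEach(System.out::println);
    }
}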
FEATURE       | CONSUL                 | ZOOKEEPER              | ETCD (k8s)
Consistency   | Raft (strong)          | ZAB (strong)           | Raft (strong)
Use case      | Service mesh, multi-DC | Locks, leader election | Kubernetes control plane
Health checks | Built-in (HTTP/TCP)    | Session timeout only   | Leases (TTL keys)
Health Checks
Registry must know which instances are healthy
/health endpoint standard — HTTP
// GET /health → 200 if healthy, 503 if not
{ "status": "healthy", "checks": {
    "database": "connected",
    "redis":    "connected",
    "disk":     "98% free"
  }, "version": "2.3.1" }

// Registry sees 503 → removes from routing pool
// Typical policy: a few consecutive failures (e.g. 2) → removed from routing pool
// Check interval 10s → failure detected within 20–30s
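A minimal /health endpoint matching this contract, using the JDK's built-in HTTP server; the dependency checks are stubbed:

/health endpoint sketch — JAVA
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class HealthEndpoint {
    // Stubs — real checks would ping the DB and Redis connection pools
    static boolean databaseUp() { return true; }
    static boolean redisUp()    { return true; }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health", exchange -> {
            boolean healthy = databaseUp() && redisUp();
            byte[] body = (healthy
                    ? "{\"status\":\"healthy\"}"
                    : "{\"status\":\"unhealthy\"}").getBytes(StandardCharsets.UTF_8);
            // 200 keeps this instance in the routing pool; 503 pulls it
            exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}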
Liveness vs Readiness (Kubernetes): Liveness = process alive? (restart if not). Readiness = ready for traffic? (remove from LB if not). Only readiness affects service discovery routing.
Gossip Protocol
Eventual consistency for cluster membership — O(log N) propagation
// GOSSIP PROPAGATION — 16 nodes, K=3 neighbors per round
Round 1: N1★ N5★ N11★ → 3/16 nodes know
Round 2: + N2★ N7★ N9★ N14★ … → 9/16 nodes know
Round 3: ALL 16★ → 16/16 know · O(log₃ 16) ≈ 2.5 rounds
Gossip vs Raft — COMPARISON
// GOSSIP: eventually consistent, O(log N), no leader, scales to 1000s of nodes
// Use for: cluster membership, failure detection

// RAFT: strongly consistent, leader-based, quorum required, ~5-7 nodes practical
// Use for: config store, leader election, distributed locks

// Consul uses BOTH:
//   Gossip (SWIM) for membership + failure detection
//   Raft for KV store + service catalog consistency
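A toy run of the propagation diagram above — push-only gossip with fanout K=3. Counts vary per run since peers are chosen at random:

Gossip propagation sim — JAVA
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class GossipSim {
    public static void main(String[] args) {
        int nodes = 16, fanout = 3;
        Random rnd = new Random();
        Set<Integer> infected = new HashSet<>();
        infected.add(0);  // one node learns the update first

        for (int round = 1; infected.size() < nodes; round++) {
            Set<Integer> next = new HashSet<>(infected);
            for (int node : infected)
                for (int k = 0; k < fanout; k++)
                    next.add(rnd.nextInt(nodes));  // push to K random peers
            infected = next;
            System.out.printf("Round %d: %d/%d nodes know%n",
                              round, infected.size(), nodes);
        }
        // Typically converges in ~log_K(N) rounds — the O(log N) claim above
    }
}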
01
Consistent Hash Ring Implementation
~2 hrs
  1. Implement ConsistentHashRing with addNode, removeNode, getNode using TreeMap
  2. Test with 3 nodes and 1,000 random keys. Verify ~33% per node.
  3. Add a 4th node. Verify ~25% of keys remapped (not 75%).
  4. Remove a node. Verify only that node's keys remapped.
  5. What if two nodes hash to the same ring position? Handle collisions.
02
Virtual Node Distribution Analysis
~1.5 hrs
  1. Simulate 3 nodes with 1, 10, 50, 150, 300 vnodes each.
  2. Hash 10,000 random keys in each configuration.
  3. Calculate std deviation across nodes for each vnode count.
  4. At what vnode count does std dev fall below 5%?
  5. At K=150, how does adding a 4th node compare to modulo hashing?
03
Service Discovery System Design
~1.5 hrs

Design service discovery for 50 microservices, each with 3–10 instances:

  1. Client-side vs server-side discovery — argue both, pick one, justify.
  2. Pod starts at 9:00:00. What happens step-by-step until first request?
  3. Pod dies without SIGTERM at 9:05:00. When do errors stop? What is the gap?
  4. Network partition: 2 of 5 Consul nodes can't reach the other 3. Who serves?
  5. What does /health check? What thresholds trigger unhealthy?
04
Add Consistent Hashing to URL Shortener
~2 hrs

URL shortener from B5 + B9 now has 5 Redis cache nodes:

  1. How many vnodes per Redis node? Justify.
  2. Redis node 3 fails. Miss rate impact vs modulo hashing?
  3. Add Redis node 6. What % of cached URLs need to be reloaded from DB?
  4. What would modulo hashing do when node 3 fails?
  5. Design zero-downtime transition from modulo to consistent hashing.
MODULE B10 · CONSISTENT HASHING — RECAP
Modulo flaw: adding 1 node remaps ~N/(N+1) ≈ all keys
Hash ring: both nodes and keys mapped to [0, 2³²)
Clockwise walk: ceilingEntry() or firstEntry() to wrap
Adding node: only 1/(N+1) keys remapped
Virtual nodes: K positions per physical node → uniform distribution
Vnodes: 150 standard (Cassandra: 256, Redis: 16,384 slots)
Heterogeneous nodes: weighted vnodes (2× hardware → 2× vnodes)
TreeMap: addNode, removeNode, getNode — O(log N)
Use cases: Redis Cluster, Cassandra, CDN, sticky LB
Service discovery registry pattern: register, query, health check
Client-side vs server-side discovery trade-offs
DNS-based: Kubernetes CoreDNS, service.namespace.svc.cluster.local
Consul: HTTP register + health check + passing=true + watch
ZooKeeper: ephemeral znodes auto-deleted on session expire
Health check: 200 healthy / 503 unhealthy; 2-failure threshold
Liveness vs readiness — different purposes, different consequences
Gossip: O(log N) propagation, eventual consistency, no leader
✏️ Tasks 1–3: ring impl, vnode analysis, service discovery
✏️ Task 4 (capstone): consistent hashing added to URL shortener
// NEXT MODULE
B11 — ACID, Distributed Transactions & Saga
ACID properties · 2-Phase Commit · Distributed transactions
Saga pattern (choreography vs orchestration) · Compensating transactions
Outbox pattern · Idempotency · BASE vs ACID trade-offs