The Modulo Hashing Problem
Why hash(key) % N breaks when N changes
Modulo Hashing — BROKEN
3 nodes: shard = hash(key) % 3
"user:123" → hash=456 → 456%3 = Node 0
"user:456" → hash=789 → 789%3 = Node 2
Add 4th node (N=3 → N=4):
789 % 4 = 1 → WRONG (was Node 2)
111 % 4 = 3 → WRONG (was Node 0)
~75% of all keys remapped instantly.
75% cache miss rate → thundering herd.
"user:123" → hash=456 → 456%3 = Node 0
"user:456" → hash=789 → 789%3 = Node 2
Add 4th node (N=3 → N=4):
789 % 4 = 1 → WRONG (was Node 2)
111 % 4 = 3 → WRONG (was Node 0)
~75% of all keys remapped instantly.
75% cache miss rate → thundering herd.
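To see the damage concretely, here is a minimal sketch that counts how many of 10,000 keys change shards when N grows from 3 to 4 under modulo hashing. The key format and the use of Java's built-in hashCode() are illustrative stand-ins for a real hash function.

import java.util.UUID;

// Sketch: measure how many keys change shards when N grows from 3 to 4.
// Expect roughly 75% of keys to move (in general ~N/(N+1)).
public class ModuloRemapDemo {
    public static void main(String[] args) {
        int moved = 0, total = 10_000;
        for (int i = 0; i < total; i++) {
            String key = "user:" + UUID.randomUUID();
            int h = key.hashCode() & 0x7fffffff;   // non-negative hash
            if (h % 3 != h % 4) moved++;           // shard before vs after the 4th node
        }
        System.out.printf("%d / %d keys remapped (%.1f%%)%n",
                moved, total, 100.0 * moved / total);
    }
}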
Consistent Hashing — SOLVED
Ring: positions [0, 2³²)
Node A → pos 12 · Node B → pos 45 · Node C → pos 78
"user:123" → pos=20 → clockwise → Node B (45)
"user:456" → pos=60 → clockwise → Node C (78)
Add Node D at position 55:
Only keys in range (45, 55] remapped (Node C → Node D) → ~1/(N+1) ≈ 25%.
25% vs 75% — 3× fewer cache misses
The key number: Adding 1 node to an N-node cluster remaps ~1/(N+1) of all keys. For a 12-node cluster adding a 13th: only 7.7% of keys move. With modulo: 92.3% would move.
The Hash Ring
Nodes and keys share the same circular address space [0, 2³²)
// RING LEGEND
Physical nodes (A, B, C) placed by hash of node name
k:20 → clockwise → node at 45 = Node B
k:60 → clockwise → node at 78 = Node C
k:90 → clockwise → wraps to 12 = Node A
New Node D at pos 55: only keys in (45, 55] move.
Node A owns: (78, 12] (wraps around 0)
Node B owns: (12, 45]
Node C owns: (45, 78]
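The clockwise walk maps directly onto a sorted map. The sketch below reproduces the legend above with a TreeMap keyed by ring position; the positions 12/45/78 and key positions 20/60/90 are the illustrative values from the figure, not real hash outputs.

import java.util.Map;
import java.util.TreeMap;

// Sketch: the clockwise walk from the ring legend, using TreeMap.ceilingEntry.
public class RingWalkDemo {
    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(12L, "Node A");
        ring.put(45L, "Node B");
        ring.put(78L, "Node C");

        for (long keyPos : new long[]{20, 60, 90}) {
            Map.Entry<Long, String> owner = ring.ceilingEntry(keyPos);
            if (owner == null) owner = ring.firstEntry();   // wrap past position 78 back to 12
            System.out.println("k:" + keyPos + " → " + owner.getValue());
        }
        // Output: k:20 → Node B, k:60 → Node C, k:90 → Node A (wrapped)
    }
}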
Virtual Nodes
K positions per physical node → uniform load distribution
1 Vnode/Node
High variance — random positions lead to unequal arc sizes
10 Vnodes/Node
Better — converging toward equal, still some variance
150 Vnodes/Node ★
Near-uniform. Common client-library default. Std deviation <5%
Virtual node benefits beyond load balancing · KEY INSIGHT
// BENEFIT 1: Even load (K=150 vnodes → each node ≈ 1/N share)
// BENEFIT 2: Adding node redistributes from ALL existing nodes evenly
// BENEFIT 3: Heterogeneous capacity
PowerfulNode (32 cores): 300 vnodes → handles 2× data
StandardNode (16 cores): 150 vnodes → handles 1× data
WeakNode     ( 8 cores):  75 vnodes → handles 0.5× data
// Cassandra: 256 vnodes default. Redis Cluster: 16,384 hash slots.
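How quickly variance shrinks with K is easy to check empirically. The sketch below hashes 10,000 random keys against 3 nodes at several vnode counts and prints each node's share; it uses String.hashCode() for brevity rather than MurmurHash, so exact numbers will differ from a production ring.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Sketch: per-node key share for different vnode counts (3 physical nodes).
public class VnodeVarianceDemo {
    public static void main(String[] args) {
        int[] vnodeCounts = {1, 10, 150};
        String[] nodes = {"A", "B", "C"};
        for (int k : vnodeCounts) {
            TreeMap<Integer, String> ring = new TreeMap<>();
            for (String n : nodes)
                for (int v = 0; v < k; v++)
                    ring.put((n + "#" + v).hashCode(), n);   // one ring position per vnode

            Map<String, Integer> load = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                int h = ("key-" + UUID.randomUUID()).hashCode();
                Map.Entry<Integer, String> e = ring.ceilingEntry(h);
                String owner = (e != null ? e : ring.firstEntry()).getValue();
                load.merge(owner, 1, Integer::sum);
            }
            System.out.println(k + " vnodes/node → " + load);  // shares converge toward ~3,333 each
        }
    }
}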
Java Implementation
TreeMap provides O(log N) clockwise walk via ceilingEntry()
ConsistentHashRing · JAVA
import com.google.common.hash.Hashing;
import java.util.Map;
import java.util.TreeMap;
import static java.nio.charset.StandardCharsets.UTF_8;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int VNODES = 150;

    private int hash(String key) {
        return Hashing.murmur3_32().hashString(key, UTF_8).asInt();
    }

    public void addNode(String node) {
        for (int v = 0; v < VNODES; v++)
            ring.put(hash(node + "#" + v), node);            // one ring position per vnode
    }

    public void removeNode(String node) {
        for (int v = 0; v < VNODES; v++)
            ring.remove(hash(node + "#" + v));
    }

    public String getNode(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("No nodes");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));  // clockwise walk
        return (e != null ? e : ring.firstEntry()).getValue();        // wrap around the ring
    }
}
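A quick way to sanity-check the ~1/(N+1) claim is to record each key's owner, add a node, and re-check. This usage sketch assumes the ConsistentHashRing class above (with Guava on the classpath); the node names and key format are arbitrary.

import java.util.HashMap;
import java.util.Map;

// Sketch: verify that adding a 4th node remaps only ~1/(N+1) ≈ 25% of keys.
public class RemapCheck {
    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("node-A"); ring.addNode("node-B"); ring.addNode("node-C");

        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 10_000; i++) before.put("user:" + i, ring.getNode("user:" + i));

        ring.addNode("node-D");
        long moved = before.entrySet().stream()
                .filter(e -> !e.getValue().equals(ring.getNode(e.getKey())))
                .count();
        System.out.println(moved + " / 10000 keys remapped");  // expect ≈ 2,500, not ~7,500
    }
}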
Real-World Use Cases
Where consistent hashing solves the node change problem
| SYSTEM | HOW IT USES CONSISTENT HASHING | SPECIFICS |
|---|---|---|
| Redis Cluster | 16,384 hash slots across nodes. CRC16(key) % 16,384 → slot → node. | Hash tags: {user_id} forces related keys to same slot. |
| Apache Cassandra | 256 vnodes/node. Token range owns ring slice. Reads/writes to token owner + replicas. | RF=3 → 3 consecutive ring nodes own each key. |
| Amazon DynamoDB | Consistent hashing for partition routing. Auto-scaling splits ranges. | Consistent hashing minimises rebalancing on partition add. |
| CDN (Akamai) | Consistent hashing within PoP across cache servers. | Same URL → same cache server → higher hit rate. |
| Sticky LB | Same user → same backend server → in-memory session cache works. | Server failure: only that server's users remapped. |
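The Redis Cluster row is easy to make concrete. Below is a rough sketch of the slot rule described in the cluster spec: CRC-16/XMODEM over the key (or over the hash-tag content between the first { and the following }), modulo 16,384. It is illustrative only and not a Redis client library's API.

import java.nio.charset.StandardCharsets;

// Sketch of Redis Cluster slot assignment: HASH_SLOT = CRC16(key) mod 16384,
// with {hash tag} support so related keys land in the same slot.
public class RedisSlot {
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) key = key.substring(open + 1, close);  // hash only the tag
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    // CRC-16/XMODEM (poly 0x1021, init 0), the variant the cluster spec describes.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++)
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) & 0xFFFF : (crc << 1) & 0xFFFF;
        }
        return crc;
    }

    public static void main(String[] args) {
        System.out.println(slot("order:{user42}:latest"));   // same slot as any {user42} key
        System.out.println(slot("order:{user42}:history"));
    }
}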
Service Discovery Patterns
How Service A finds Service B in a dynamic microservices environment
Client-Side Discovery
CLIENT QUERIES REGISTRY DIRECTLY
Client queries registry → gets instance list → load balances (round-robin/random) → connects directly.
✓ No extra network hop
✓ Client controls LB strategy
✗ LB logic in every client
✗ Multiple language SDKs needed
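The client-side flow above fits in a few lines of code. This is a minimal sketch with a hypothetical Registry interface and ClientSideDiscovery class (in practice the instance list would come from Consul, Eureka, or etcd, and would be cached and refreshed).

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of client-side discovery: the client pulls the instance list and
// load-balances locally. Registry is a hypothetical interface, not a real SDK.
interface Registry {
    List<String> healthyInstances(String serviceName);   // e.g. ["10.0.1.23:8080", ...]
}

class ClientSideDiscovery {
    private final Registry registry;

    ClientSideDiscovery(Registry registry) { this.registry = registry; }

    String pickInstance(String serviceName) {
        List<String> instances = registry.healthyInstances(serviceName);
        if (instances.isEmpty())
            throw new IllegalStateException("No healthy instances for " + serviceName);
        // Random selection; round-robin or least-connections are equally valid choices here.
        return instances.get(ThreadLocalRandom.current().nextInt(instances.size()));
    }
}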
Server-Side Discovery ★
GATEWAY HANDLES ROUTING
Client calls gateway → gateway queries registry → gateway routes to healthy instance. Clients need no discovery logic.
✓ Simple clients
✓ Central LB policy
✓ Works across languages
✗ Extra hop through gateway
✗ Gateway is critical path
DNS-Based Discovery
SERVICE = DNS A RECORD
Registry publishes IPs as DNS A records. Kubernetes CoreDNS handles this automatically.
✓ Standard protocol
✓ Kubernetes native
✗ DNS TTL caching = stale IPs
✗ No health awareness in DNS
Kubernetes DNS discovery · KUBERNETES
# Format: {service}.{namespace}.svc.cluster.local
curl http://payment-service.default.svc.cluster.local:8080/charge
# CoreDNS → ClusterIP → kube-proxy → healthy pod
# Pod fails → Endpoints controller removes it → no traffic routed
Consul & ZooKeeper
Production service registries — choose by consistency needs
Consul — register + discover healthy instances · CONSUL HTTP API
PUT /v1/agent/service/register
{ "Name": "payment-service", "ID": "payment-1",
"Address": "10.0.1.23", "Port": 8080,
"Check": { "HTTP": "http://10.0.1.23:8080/health", "Interval": "10s" } }
GET /v1/health/service/payment-service?passing=true
// → returns healthy instances only
GET /v1/health/service/payment-service?passing=true&index=50&wait=30s
// → long-poll: blocks until change or timeout → client auto-refreshes
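Clients typically wrap that long-poll in a loop. A rough sketch using the JDK HttpClient against the query shown above: the agent address (localhost:8500), service name, and wait value are illustrative; the X-Consul-Index response header is fed back as ?index= so each call blocks until something changes.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: Consul blocking-query loop for payment-service health.
public class ConsulWatch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String index = "0";
        while (true) {
            HttpRequest req = HttpRequest.newBuilder(URI.create(
                    "http://localhost:8500/v1/health/service/payment-service"
                            + "?passing=true&index=" + index + "&wait=30s")).GET().build();
            HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
            index = resp.headers().firstValue("X-Consul-Index").orElse(index);
            System.out.println("Healthy instance list changed (or wait timed out): " + resp.body());
            // A real client would parse the JSON body and refresh its local instance cache here.
        }
    }
}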
ZooKeeper — ephemeral znodes · ZOOKEEPER
create /services/payment/instance-1 "10.0.1.23:8080" [EPHEMERAL]
// Ephemeral = auto-deleted on session expire (service crash)
getChildren /services/payment [WATCH]
// → client notified when instance created/deleted
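The same two calls through the official ZooKeeper Java client look roughly like this; the connection string, paths, and timeout are illustrative, error handling is omitted, and the persistent parent path /services/payment is assumed to exist.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: register an instance as an ephemeral znode and watch the service path.
public class ZkRegistration {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15_000,
                event -> System.out.println("session event: " + event));

        // Ephemeral znode: removed automatically when this session expires (process crash).
        zk.create("/services/payment/instance-1", "10.0.1.23:8080".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Watch the parent: the callback fires when instances appear or disappear.
        zk.getChildren("/services/payment",
                event -> System.out.println("membership changed: " + event));
    }
}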
| FEATURE | CONSUL | ZOOKEEPER | ETCD (k8s) |
|---|---|---|---|
| Consistency | Raft (strong) | ZAB (strong) | Raft (strong) |
| Use case | Service mesh, multi-DC | Locks, leader election | Kubernetes control plane |
| Health checks | Built-in (HTTP/TCP) | Session timeout only | Leases (TTL keys) |
Health Checks
Registry must know which instances are healthy
/health endpoint standard · HTTP
// GET /health → 200 if healthy, 503 if not
{
  "status": "healthy",
  "checks": { "database": "connected", "redis": "connected", "disk": "98% free" },
  "version": "2.3.1"
}
// Registry sees 503 → removes from routing pool
// Consul default: 2 consecutive failures → deregister
// Check interval 10s → failure detected within 20–30s
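A service can expose that contract with nothing more than the JDK's built-in HTTP server. A minimal sketch follows; the dependency checks are stubbed and the response body is trimmed to the status field.

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch: /health returns 200 when dependencies look fine, 503 otherwise.
public class HealthEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health", exchange -> {
            boolean healthy = databaseReachable() && cacheReachable();   // stubbed checks
            String body = "{\"status\":\"" + (healthy ? "healthy" : "unhealthy") + "\"}";
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(healthy ? 200 : 503, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
    }

    static boolean databaseReachable() { return true; }   // replace with a real ping
    static boolean cacheReachable() { return true; }
}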
Liveness vs Readiness (Kubernetes): Liveness = process alive? (restart if not). Readiness = ready for traffic? (remove from LB if not). Only readiness affects service discovery routing.
Gossip Protocol
Eventual consistency for cluster membership — O(log N) propagation
// GOSSIP PROPAGATION — 16 nodes, K=3 neighbors per round
Round 1: N1★ N5★ N11★ gossip to peers (N2, N3, …) → 3/16 nodes know
Round 2: N1★ N5★ N11★ N2★ N7★ N14★ N9★ … → 9/16 nodes know
Round 3: ALL 16★ → 16/16 know · O(log₃ 16) ≈ 2.5 rounds
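A toy simulation makes the O(log N) growth concrete. The sketch below spreads a rumor through 16 nodes with fanout K=3 per round; gossip targets are chosen uniformly at random, so exact round counts vary slightly between runs.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Sketch: rumor spreading with fanout K=3; expect roughly log_3(N) rounds for N=16.
public class GossipSim {
    public static void main(String[] args) {
        int n = 16, fanout = 3;
        Random rnd = new Random();
        Set<Integer> informed = new HashSet<>(List.of(0));   // node 0 starts the rumor

        int rounds = 0;
        while (informed.size() < n) {
            rounds++;
            List<Integer> senders = new ArrayList<>(informed);
            for (int sender : senders)
                for (int i = 0; i < fanout; i++)
                    informed.add(rnd.nextInt(n));             // gossip to a random peer
            System.out.println("Round " + rounds + ": " + informed.size() + "/" + n + " know");
        }
    }
}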
Gossip vs Raft · COMPARISON
// GOSSIP: eventually consistent, O(log N), no leader, scales to 1000s of nodes
//   Use for: cluster membership, failure detection
// RAFT: strongly consistent, leader-based, quorum required, ~5-7 nodes practical
//   Use for: config store, leader election, distributed locks
// Consul uses BOTH:
//   Gossip (SWIM) for membership + failure detection
//   Raft for KV store + service catalog consistency
01
Consistent Hash Ring Implementation
- Implement ConsistentHashRing with addNode, removeNode, getNode using TreeMap
- Test with 3 nodes and 1,000 random keys. Verify ~33% per node.
- Add a 4th node. Verify ~25% of keys remapped (not 75%).
- Remove a node. Verify only that node's keys remapped.
- What if two nodes hash to the same ring position? Handle collisions.
02
Virtual Node Distribution Analysis
- Simulate 3 nodes with 1, 10, 50, 150, 300 vnodes each.
- Hash 10,000 random keys in each configuration.
- Calculate std deviation across nodes for each vnode count.
- At what vnode count does std dev fall below 5%?
- At K=150, how does adding a 4th node compare to modulo hashing?
03
Service Discovery System Design
Design service discovery for 50 microservices, each with 3–10 instances:
- Client-side vs server-side discovery — argue both, pick one, justify.
- Pod starts at 9:00:00. What happens step-by-step until first request?
- Pod dies without SIGTERM at 9:05:00. When do errors stop? What is the gap?
- Network partition: 2 of 5 Consul nodes can't reach the other 3. Who serves?
- What does /health check? What thresholds trigger unhealthy?
★
Add Consistent Hashing to URL Shortener
URL shortener from B5 + B9 now has 5 Redis cache nodes:
- How many vnodes per Redis node? Justify.
- Redis node 3 fails. Miss rate impact vs modulo hashing?
- Add Redis node 6. What % of cached URLs need to be reloaded from DB?
- What would modulo hashing do when node 3 fails?
- Design zero-downtime transition from modulo to consistent hashing.
MODULE B10 · CONSISTENT HASHING
Modulo flaw: adding 1 node remaps ~N/(N+1) ≈ all keys
Hash ring: both nodes and keys mapped to [0, 2³²)
Clockwise walk: ceilingEntry() or firstEntry() to wrap
Adding node: only 1/(N+1) keys remapped
Virtual nodes: K positions per physical node → uniform distribution
Vnodes: 150 standard (Cassandra: 256, Redis: 16,384 slots)
Heterogeneous nodes: weighted vnodes (2× hardware → 2× vnodes)
TreeMap: addNode, removeNode, getNode — O(log N)
Use cases: Redis Cluster, Cassandra, CDN, sticky LB
Service discovery registry pattern: register, query, health check
Client-side vs server-side discovery trade-offs
DNS-based: Kubernetes CoreDNS, service.namespace.svc.cluster.local
Consul: HTTP register + health check + passing=true + watch
ZooKeeper: ephemeral znodes auto-deleted on session expire
Health check: 200 healthy / 503 unhealthy; 2-failure threshold
Liveness vs readiness — different purposes, different consequences
Gossip: O(log N) propagation, eventual consistency, no leader
✏️ Tasks 1–3: ring impl, vnode analysis, service discovery
✏️ Task 4 (capstone): consistent hashing added to URL shortener
// NEXT MODULE
B11 — ACID, Distributed Transactions & Saga
ACID properties · 2-Phase Commit · Distributed transactions
Saga pattern (choreography vs orchestration) · Compensating transactions
Outbox pattern · Idempotency · BASE vs ACID trade-offs