Module B9 — Design a Rate Limiter

System Design Mastery Course | Track B: HLD | Week 19


🎯 Module Overview

Duration: 1 Week | Track: B — HLD Fundamentals Deep Dive
Prerequisites: B1–B8
Goal: Master rate limiting — the most commonly asked infrastructure component in system design interviews. It appears both as a standalone question (“design a rate limiter”) and as part of virtually every API design question. This module covers all five algorithms with their exact memory/accuracy trade-offs, a Redis Lua implementation, and distributed coordination strategies.


1. Why Rate Limit?

Three primary motivations:

1. DDoS / Abuse Prevention
   Attacker sends 100K requests/sec to API
   Per-IP rate limit at CDN edge: attacker gets 429, backend sees zero traffic
   
2. Noisy Neighbour
   One API tenant bug: tight loop, 10K req/sec
   Per-API-key limits: buggy tenant throttled, others unaffected
   
3. Cost Control
   External API costs money per call (OpenAI, Google Maps)
   Bug sends 1M calls/min = $50K cloud bill overnight
   Outbound rate limiter: max 1,000 external calls/minute

2. Algorithm Comparison

Algorithm                  Memory        Accuracy  Burst?             Best Use Case
Fixed Window               O(1)          Low       2× at boundary     Internal services
Sliding Window Log         O(requests)   Exact     No                 Financial APIs
Sliding Window Counter ★   O(1)          ~99.7%    No boundary burst  Production default
Token Bucket               O(1)          Exact     Yes (up to B)      Burst-ok APIs
Leaky Bucket               O(queue)      Exact     No                 Payment processors

3. Fixed Window Counter

Key: "rl:fixed:{userId}:{current_minute}"
Ops: INCR key → if count==1: EXPIRE key 60

Problem: Boundary attack
  00:59 — 100 requests (allowed, window ends)
  01:00 — 100 requests (allowed, new window)
  Total: 200 requests in 2 seconds ← 2× your intended rate

Use when: internal services, rough protection only
Avoid: any external-facing high-value API
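The fixed-window counter and its boundary attack can be replayed in a few lines. This is an in-memory sketch (class and method names are illustrative, not from any library); in Redis the dict becomes INCR + EXPIRE as shown above:

```python
class FixedWindowLimiter:
    """In-memory sketch of a fixed-window counter (illustrative names).
    Redis replaces the dict with INCR key / EXPIRE key 60."""

    def __init__(self, limit, window_secs=60):
        self.limit = limit
        self.window_secs = window_secs
        self.counters = {}  # window_id -> request count

    def allow(self, now):
        window_id = int(now // self.window_secs)  # plays the role of {current_minute}
        count = self.counters.get(window_id, 0) + 1
        self.counters[window_id] = count
        return count <= self.limit

limiter = FixedWindowLimiter(limit=100)
# Boundary attack: 100 requests at t=59s land in window 0,
# 100 more at t=60s land in window 1 — every one is allowed.
late = sum(limiter.allow(59.0) for _ in range(100))
early = sum(limiter.allow(60.0) for _ in range(100))
print(late + early)  # 200 allowed within ~1 second — 2× the intended rate
```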

4. Sliding Window Log

Data structure: Redis Sorted Set (score = timestamp)
Ops:
  ZADD key {now_ms} {now_ms}
  ZREMRANGEBYSCORE key 0 {now - window_ms}  ← remove old entries
  count = ZCARD key
  
Properties:
  Memory: O(requests in window) — 100 req/user × 1M users = 100M entries in Redis
  Accuracy: Exact — no boundary problem
  
When to use: financial APIs, operations where exact count is regulatory requirement
When not: high-volume consumer APIs (memory prohibitive)
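The log can be sketched with a plain Python list standing in for the ZSET (names are illustrative); the three list operations below map one-to-one onto ZREMRANGEBYSCORE, ZCARD, and ZADD:

```python
import bisect

class SlidingWindowLog:
    """In-memory sketch of the sliding-window log (illustrative names).
    In Redis the sorted list is a ZSET keyed by timestamp."""

    def __init__(self, limit, window_ms=60_000):
        self.limit = limit
        self.window_ms = window_ms
        self.log = []  # sorted timestamps (ms), one entry per request

    def allow(self, now_ms):
        cutoff = now_ms - self.window_ms
        self.log = [t for t in self.log if t > cutoff]  # ZREMRANGEBYSCORE 0 cutoff
        if len(self.log) >= self.limit:                 # ZCARD
            return False
        bisect.insort(self.log, now_ms)                 # ZADD
        return True

rl = SlidingWindowLog(limit=3)
results = [rl.allow(t) for t in (0, 1, 2, 3)]
print(results)            # [True, True, True, False] — exact, no boundary problem
print(rl.allow(61_000))   # True — the earlier entries have aged out of the window
```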

5. Sliding Window Counter ★ (Production Default)

Key insight: approximate sliding count from 2 fixed windows

Keys: "rl:{userId}:{prev_minute}", "rl:{userId}:{curr_minute}"

Formula:
  elapsed_fraction = seconds_into_current_minute / 60.0
  effective_count = prev × (1 - elapsed_fraction) + curr
  
  if effective_count >= limit → reject
  else → INCR curr_key, allow

Example (at 30 seconds into minute):
  prev = 80, curr = 30, limit = 100
  effective = 80 × 0.5 + 30 = 70 → allowed

Boundary attack fails:
  00:59: 100 requests → prev = 100
  01:30: 60 requests cumulative → effective = 100×0.5 + 60 = 110 → BLOCKED ✓

Memory: O(2 keys per user) — same as fixed window!
Accuracy: approximation error is tiny in practice — Cloudflare reported only ~0.003% of requests misclassified with this algorithm in production traffic
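The weighted-count formula is small enough to verify directly. This sketch reproduces both worked examples above (the function name is illustrative):

```python
def sliding_window_count(prev, curr, seconds_into_minute):
    """Effective count: previous window weighted by how much of it
    still overlaps the sliding 60s window, plus the current count."""
    fraction = seconds_into_minute / 60.0
    return prev * (1.0 - fraction) + curr

# Worked example from the text, 30s into the minute:
assert sliding_window_count(prev=80, curr=30, seconds_into_minute=30) == 70.0

# Boundary attack: prev=100, 60 requests so far in the current minute
effective = sliding_window_count(prev=100, curr=60, seconds_into_minute=30)
print(effective)  # 110.0 — over the limit of 100, so the burst is blocked
```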

6. Token Bucket

Concept:
  Bucket capacity: B tokens (max burst)
  Refill rate: R tokens per second
  Each request consumes 1 token
  No tokens → reject request

Behaviour:
  Idle period → bucket fills to B tokens
  Burst arrives → can serve up to B requests instantly
  After burst → restores to steady R req/sec rate

Use when:
  Upload API: allow burst of 10 uploads then 1/minute
  Chat API: allow typing burst then throttle
  
Implementation: MUST use Redis Lua script for atomicity
  Without Lua: GET tokens, compute new tokens, SET tokens
  Race condition: two concurrent requests both read 1 token, both decrement to 0
  Both get allowed, but you actually had only 1 token → count wrong
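A single-process sketch makes the B/R mechanics concrete (illustrative names; the distributed version must do this refill-and-take inside one Lua script, for exactly the race described above):

```python
class TokenBucket:
    """Single-process token bucket sketch. Lazy refill: tokens are
    topped up from the elapsed time on each call, capped at capacity."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # B: max burst
        self.refill_rate = refill_rate  # R: tokens per second
        self.tokens = float(capacity)   # start full
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

tb = TokenBucket(capacity=5, refill_rate=1.0)  # burst of 5, then 1 req/sec
burst = [tb.allow(0.0) for _ in range(6)]
print(burst)          # first 5 allowed instantly, 6th rejected
print(tb.allow(2.0))  # True — 2 seconds of refill bought 2 tokens
```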

7. Redis Lua Script (Critical Concept)

-- Sliding Window Counter — Lua for atomicity
local prev = tonumber(redis.call('get', KEYS[1])) or 0
local curr = tonumber(redis.call('get', KEYS[2])) or 0
local limit = tonumber(ARGV[1])
local fraction = tonumber(ARGV[2])

local count = math.floor(prev * (1.0 - fraction)) + curr

if count >= limit then
    return {0, count}  -- blocked
end

local new_count = redis.call('incr', KEYS[2])
if new_count == 1 then
    redis.call('expire', KEYS[2], ARGV[3])
end

return {1, count + 1}  -- allowed

WHY LUA? Redis Lua scripts execute atomically.
No concurrent execution during script run.
No race conditions possible.
Alternative (wrong): GET → compute → INCR in separate commands
  Two concurrent requests: both GET count=99, both allowed, count reaches 101
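The bad interleaving can be replayed deterministically. This is a plain-Python trace of the GET → compute → INCR race, not real Redis commands — the two "requests" read the shared counter before either writes:

```python
limit = 100
counter = 99  # shared counter, standing in for the Redis key

# Both requests GET before either INCRs (a possible concurrent interleaving)
read_a = counter
read_b = counter

allowed_a = read_a < limit   # True — request A sees 99
allowed_b = read_b < limit   # True — request B also sees 99

counter += 1  # A's INCR → 100
counter += 1  # B's INCR → 101

print(allowed_a, allowed_b, counter)  # True True 101 — limit of 100 breached
```

Inside a Lua script the check and the INCR run as one atomic unit, so this interleaving cannot occur.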

8. Distributed Rate Limiting

Problem: 10 gateway nodes, where does the counter live?

Option 1: Local counter per gateway
  Fast (no network), but:
  Effective limit = N × limit (N = gateway count)
  10 gateways × 100/min = 1000/min actual
  
Option 2: Centralised Redis ★ (production default)
  All gateways → one Redis cluster
  Latency: ~1ms per request (Redis round-trip)
  Accuracy: exact global count
  Sharding: consistent hashing on API key → each key on one shard
  Failure: fail open (allow) + circuit breaker
  
Option 3: Synchronised local counters
  Each gateway syncs to peers every 100ms
  Max overshoot: (N − 1) × sync_interval × rate — each node is blind to its peers' counts for one interval
  Use: non-critical limits, Redis unavailable scenario
  
Production choice: Centralised Redis with consistent hashing
  3–5 Redis cluster nodes
  Single Redis call per API request
  If shard fails → fail open (not fail closed)
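The key→shard routing and fail-open behaviour can be sketched in a few lines. Assumptions: shard names and function names here are made up, and simple modulo hashing stands in for a real consistent-hash ring (a ring remaps only ~1/N of keys when a shard is added):

```python
import hashlib

SHARDS = ["redis-a", "redis-b", "redis-c"]  # hypothetical shard names

def shard_for(api_key, shards=SHARDS):
    """Stable key -> shard mapping: every gateway routes a given API key
    to the same shard, so its counter stays globally exact."""
    h = int(hashlib.sha1(api_key.encode()).hexdigest(), 16)
    return shards[h % len(shards)]

def check_limit(api_key, redis_call):
    """Fail open: if the shard is unreachable, allow the request rather
    than turning a rate-limiter outage into a full API outage."""
    try:
        return redis_call(shard_for(api_key), api_key)
    except ConnectionError:
        return True  # fail open; a circuit breaker would stop hammering the dead shard

def down_shard(shard, key):
    raise ConnectionError(f"{shard} unreachable")

print(shard_for("key-123") == shard_for("key-123"))  # True — routing is stable
print(check_limit("key-123", down_shard))            # True — fail open on failure
```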

9. HTTP 429 Response Standard

Always include these headers:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100            ← the limit
X-RateLimit-Remaining: 0          ← how many left
X-RateLimit-Reset: 1705334460     ← Unix timestamp of window reset
Retry-After: 47                   ← seconds until retry (always set this on 429s)

Body: {"error": {"code": "RATE_LIMIT_EXCEEDED", "retryAfter": 47}}

On every successful request too:
X-RateLimit-Remaining: 73         ← lets clients self-throttle proactively

Why Retry-After matters:
  Without it: clients retry immediately → make overload WORSE
  With it: well-behaved clients back off → the retry storm subsides
  (smart clients also add jitter, so retries don't all land at the exact reset instant)
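Building the header set is mechanical; this sketch (function name is illustrative) derives Retry-After from the reset timestamp so clients never compute it themselves, reproducing the 429 example above:

```python
import time

def rate_limit_headers(limit, remaining, reset_ts, now=None):
    """Headers for every response; Retry-After is added only when the
    client is out of quota (i.e. on a 429)."""
    now = int(time.time()) if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_ts),
    }
    if remaining <= 0:
        headers["Retry-After"] = str(max(0, reset_ts - now))
    return headers

# The 429 example above: limit 100, quota exhausted, 47s until the window resets
h = rate_limit_headers(limit=100, remaining=0, reset_ts=1705334460, now=1705334413)
print(h["Retry-After"])  # 47
```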

10. Multi-Tier Architecture

Layer 1: CDN / Edge (per IP, ~10,000 req/min)
  → Catches DDoS before it reaches your infra
  → Cloudflare, AWS WAF, Akamai handle this
  
Layer 2: API Gateway (per API key, varies by tier)
  → Free: 100 req/min, Pro: 10,000 req/min  
  → Centralised Redis backing
  → This is what most "rate limiter" interview questions are about

Layer 3: Application Layer (per endpoint)
  → POST /search: 10 req/sec (expensive DB query)
  → POST /payment: 5 req/min (fraud prevention)
  → GET /profile: 1,000 req/min (cheap)

Layer 4: Database / External API (outbound limits)
  → Max 50 req/sec to Google Maps API
  → Circuit breaker: trips at 50% error rate
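The Layer-3 per-endpoint limits above usually live in a small policy table consulted by middleware. A hypothetical sketch (the table and function names are made up, not from any framework):

```python
# Hypothetical per-endpoint policy table mirroring the Layer-3 limits above:
# (method, path) -> (limit, window_secs)
ENDPOINT_LIMITS = {
    ("POST", "/search"):  (10, 1),     # 10 req / sec — expensive DB query
    ("POST", "/payment"): (5, 60),     # 5 req / min — fraud prevention
    ("GET", "/profile"):  (1000, 60),  # 1,000 req / min — cheap read
}

def policy_for(method, path, default=(100, 60)):
    """Look up (limit, window_secs) for a route; unknown routes fall back
    to a conservative default policy."""
    return ENDPOINT_LIMITS.get((method, path), default)

print(policy_for("POST", "/payment"))  # (5, 60)
print(policy_for("GET", "/health"))    # (100, 60) — default
```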

📝 Tasks

Task 1 — Algorithm Selection

  1. Twitter “post tweet”: 300 per 3 hours. Which algorithm?
  2. Payment API: must NEVER exceed 5/min (regulatory). Which algorithm?
  3. File upload: burst of 10 then 1/minute. Which algorithm? What are B and R?
  4. Demonstrate fixed window boundary attack with numbers
  5. Prove sliding window counter max error is bounded

Task 2 — Redis Implementation

  1. Implement Fixed Window Counter. Write exact Redis commands.
  2. Implement Sliding Window Log. What data structure? Memory for 1M users × 100/min?
  3. Show the race condition in Token Bucket without Lua. Concrete 2-request example.
  4. Redis experiences 50ms latency spike — impact on API p99? How to handle?
  5. Redis crashes — fail open (allow all) or fail closed (block all)? Justify.

Task 3 — Distributed Design

Design rate limiter for 10 gateway nodes, 100M API calls/day, per-key limits. Questions: Redis calls per request? Latency budget? Shard failure handling?

⭐ Task 4 — URL Shortener Rate Limiting

Add rate limiting to the B5 URL shortener:
  POST /shorten: 10/min anonymous, 1,000/min authenticated
  GET /{code}: 1,000/min per IP
  DELETE /links/{id}: 5/min
  Global: 50,000 req/sec total


✅ Completion Checklist

→ Next: Module B10 — Consistent Hashing & Service Discovery

