SYSTEM DESIGN MASTERY · TRACK B · MODULE B9 · WEEK 19 RATE LIMITING · TOKEN BUCKET · SLIDING WINDOW · REDIS LUA
Component Design · 5 Algorithms · Distributed Limiting

RATE LIMITER

FIXED WINDOW · SLIDING WINDOW · TOKEN BUCKET
LEAKY BUCKET · REDIS LUA · HTTP 429 · MULTI-TIER

5 algorithms · O(1) memory target · HTTP 429 status · Module B9
Fixed Window
Sliding Window Log
Sliding Window Counter ★
Token Bucket
Leaky Bucket
Redis Lua
Why Rate Limit?
Five problems — one solution
DoS Protection
One bad actor exhausts server resources. Rate limiting prevents a single IP or user from dominating bandwidth.
Cost Control
Expensive APIs (OpenAI, SMS, email) must be gated. Unlimited access = unlimited cloud bill.
Fair Use
Shared infrastructure must be equitable. One user shouldn't starve others on a multi-tenant API.
Tier Enforcement
Free: 1K req/day. Pro: 10K. Enterprise: unlimited. Rate limiting is how SaaS products enforce pricing tiers.
Abuse Prevention
Brute force login: 5 attempts/min per IP. Credential stuffing: 100 req/hr per account.
QoS Guarantee
Downstream services have capacity limits. Rate limit upstream to protect the service from overload.
Fixed Window Counter
Simplest algorithm — and its fatal flaw
Fixed window: simple Redis INCR per time bucket · REDIS
// Key: ratelimit:{userId}:{window_minute}
// Limit: 100 req/min

function checkLimit(userId) {
  window = Math.floor(Date.now() / 60000)   // minute bucket
  key = `rl:${userId}:${window}`
  
  count = redis.incr(key)
  if (count == 1) redis.expire(key, 120)  // set TTL on first request
  
  return count <= 100
}

// Memory: O(1) — just one counter per user
// Speed: one INCR + one EXPIRE (conditional)
⚠ THE BOUNDARY BURST ATTACK:

12:00:58 → user sends 100 requests (window A, last 2 seconds)
12:01:00 → window resets to 0
12:01:01 → user sends 100 more requests (window B, first second)

Result: 200 requests in 3 seconds — 2× the intended 100/min limit.
Fixed window counter CANNOT prevent this.
// BURST ATTACK TIMELINE — 200 requests in 3 seconds
Window A (12:00–12:01): 100 req in the last 2 seconds
Window B (12:01–12:02): 100 req in the first second
→ 200 requests arrive in under 3 seconds — rate limit completely bypassed at the window boundary
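The boundary burst can be replayed with a minimal in-memory version of the fixed-window counter: a hypothetical `FixedWindow` class mirroring the Redis INCR logic above, with the clock passed in explicitly so the run is deterministic.

```javascript
// Minimal in-memory fixed-window counter (illustrative sketch, not the
// Redis implementation above). Each window bucket gets its own count.
class FixedWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // window bucket → count
  }
  allow(nowMs) {
    const bucket = Math.floor(nowMs / this.windowMs);
    const count = (this.counts.get(bucket) || 0) + 1;
    this.counts.set(bucket, count);
    return count <= this.limit;
  }
}

const rl = new FixedWindow(100, 60000);         // 100 req/min
const t = (min, sec) => (min * 60 + sec) * 1000; // timeline helper

let allowed = 0;
for (let i = 0; i < 100; i++) if (rl.allow(t(0, 58))) allowed++; // window A, 12:00:58
for (let i = 0; i < 100; i++) if (rl.allow(t(1, 1))) allowed++;  // window B, 12:01:01
console.log(allowed); // 200: every request passed, limit bypassed at the boundary
```

Each batch lands in a different bucket, so each batch sees a fresh counter: the per-window check is satisfied even though 200 requests arrive in 3 seconds.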
Sliding Window Algorithms
Log (exact) vs Counter (approximate) — production trade-off
Sliding Window Log (Exact)
Sorted set of timestamps — exact but memory-heavy · REDIS
// Key: ratelimit:{userId} — Sorted Set, score = timestamp
function checkLimit(userId) {
  now     = Date.now()           // ms
  window  = 60000                // 60s in ms
  cutoff  = now - window

  redis.zremrangebyscore(`rl:${userId}`, 0, cutoff)  // drop entries older than the window
  count = redis.zcard(`rl:${userId}`)                // count what remains

  if (count >= 100) return REJECT

  // NOTE: the member must be unique: two requests can share a millisecond
  // timestamp, so append a request id to the member in production
  redis.zadd(`rl:${userId}`, now, now)               // log this request
  redis.expire(`rl:${userId}`, 120)                  // TTL cleanup
  return ALLOW
}
// Memory: O(N) — stores EVERY request timestamp
// 100 req/min × 1M users = 100M Redis entries → expensive
Sliding Window Counter ★ (Recommended)
// WEIGHTED INTERPOLATION FORMULA
rate = prev_count × (1 − elapsed_fraction) + curr_count
Example: limit = 100 req/min, current time = 12:01:45 (45s into window)
elapsed_fraction = 45s / 60s = 0.75
prev_window (12:00–12:01): 80 requests
curr_window (12:01–12:02): 30 requests

estimated_rate = 80 × (1 − 0.75) + 30 = 20 + 30 = 50 → ALLOW ✓

Attack scenario: prev=100, curr=95 at T=12:01:01 (elapsed=0.017)
estimated_rate = 100 × 0.983 + 95 = 98.3 + 95 = 193.3 → REJECT ✓ (boundary burst caught!)
Sliding window counter — two Redis keys, O(1) memory · REDIS
function checkLimit(userId, limit, windowSec) {
  now     = Math.floor(Date.now() / 1000)
  bucket  = Math.floor(now / windowSec)
  elapsed = (now % windowSec) / windowSec

  keyCurr = `rl:${userId}:${bucket}`
  keyPrev = `rl:${userId}:${bucket - 1}`

  [prevCount, currCount] = redis.mget(keyPrev, keyCurr)
  prevCount = prevCount || 0
  currCount = currCount || 0

  rate = prevCount * (1 - elapsed) + currCount

  if (rate >= limit) return REJECT

  redis.incr(keyCurr)
  redis.expire(keyCurr, windowSec * 2)
  return ALLOW
}
// Memory: O(1) — only 2 counter keys per user at any time
// Accuracy: ~99.997% vs true sliding window (0.003% error)
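The interpolation formula can also be lifted out as a standalone function and checked against both worked examples above (a hypothetical `estimatedRate` helper, pure math with no Redis involved):

```javascript
// Weighted interpolation: scale the previous window's count by the fraction
// of it still inside the sliding window, then add the current window's count.
function estimatedRate(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

console.log(estimatedRate(80, 30, 0.75));   // 50 → ALLOW (under the 100 limit)
console.log(estimatedRate(100, 95, 0.017)); // ≈193.3 → REJECT (boundary burst caught)
```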
Token Bucket & Leaky Bucket
When natural bursts are legitimate — or must be suppressed
Token Bucket
BURST-FRIENDLY · O(1) STATE
Bucket holds up to N tokens. Refills at R tokens/sec. Each request consumes 1 token. Empty bucket → reject. Full bucket allows N instant requests (burst).
capacity=20, rate=10/sec:
User idle 2s → bucket fills to 20
20 instant requests → all allowed (burst!)
21st → rejected
After 0.1s → 1 refilled → allow again
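The walk-through above can be sketched as an in-memory bucket (a hypothetical `TokenBucket` class; the clock is an explicit parameter so the refill math is deterministic):

```javascript
// In-memory token bucket: refill proportionally to elapsed time, capped at
// capacity, then try to consume one token per request.
class TokenBucket {
  constructor(capacity, ratePerSec) {
    this.capacity = capacity;
    this.rate = ratePerSec;
    this.tokens = capacity; // start full
    this.last = 0;          // ms timestamp of last refill
  }
  tryConsume(nowMs) {
    const elapsedSec = (nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.rate);
    this.last = nowMs;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}

const b = new TokenBucket(20, 10);  // capacity=20, rate=10/sec
// user idle 2s: refill is capped at capacity (20), not 20 + 2s × 10
let burst = 0;
for (let i = 0; i < 20; i++) if (b.tryConsume(2000)) burst++;
console.log(burst);              // 20: full burst allowed
console.log(b.tryConsume(2000)); // false: 21st rejected, bucket empty
console.log(b.tryConsume(2100)); // true: 0.1s later, 1 token refilled
```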
Leaky Bucket
CONSTANT RATE · QUEUE-BASED
Requests queue up in a bucket. Bucket "leaks" (processes) at constant rate R. If bucket full → drop. Output is always exactly R req/sec regardless of input.
Use for: payment processors,
SMS gateways, anything with
strict constant-rate downstream
✗ Adds latency (queued requests)
✗ Not for interactive APIs
Token bucket — Redis Lua for atomicity · LUA
-- KEYS[1] = bucket key
-- ARGV[1] = capacity, ARGV[2] = rate/sec, ARGV[3] = now_ms

local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local bucket   = redis.call("HMGET", KEYS[1], "tokens", "last")
local tokens   = tonumber(bucket[1] or capacity)
local last     = tonumber(bucket[2] or now)

local elapsed  = (now - last) / 1000   -- seconds
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= 1 then
    tokens = tokens - 1
    redis.call("HMSET", KEYS[1], "tokens", tokens, "last", now)
    redis.call("EXPIRE", KEYS[1], 3600)
    return 1   -- ALLOW
else
    redis.call("HMSET", KEYS[1], "tokens", tokens, "last", now)
    return 0   -- REJECT
end
Algorithm Comparison
Choose by use case — not by complexity
ALGORITHM                | MEMORY           | BURST HANDLING   | ACCURACY     | BEST FOR
Fixed Window             | O(1)             | ✗ boundary burst | Approximate  | Simple counters, non-critical endpoints
Sliding Window Log       | O(N) per request | ✓ exact          | Exact        | Low-traffic strict APIs, small user base
Sliding Window Counter ★ | O(1)             | ✓ smoothed       | ~99.997%     | Production default for most APIs
Token Bucket             | O(1)             | ✓ up to capacity | Exact        | APIs where short bursts are legitimate
Leaky Bucket             | O(queue)         | ✗ queued         | Exact output | Constant-rate downstream (payments, SMS)
Interview default: "I'd use Sliding Window Counter as the default — O(1) memory, no boundary burst vulnerability, 0.003% accuracy error which is negligible for rate limiting. For APIs where genuine burst traffic should be allowed (e.g., bulk file uploads), I'd switch to Token Bucket."
Redis Lua Scripts — Why Atomic?
The race condition that breaks naive implementations
Race condition without Lua — two threads, one counter · RACE CONDITION
// Without Lua — BROKEN under concurrency:
Thread A: GET tokens → 1             // sees 1 token available
Thread B: GET tokens → 1             // also sees 1 token available
Thread A: SET tokens 0                 // decrements to 0, allows request
Thread B: SET tokens 0                 // also decrements to 0, ALSO allows request
// Result: 2 requests allowed when only 1 token existed!

// With Lua script — CORRECT:
// Redis executes Lua scripts atomically
// No other Redis command can interleave between Lua lines
// Equivalent to a Redis MULTI/EXEC transaction but faster

// Execute Lua script in application code:
const script = redis.createScript(luaCode);
const result = await script.eval(
  [`bucket:${userId}`],              // KEYS[1]
  [capacity, refillRate, Date.now()]  // ARGV[1,2,3]
);
Why not MULTI/EXEC (Redis transactions)? MULTI/EXEC doesn't support conditional logic — you can't say "if tokens > 0 then decrement, else reject" inside a transaction. Lua scripts can. For rate limiting, you always need the check-and-decrement to be conditional and atomic.
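The interleaving in the timeline above can be reproduced in plain JavaScript by letting two async tasks read the counter before either writes it back. All names here are illustrative; `sleep` stands in for the network round-trip between GET and SET.

```javascript
// Check-then-act race: both "clients" read the same stale value, both decide
// to allow, both write back. Exactly the GET/SET timeline above.
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

async function brokenConsume(store) {
  const seen = store.tokens;   // GET: read the counter
  await sleep(10);             // round-trip window: the other client reads here too
  if (seen >= 1) {
    store.tokens = seen - 1;   // SET: writes back a stale value
    return true;               // request allowed
  }
  return false;
}

async function demo() {
  const store = { tokens: 1 }; // only one token exists
  return Promise.all([brokenConsume(store), brokenConsume(store)]);
}

demo().then(console.log); // [ true, true ]: both allowed with only 1 token
```

A Lua script collapses the read, the decision, and the write into one atomic step, so the second client can only run after the first has finished.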
Distributed Rate Limiting
Multiple API servers — shared state or local approximation?
Centralized Redis ★
All servers read/write to same Redis. Consistent hashing: hash(userId) → specific Redis shard. Correct counts always.
✓ Accurate
✓ Simple consistency
✓ Works at 500K req/sec/node
✗ Redis is bottleneck + SPOF
✗ +0.5ms per request (network)
Mitigate: replicas + circuit breaker
Local Counters
Each server keeps in-memory counter. Set limit = total_limit / N servers. Periodic gossip sync. No Redis dependency.
✓ Zero latency overhead
✓ No Redis dependency
✓ Survives Redis failure
✗ Under-counts (each server only sees own traffic)
✗ Uneven traffic distribution allows over-limit
Use: approximate limiting acceptable
API Gateway
Apply rate limiting at NGINX/Kong/AWS API GW before request hits any app server. Gateway backed by Redis.
✓ Rejects before app server compute
✓ Centralized config (no app deploy)
✓ Works across heterogeneous backends
✗ Gateway is now critical path
✗ Less flexibility per-endpoint
Best for: org-wide API rate limiting
Fail-Open vs Fail-Closed
What happens when Redis is unavailable? · FAILURE MODE
function checkRateLimit(userId) {
  try {
    return redis.checkLimit(userId)
  } catch (RedisUnavailable) {

    // Option A: Fail-Open (allow all): return ALLOW
    //   Pro: service stays up, users unaffected
    //   Con: during a Redis outage, limits are not enforced → potential abuse
    //   Use for: internal services, non-critical limits

    // Option B: Fail-Closed (reject all): return REJECT
    //   Pro: strict, no abuse during the outage
    //   Con: service degraded for ALL users while Redis is down
    //   Use for: financial APIs, security-critical endpoints

    // Option C: Local fallback (shown here): temporary in-memory counter
    //   Less accurate, but the best of both worlds for most production systems
    return localFallback.checkLimit(userId)
  }
}
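Option C can be sketched as a wrapper around the Redis check. The `LocalFallbackLimiter` and `checkRateLimit` names here are hypothetical; the fallback is a plain in-memory fixed-window counter, so it under-enforces across servers but keeps some limit in place while Redis is down.

```javascript
// In-memory approximate limiter used only while Redis is unreachable.
class LocalFallbackLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // `${userId}:${bucket}` → count
  }
  allow(userId, nowMs) {
    const key = `${userId}:${Math.floor(nowMs / this.windowMs)}`;
    const count = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, count);
    return count <= this.limit;
  }
}

async function checkRateLimit(userId, redisCheck, fallback) {
  try {
    return await redisCheck(userId);            // normal path: shared Redis state
  } catch (err) {
    return fallback.allow(userId, Date.now());  // Redis down → approximate local limit
  }
}

// Redis unavailable: the fallback still enforces a per-user limit
const fallback = new LocalFallbackLimiter(100, 60000);
const redisDown = async () => { throw new Error("connection refused"); };
checkRateLimit("user42", redisDown, fallback).then(console.log); // true
```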
HTTP 429 Response Design
Good API clients need these headers to behave correctly
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060 ← Unix timestamp when window resets
Retry-After: 42 ← seconds until safe to retry
Content-Type: application/json
Response body + good client behavior · JSON + CLIENT
// Response body:
{
  "error": "rate_limit_exceeded",
  "message": "100 requests per minute exceeded",
  "retry_after_seconds": 42,
  "limit": 100,
  "window": "1 minute"
}

// Good client behavior (using Retry-After):
if (response.status === 429) {
  retryAfter = response.headers.get('Retry-After')
  await sleep(retryAfter * 1000)
  return retry(request)  // not immediately — that hammers the server
}

// Include on EVERY response (not just 429):
// Allows clients to monitor their quota proactively
X-RateLimit-Limit:     100
X-RateLimit-Remaining: 73   ← still have 73 left
X-RateLimit-Reset:     1700000060
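One way to produce these headers is a small helper that derives them from the limiter state. The `rateLimitHeaders` function below is a hypothetical sketch; in production `nowSec` and the reset value would be Unix timestamps like the 1700000060 above, but small numbers are used here for readability.

```javascript
// Derive the three X-RateLimit-* headers from limiter state, adding
// Retry-After only when the quota is exhausted.
function rateLimitHeaders(limit, used, windowSec, nowSec) {
  const resetAt = (Math.floor(nowSec / windowSec) + 1) * windowSec; // next window start
  const remaining = Math.max(0, limit - used);
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(resetAt),
  };
  if (remaining === 0) headers["Retry-After"] = String(resetAt - nowSec);
  return headers;
}

// limit=100, 27 used, 60s window, 18s into the window:
console.log(rateLimitHeaders(100, 27, 60, 18));
// Remaining "73", Reset "60" (next window boundary), no Retry-After
console.log(rateLimitHeaders(100, 100, 60, 18)["Retry-After"]); // "42"
```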
Multi-Tier Rate Limiting
Production systems apply multiple limits simultaneously
// REQUEST FLOWS THROUGH ALL TIERS — first violation wins (429)
Tier 1 · CDN / Edge   · key: IP address      · DDoS protection: a single IP flooding the system          · 1,000 req/min per IP
Tier 2 · API Gateway  · key: User ID         · Abuse prevention: one user monopolizing shared resources  · 100 req/min per user
Tier 3 · API Service  · key: API key / tier  · Quota enforcement: Free vs Pro vs Enterprise daily limits · 1K–∞ req/day per key
Tier 4 · Endpoint     · key: user + endpoint · Resource protection: expensive ops individually limited   · 5/min (login), 10/hr (email)
Key design insight: Different tiers protect different concerns and are applied at different layers. IP limits live at the CDN (before hitting origin). User limits live at the gateway. Quota lives in the application. Each layer defends against a different threat model.
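The first-violation-wins flow can be sketched as an ordered chain of checks. The `applyTiers` function and tier names below are illustrative, and the per-tier counters are stubbed as precomputed counts rather than real limiters.

```javascript
// Walk the tiers in order; the first tier whose check fails produces the 429.
function applyTiers(request, tiers) {
  for (const tier of tiers) {
    if (!tier.check(request)) {
      return { status: 429, violatedTier: tier.name };
    }
  }
  return { status: 200 };
}

// Limits mirror the tier table above; counts come from stubbed request state.
const tiers = [
  { name: "ip",       check: (r) => r.ipCount   <= 1000 }, // CDN / edge
  { name: "user",     check: (r) => r.userCount <= 100  }, // API gateway
  { name: "quota",    check: (r) => r.dayCount  <= 1000 }, // API service
  { name: "endpoint", check: (r) => r.epCount   <= 5    }, // per endpoint
];

console.log(applyTiers({ ipCount: 5, userCount: 101, dayCount: 9, epCount: 1 }, tiers));
// { status: 429, violatedTier: 'user' }: the gateway tier rejects first
```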
01
Boundary Burst Attack & Fix
~1 hr
  1. Draw a timeline showing the boundary burst attack: limit=100 req/min, attacker sends 100 at 12:00:58 and 100 at 12:01:01. Show the counter values at each second.
  2. Implement sliding window counter in JavaScript (no Redis — just the math logic with mocked prev/curr counts)
  3. Apply the same attack to the sliding window counter: prev=100, curr=95, elapsed=0.017 (1 second in). Does it correctly block?
  4. What is the worst-case over-limit that sliding window counter can allow? (hint: think about what happens at the start of a new window)
02
Token Bucket Implementation
~1.5 hrs
  1. Implement TokenBucket(capacity, refillRatePerSec) in Java with boolean tryConsume()
  2. Make it thread-safe (hint: synchronized block or AtomicLong + last refill timestamp)
  3. Test: capacity=10, rate=5/sec. Assert: 10 instant calls succeed, 11th fails, wait 0.2s, 12th succeeds
  4. How do you serialize this bucket to Redis so it survives server restarts? Write the HMSET schema and the Lua reconstruction logic.
  5. If the server was down for 1 hour and bucket was at 0 — what should happen on the next request? (cap at capacity, not at hours × rate)
03
Distributed Rate Limiter
~1.5 hrs
  1. 20 app servers, limit=100 req/min per user. Naive: each server enforces 100/20=5 req/min locally. What goes wrong? Give a specific failure scenario.
  2. Draw the sequence diagram for centralized Redis: Server A → Redis check → allow/reject → response
  3. Demonstrate the race condition: Servers A and B both see count=99, both increment. Show why Lua prevents this.
  4. Redis becomes unavailable for 60 seconds. Argue for both fail-open and fail-closed. Which do you choose and why?
  5. Design a circuit breaker for the Redis rate limiter: when Redis is down, switch to local approximate counter automatically
04
Add Rate Limiting to URL Shortener (B5 Integration)
~2 hrs

Revisit your URL shortener from Module B5. Add production-grade rate limiting:

  • POST /shorten: 100 req/hour per IP (abuse prevention)
  • GET /{code}: 1,000 req/min per IP (DDoS protection)
  • POST /shorten: 1,000 req/day per user account (quota)
  1. Design the Redis key schema for all three limits
  2. Which algorithm for each? (justify: fixed window vs sliding window counter vs token bucket)
  3. Where in the architecture is each limit applied? (CDN vs API gateway vs application code)
  4. Write the complete HTTP 429 response for a failed POST /shorten with all required headers
  5. A premium user should get 10,000 req/day instead of 1,000. How does the application look up the user's tier before applying the limit?
MODULE B9 · RATE LIMITER
Fixed Window: simple INCR — but boundary burst vulnerability
Sliding Window Log: exact, ZADD/ZREMRANGEBYSCORE — O(N) memory
Sliding Window Counter formula: prev×(1−elapsed) + curr
Sliding window counter: O(1) memory, 0.003% error, production default
Token Bucket: capacity (burst) + refill_rate (sustained) — HMSET in Redis
Leaky Bucket: constant output rate, queue-based, for downstream rate control
Algorithm comparison table — when to use each
Race condition: GET + SET is broken — need Lua for atomicity
Lua scripts are atomic in Redis — no interleaving possible
Distributed: centralized Redis (consistent hashing per user)
Fail-open vs fail-closed: know the trade-offs and when to choose each
HTTP 429 headers: X-RateLimit-Limit, Remaining, Reset, Retry-After
Multi-tier: IP (CDN) → user (gateway) → API key (app) → endpoint (app)
API gateway approach: NGINX / Kong / AWS API GW + Redis backend
✏️ Task 1: boundary burst attack + sliding window fix
✏️ Task 2: token bucket implementation (Java, thread-safe, Redis-serialized)
✏️ Task 3: distributed rate limiter + Redis failure handling
✏️ Task 4 (capstone): rate limiting added to URL shortener
// NEXT MODULE
B10 — Consistent Hashing & Service Discovery
Virtual nodes · Minimal key remapping on node add/remove
Load distribution · Hot spot prevention · Rendezvous hashing
Service registry (Consul/ZooKeeper) · Health checks · DNS-based discovery