SYSTEM DESIGN MASTERY · TRACK B · MODULE B9 · WEEK 19 RATE LIMITING · TOKEN BUCKET · SLIDING WINDOW · REDIS LUA
Component Design · 5 Algorithms · Distributed Limiting

RATE LIMITER

FIXED WINDOW · SLIDING WINDOW · TOKEN BUCKET
LEAKY BUCKET · REDIS LUA · HTTP 429 · MULTI-TIER

5 algorithms · O(1) memory target · HTTP 429 status · Module B9
Fixed Window
Sliding Window Log
Sliding Window Counter ★
Token Bucket
Leaky Bucket
Redis Lua
Why Rate Limit?
Five problems — one solution
DoS Protection
One bad actor exhausts server resources. Rate limiting prevents a single IP or user from dominating bandwidth.
Cost Control
Expensive APIs (OpenAI, SMS, email) must be gated. Unlimited access = unlimited cloud bill.
Fair Use
Shared infrastructure must be equitable. One user shouldn't starve others on a multi-tenant API.
Tier Enforcement
Free: 1K req/day. Pro: 10K. Enterprise: unlimited. Rate limiting is how SaaS products enforce pricing tiers.
Abuse Prevention
Brute force login: 5 attempts/min per IP. Credential stuffing: 100 req/hr per account.
QoS Guarantee
Downstream services have capacity limits. Rate limit upstream to protect the service from overload.
Fixed Window Counter
Simplest algorithm — and its fatal flaw
Fixed window: simple Redis INCR per time bucket · REDIS
// Key: ratelimit:{userId}:{window_minute}
// Limit: 100 req/min

function checkLimit(userId) {
  window = Math.floor(Date.now() / 60000)   // minute bucket
  key = `rl:${userId}:${window}`
  
  count = redis.incr(key)
  if (count == 1) redis.expire(key, 120)  // set TTL on first request
  
  return count <= 100
}

// Memory: O(1) — just one counter per user
// Speed: one INCR + one EXPIRE (conditional)
⚠ THE BOUNDARY BURST ATTACK:

12:00:58 → user sends 100 requests (window A, last 2 seconds)
12:01:00 → window resets to 0
12:01:01 → user sends 100 more requests (window B, first second)

Result: 200 requests in 3 seconds — 2× the intended 100/min limit.
Fixed window counter CANNOT prevent this.
// BURST ATTACK TIMELINE — 200 requests in 3 seconds
Window A (12:00–12:01): 100 req in the last 2 seconds
Window B (12:01–12:02): 100 req in the first second
→ 200 requests arrive in under 3 seconds — rate limit completely bypassed at the window boundary
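The boundary burst can be replayed with a minimal in-memory version of the fixed-window counter: a hypothetical `FixedWindow` class mirroring the Redis INCR logic above, with the clock passed in explicitly so the run is deterministic.

```javascript
// Minimal in-memory fixed-window counter (illustrative sketch, not the
// Redis implementation above). Each window bucket gets its own count.
class FixedWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // window bucket → count
  }
  allow(nowMs) {
    const bucket = Math.floor(nowMs / this.windowMs);
    const count = (this.counts.get(bucket) || 0) + 1;
    this.counts.set(bucket, count);
    return count <= this.limit;
  }
}

const rl = new FixedWindow(100, 60000);         // 100 req/min
const t = (min, sec) => (min * 60 + sec) * 1000; // timeline helper

let allowed = 0;
for (let i = 0; i < 100; i++) if (rl.allow(t(0, 58))) allowed++; // window A, 12:00:58
for (let i = 0; i < 100; i++) if (rl.allow(t(1, 1))) allowed++;  // window B, 12:01:01
console.log(allowed); // 200: every request passed, limit bypassed at the boundary
```

Each batch lands in a different bucket, so each batch sees a fresh counter: the per-window check is satisfied even though 200 requests arrive in 3 seconds.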
Sliding Window Algorithms
Log (exact) vs Counter (approximate) — production trade-off
Sliding Window Log (Exact)
Sorted set of timestamps — exact but memory-heavy · REDIS
// Key: ratelimit:{userId} — Sorted Set, score = timestamp
function checkLimit(userId) {
  now     = Date.now()           // ms
  window  = 60000                // 60s in ms
  cutoff  = now - window

  redis.zremrangebyscore(`rl:${userId}`, 0, cutoff)  // drop entries older than the window
  count = redis.zcard(`rl:${userId}`)                // count what remains

  if (count >= 100) return REJECT

  // NOTE: the member must be unique: two requests can share a millisecond
  // timestamp, so append a request id to the member in production
  redis.zadd(`rl:${userId}`, now, now)               // log this request
  redis.expire(`rl:${userId}`, 120)                  // TTL cleanup
  return ALLOW
}
// Memory: O(N) — stores EVERY request timestamp
// 100 req/min × 1M users = 100M Redis entries → expensive
Sliding Window Counter ★ (Recommended)
// WEIGHTED INTERPOLATION FORMULA
rate = prev_count × (1 − elapsed_fraction) + curr_count
Example: limit = 100 req/min, current time = 12:01:45 (45s into window)
elapsed_fraction = 45s / 60s = 0.75
prev_window (12:00–12:01): 80 requests
curr_window (12:01–12:02): 30 requests

estimated_rate = 80 × (1 − 0.75) + 30 = 20 + 30 = 50 → ALLOW ✓

Attack scenario: prev=100, curr=95 at T=12:01:01 (elapsed=0.017)
estimated_rate = 100 × 0.983 + 95 = 98.3 + 95 = 193.3 → REJECT ✓ (boundary burst caught!)
Sliding window counter — two Redis keys, O(1) memory · REDIS
function checkLimit(userId, limit, windowSec) {
  now     = Math.floor(Date.now() / 1000)
  bucket  = Math.floor(now / windowSec)
  elapsed = (now % windowSec) / windowSec

  keyCurr = `rl:${userId}:${bucket}`
  keyPrev = `rl:${userId}:${bucket - 1}`

  [prevCount, currCount] = redis.mget(keyPrev, keyCurr)
  prevCount = prevCount || 0
  currCount = currCount || 0

  rate = prevCount * (1 - elapsed) + currCount

  if (rate >= limit) return REJECT

  redis.incr(keyCurr)
  redis.expire(keyCurr, windowSec * 2)
  return ALLOW
}
// Memory: O(1) — only 2 counter keys per user at any time
// Accuracy: ~99.997% vs true sliding window (0.003% error)
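The interpolation formula can also be lifted out as a standalone function and checked against both worked examples above (a hypothetical `estimatedRate` helper, pure math with no Redis involved):

```javascript
// Weighted interpolation: scale the previous window's count by the fraction
// of it still inside the sliding window, then add the current window's count.
function estimatedRate(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

console.log(estimatedRate(80, 30, 0.75));   // 50 → ALLOW (under the 100 limit)
console.log(estimatedRate(100, 95, 0.017)); // ≈193.3 → REJECT (boundary burst caught)
```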
Token Bucket & Leaky Bucket
When natural bursts are legitimate — or must be suppressed
Token Bucket
BURST-FRIENDLY · O(1) STATE
Bucket holds up to N tokens. Refills at R tokens/sec. Each request consumes 1 token. Empty bucket → reject. Full bucket allows N instant requests (burst).
capacity=20, rate=10/sec:
User idle 2s → bucket fills to 20
20 instant requests → all allowed (burst!)
21st → rejected
After 0.1s → 1 refilled → allow again
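The walk-through above can be sketched as an in-memory bucket (a hypothetical `TokenBucket` class; the clock is an explicit parameter so the refill math is deterministic):

```javascript
// In-memory token bucket: refill proportionally to elapsed time, capped at
// capacity, then try to consume one token per request.
class TokenBucket {
  constructor(capacity, ratePerSec) {
    this.capacity = capacity;
    this.rate = ratePerSec;
    this.tokens = capacity; // start full
    this.last = 0;          // ms timestamp of last refill
  }
  tryConsume(nowMs) {
    const elapsedSec = (nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.rate);
    this.last = nowMs;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}

const b = new TokenBucket(20, 10);  // capacity=20, rate=10/sec
// user idle 2s: refill is capped at capacity (20), not 20 + 2s × 10
let burst = 0;
for (let i = 0; i < 20; i++) if (b.tryConsume(2000)) burst++;
console.log(burst);              // 20: full burst allowed
console.log(b.tryConsume(2000)); // false: 21st rejected, bucket empty
console.log(b.tryConsume(2100)); // true: 0.1s later, 1 token refilled
```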
Leaky Bucket
CONSTANT RATE · QUEUE-BASED
Requests queue up in a bucket. Bucket "leaks" (processes) at constant rate R. If bucket full → drop. Output is always exactly R req/sec regardless of input.
Use for: payment processors,
SMS gateways, anything with
strict constant-rate downstream
✗ Adds latency (queued requests)
✗ Not for interactive APIs
Token bucket — Redis Lua for atomicity · LUA
-- KEYS[1] = bucket key
-- ARGV[1] = capacity, ARGV[2] = rate/sec, ARGV[3] = now_ms

local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local bucket   = redis.call("HMGET", KEYS[1], "tokens", "last")
local tokens   = tonumber(bucket[1] or capacity)
local last     = tonumber(bucket[2] or now)

local elapsed  = (now - last) / 1000   -- seconds
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= 1 then
    tokens = tokens - 1
    redis.call("HMSET", KEYS[1], "tokens", tokens, "last", now)
    redis.call("EXPIRE", KEYS[1], 3600)
    return 1   -- ALLOW
else
    redis.call("HMSET", KEYS[1], "tokens", tokens, "last", now)
    return 0   -- REJECT
end
Algorithm Comparison
Choose by use case — not by complexity
ALGORITHM                | MEMORY           | BURST HANDLING   | ACCURACY     | BEST FOR
Fixed Window             | O(1)             | ✗ boundary burst | Approximate  | Simple counters, non-critical endpoints
Sliding Window Log       | O(N) per request | ✓ exact          | Exact        | Low-traffic strict APIs, small user base
Sliding Window Counter ★ | O(1)             | ✓ smoothed       | ~99.997%     | Production default for most APIs
Token Bucket             | O(1)             | ✓ up to capacity | Exact        | APIs where short bursts are legitimate
Leaky Bucket             | O(queue)         | ✗ queued         | Exact output | Constant-rate downstream (payments, SMS)
Interview default: "I'd use Sliding Window Counter as the default — O(1) memory, no boundary burst vulnerability, 0.003% accuracy error which is negligible for rate limiting. For APIs where genuine burst traffic should be allowed (e.g., bulk file uploads), I'd switch to Token Bucket."
Redis Lua Scripts — Why Atomic?
The race condition that breaks naive implementations
Race condition without Lua — two threads, one counter · RACE CONDITION
// Without Lua — BROKEN under concurrency:
Thread A: GET tokens → 1             // sees 1 token available
Thread B: GET tokens → 1             // also sees 1 token available
Thread A: SET tokens 0                 // decrements to 0, allows request
Thread B: SET tokens 0                 // also decrements to 0, ALSO allows request
// Result: 2 requests allowed when only 1 token existed!

// With Lua script — CORRECT:
// Redis executes Lua scripts atomically
// No other Redis command can interleave between Lua lines
// Equivalent to a Redis MULTI/EXEC transaction but faster

// Execute Lua script in application code:
const script = redis.createScript(luaCode);
const result = await script.eval(
  [`bucket:${userId}`],              // KEYS[1]
  [capacity, refillRate, Date.now()]  // ARGV[1,2,3]
);
Why not MULTI/EXEC (Redis transactions)? MULTI/EXEC doesn't support conditional logic — you can't say "if tokens > 0 then decrement, else reject" inside a transaction. Lua scripts can. For rate limiting, you always need the check-and-decrement to be conditional and atomic.
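The interleaving in the timeline above can be reproduced in plain JavaScript by letting two async tasks read the counter before either writes it back. All names here are illustrative; `sleep` stands in for the network round-trip between GET and SET.

```javascript
// Check-then-act race: both "clients" read the same stale value, both decide
// to allow, both write back. Exactly the GET/SET timeline above.
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

async function brokenConsume(store) {
  const seen = store.tokens;   // GET: read the counter
  await sleep(10);             // round-trip window: the other client reads here too
  if (seen >= 1) {
    store.tokens = seen - 1;   // SET: writes back a stale value
    return true;               // request allowed
  }
  return false;
}

async function demo() {
  const store = { tokens: 1 }; // only one token exists
  return Promise.all([brokenConsume(store), brokenConsume(store)]);
}

demo().then(console.log); // [ true, true ]: both allowed with only 1 token
```

A Lua script collapses the read, the decision, and the write into one atomic step, so the second client can only run after the first has finished.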
Distributed Rate Limiting
Multiple API servers — shared state or local approximation?
Centralized Redis ★
All servers read/write to same Redis. Consistent hashing: hash(userId) → specific Redis shard. Correct counts always.
✓ Accurate
✓ Simple consistency
✓ Works at 500K req/sec/node
✗ Redis is bottleneck + SPOF
✗ +0.5ms per request (network)
Mitigate: replicas + circuit breaker
Local Counters
Each server keeps in-memory counter. Set limit = total_limit / N servers. Periodic gossip sync. No Redis dependency.
✓ Zero latency overhead
✓ No Redis dependency
✓ Survives Redis failure
✗ Under-counts (each server only sees own traffic)
✗ Uneven traffic distribution allows over-limit
Use: approximate limiting acceptable
API Gateway
Apply rate limiting at NGINX/Kong/AWS API GW before request hits any app server. Gateway backed by Redis.
✓ Rejects before app server compute
✓ Centralized config (no app deploy)
✓ Works across heterogeneous backends
✗ Gateway is now critical path
✗ Less flexibility per-endpoint
Best for: org-wide API rate limiting
Fail-Open vs Fail-Closed
What happens when Redis is unavailable? · FAILURE MODE
function checkRateLimit(userId) {
  try {
    return redis.checkLimit(userId)
  } catch (RedisUnavailable) {

    // Option A: Fail-Open (allow all): return ALLOW
    //   Pro: service stays up, users unaffected
    //   Con: during a Redis outage, limits are not enforced → potential abuse
    //   Use for: internal services, non-critical limits

    // Option B: Fail-Closed (reject all): return REJECT
    //   Pro: strict, no abuse during the outage
    //   Con: service degraded for ALL users while Redis is down
    //   Use for: financial APIs, security-critical endpoints

    // Option C: Local fallback (shown here): temporary in-memory counter
    //   Less accurate, but the best of both worlds for most production systems
    return localFallback.checkLimit(userId)
  }
}
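Option C can be sketched as a wrapper around the Redis check. The `LocalFallbackLimiter` and `checkRateLimit` names here are hypothetical; the fallback is a plain in-memory fixed-window counter, so it under-enforces across servers but keeps some limit in place while Redis is down.

```javascript
// In-memory approximate limiter used only while Redis is unreachable.
class LocalFallbackLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // `${userId}:${bucket}` → count
  }
  allow(userId, nowMs) {
    const key = `${userId}:${Math.floor(nowMs / this.windowMs)}`;
    const count = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, count);
    return count <= this.limit;
  }
}

async function checkRateLimit(userId, redisCheck, fallback) {
  try {
    return await redisCheck(userId);            // normal path: shared Redis state
  } catch (err) {
    return fallback.allow(userId, Date.now());  // Redis down → approximate local limit
  }
}

// Redis unavailable: the fallback still enforces a per-user limit
const fallback = new LocalFallbackLimiter(100, 60000);
const redisDown = async () => { throw new Error("connection refused"); };
checkRateLimit("user42", redisDown, fallback).then(console.log); // true
```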
HTTP 429 Response Design
Good API clients need these headers to behave correctly
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060 ← Unix timestamp when window resets
Retry-After: 42 ← seconds until safe to retry
Content-Type: application/json
Response body + good client behavior · JSON + CLIENT
// Response body:
{
  "error": "rate_limit_exceeded",
  "message": "100 requests per minute exceeded",
  "retry_after_seconds": 42,
  "limit": 100,
  "window": "1 minute"
}

// Good client behavior (using Retry-After):
if (response.status === 429) {
  retryAfter = response.headers.get('Retry-After')
  await sleep(retryAfter * 1000)
  return retry(request)  // not immediately — that hammers the server
}

// Include on EVERY response (not just 429):
// Allows clients to monitor their quota proactively
X-RateLimit-Limit:     100
X-RateLimit-Remaining: 73   ← still have 73 left
X-RateLimit-Reset:     1700000060
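One way to produce these headers is a small helper that derives them from the limiter state. The `rateLimitHeaders` function below is a hypothetical sketch; in production `nowSec` and the reset value would be Unix timestamps like the 1700000060 above, but small numbers are used here for readability.

```javascript
// Derive the three X-RateLimit-* headers from limiter state, adding
// Retry-After only when the quota is exhausted.
function rateLimitHeaders(limit, used, windowSec, nowSec) {
  const resetAt = (Math.floor(nowSec / windowSec) + 1) * windowSec; // next window start
  const remaining = Math.max(0, limit - used);
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(resetAt),
  };
  if (remaining === 0) headers["Retry-After"] = String(resetAt - nowSec);
  return headers;
}

// limit=100, 27 used, 60s window, 18s into the window:
console.log(rateLimitHeaders(100, 27, 60, 18));
// Remaining "73", Reset "60" (next window boundary), no Retry-After
console.log(rateLimitHeaders(100, 100, 60, 18)["Retry-After"]); // "42"
```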
Multi-Tier Rate Limiting
Production systems apply multiple limits simultaneously
// REQUEST FLOWS THROUGH ALL TIERS — first violation wins (429)
Tier 1 · CDN / Edge   · key: IP address      · DDoS protection: a single IP flooding the system          · 1,000 req/min per IP
Tier 2 · API Gateway  · key: User ID         · Abuse prevention: one user monopolizing shared resources  · 100 req/min per user
Tier 3 · API Service  · key: API key / tier  · Quota enforcement: Free vs Pro vs Enterprise daily limits · 1K–∞ req/day per key
Tier 4 · Endpoint     · key: user + endpoint · Resource protection: expensive ops individually limited   · 5/min (login), 10/hr (email)
Key design insight: Different tiers protect different concerns and are applied at different layers. IP limits live at the CDN (before hitting origin). User limits live at the gateway. Quota lives in the application. Each layer defends against a different threat model.
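The first-violation-wins flow can be sketched as an ordered chain of checks. The `applyTiers` function and tier names below are illustrative, and the per-tier counters are stubbed as precomputed counts rather than real limiters.

```javascript
// Walk the tiers in order; the first tier whose check fails produces the 429.
function applyTiers(request, tiers) {
  for (const tier of tiers) {
    if (!tier.check(request)) {
      return { status: 429, violatedTier: tier.name };
    }
  }
  return { status: 200 };
}

// Limits mirror the tier table above; counts come from stubbed request state.
const tiers = [
  { name: "ip",       check: (r) => r.ipCount   <= 1000 }, // CDN / edge
  { name: "user",     check: (r) => r.userCount <= 100  }, // API gateway
  { name: "quota",    check: (r) => r.dayCount  <= 1000 }, // API service
  { name: "endpoint", check: (r) => r.epCount   <= 5    }, // per endpoint
];

console.log(applyTiers({ ipCount: 5, userCount: 101, dayCount: 9, epCount: 1 }, tiers));
// { status: 429, violatedTier: 'user' }: the gateway tier rejects first
```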
01
Boundary Burst Attack & Fix
~1 hr
  1. Draw a timeline showing the boundary burst attack: limit=100 req/min, attacker sends 100 at 12:00:58 and 100 at 12:01:01. Show the counter values at each second.
  2. Implement sliding window counter in JavaScript (no Redis — just the math logic with mocked prev/curr counts)
  3. Apply the same attack to the sliding window counter: prev=100, curr=95, elapsed=0.017 (1 second in). Does it correctly block?
  4. What is the worst-case over-limit that sliding window counter can allow? (hint: think about what happens at the start of a new window)
02
Token Bucket Implementation
~1.5 hrs
  1. Implement TokenBucket(capacity, refillRatePerSec) in Java with boolean tryConsume()
  2. Make it thread-safe (hint: synchronized block or AtomicLong + last refill timestamp)
  3. Test: capacity=10, rate=5/sec. Assert: 10 instant calls succeed, 11th fails, wait 0.2s, 12th succeeds
  4. How do you serialize this bucket to Redis so it survives server restarts? Write the HMSET schema and the Lua reconstruction logic.
  5. If the server was down for 1 hour and bucket was at 0 — what should happen on the next request? (cap at capacity, not at hours × rate)
03
Distributed Rate Limiter
~1.5 hrs
  1. 20 app servers, limit=100 req/min per user. Naive: each server enforces 100/20=5 req/min locally. What goes wrong? Give a specific failure scenario.
  2. Draw the sequence diagram for centralized Redis: Server A → Redis check → allow/reject → response
  3. Demonstrate the race condition: Servers A and B both see count=99, both increment. Show why Lua prevents this.
  4. Redis becomes unavailable for 60 seconds. Argue for both fail-open and fail-closed. Which do you choose and why?
  5. Design a circuit breaker for the Redis rate limiter: when Redis is down, switch to local approximate counter automatically
04
Add Rate Limiting to URL Shortener (B5 Integration)
~2 hrs

Revisit your URL shortener from Module B5. Add production-grade rate limiting:

  • POST /shorten: 100 req/hour per IP (abuse prevention)
  • GET /{code}: 1,000 req/min per IP (DDoS protection)
  • POST /shorten: 1,000 req/day per user account (quota)
  1. Design the Redis key schema for all three limits
  2. Which algorithm for each? (justify: fixed window vs sliding window counter vs token bucket)
  3. Where in the architecture is each limit applied? (CDN vs API gateway vs application code)
  4. Write the complete HTTP 429 response for a failed POST /shorten with all required headers
  5. A premium user should get 10,000 req/day instead of 1,000. How does the application look up the user's tier before applying the limit?
MODULE B9 · RATE LIMITER
Fixed Window: simple INCR — but boundary burst vulnerability
Sliding Window Log: exact, ZADD/ZREMRANGEBYSCORE — O(N) memory
Sliding Window Counter formula: prev×(1−elapsed) + curr
Sliding window counter: O(1) memory, 0.003% error, production default
Token Bucket: capacity (burst) + refill_rate (sustained) — HMSET in Redis
Leaky Bucket: constant output rate, queue-based, for downstream rate control
Algorithm comparison table — when to use each
Race condition: GET + SET is broken — need Lua for atomicity
Lua scripts are atomic in Redis — no interleaving possible
Distributed: centralized Redis (consistent hashing per user)
Fail-open vs fail-closed: know the trade-offs and when to choose each
HTTP 429 headers: X-RateLimit-Limit, Remaining, Reset, Retry-After
Multi-tier: IP (CDN) → user (gateway) → API key (app) → endpoint (app)
API gateway approach: NGINX / Kong / AWS API GW + Redis backend
✏️ Task 1: boundary burst attack + sliding window fix
✏️ Task 2: token bucket implementation (Java, thread-safe, Redis-serialized)
✏️ Task 3: distributed rate limiter + Redis failure handling
✏️ Task 4 (capstone): rate limiting added to URL shortener
// NEXT MODULE
B10 — Consistent Hashing & Service Discovery
Virtual nodes · Minimal key remapping on node add/remove
Load distribution · Hot spot prevention · Rendezvous hashing
Service registry (Consul/ZooKeeper) · Health checks · DNS-based discovery