SYSTEM DESIGN MASTERY · TRACK C · MODULE C5 · WEEK 29 OAUTH2 · JWT · MTLS · ZERO-TRUST · SECRETS · OWASP · DDOS

// TRACK C · ADVANCED TOPICS · FINAL MODULE

Security Architecture

OAUTH2 / OIDC · JWT INTERNALS · MTLS · ZERO-TRUST
SECRETS MANAGEMENT · OWASP TOP 10 · DDOS MITIGATION

Authentication vs Authorization

Two separate checks — conflating them is the root cause of access control bugs

Authentication

WHO ARE YOU? PROVE YOUR IDENTITY.

Validates that the caller is who they claim to be. Happens once per session or token issuance.

→ Passwords + MFA
→ OAuth2 tokens
→ mTLS certificates
→ "The user is logged in"

Authorization

WHAT CAN YOU DO? CHECK PERMISSIONS.

Validates that the authenticated caller has permission for this specific action on this specific resource. Happens on every request.

→ RBAC / ABAC / ACLs
→ OAuth2 scopes
→ "User can delete order #123"
→ Must check on EVERY endpoint

Common mistake: checking authentication but not authorization. If the server only verifies the login, any authenticated user can call any endpoint, including ones that expose other users' data. Always check: (1) is this request authenticated? (2) does this principal have permission for THIS specific resource? Skipping step 2 produces IDOR (Insecure Direct Object Reference), an instance of Broken Access Control, which is #1 in the OWASP Top 10 (2021).
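The two checks can be sketched as a single handler. This is a minimal illustration, not a framework recipe: the ORDERS store, the Principal type, and the get_order function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory store: order id -> record with its owning user.
ORDERS = {
    123: {"owner": "alice", "total": 40},
    456: {"owner": "bob", "total": 15},
}


class Unauthenticated(Exception):
    """No verified identity was presented (maps to HTTP 401)."""


class Forbidden(Exception):
    """Identity is valid but lacks permission for this resource (HTTP 403)."""


@dataclass
class Principal:
    user_id: str
    roles: tuple = ()


def get_order(principal: Optional[Principal], order_id: int) -> dict:
    # Step 1: authentication. Is there a verified identity at all?
    if principal is None:
        raise Unauthenticated("login required")

    order = ORDERS.get(order_id)

    # Step 2: authorization. Does THIS principal own THIS order?
    # Unknown ids raise the same error so that ids cannot be enumerated.
    is_owner = order is not None and order["owner"] == principal.user_id
    is_admin = order is not None and "admin" in principal.roles
    if not (is_owner or is_admin):
        raise Forbidden(f"no access to order {order_id}")
    return order
```

Calling get_order(Principal("alice"), 456) raises Forbidden even though alice is logged in, which is exactly the check an IDOR-vulnerable endpoint skips.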

OAuth2 & OpenID Connect

OAuth2 delegates authorization — OIDC adds identity on top

// AUTHORIZATION CODE + PKCE FLOW — the correct flow for web & mobile

1. User clicks "Login with Google." App generates code_verifier (random 43–128 chars) and code_challenge = BASE64URL(SHA256(code_verifier)).

2. App redirects browser to Google: /authorize?response_type=code&client_id=...&redirect_uri=...&scope=openid+email&state=xyz&nonce=...&code_challenge=...&code_challenge_method=S256

3. User authenticates at Google (password + MFA). User consents to requested scopes. Google redirects back to app's redirect_uri with a short-lived authorization_code and the original state.

4. App verifies state matches (CSRF protection). App POSTs to /token: { code, code_verifier, client_id, redirect_uri, grant_type=authorization_code }

5. Google verifies that BASE64URL(SHA256(code_verifier)) equals the code_challenge sent in step 2. Returns: access_token (e.g. 15 min), refresh_token (e.g. 14 days), id_token (JWT with user claims). Lifetimes are chosen by the authorization server; the values here are illustrative.

6. App validates id_token signature + claims (iss, aud, exp, nonce). Stores refresh_token in an httpOnly cookie, set by the app's backend because JavaScript cannot write httpOnly cookies. Keeps access_token in memory only (not localStorage).

7. App calls APIs with Authorization: Bearer <access_token>. If the access token is a JWT, the API verifies its signature using the issuer's public JWKS keys, so no round-trip to the issuer is needed. (Opaque access tokens require a token introspection call instead.)

8. Access token expires. App uses refresh_token to silently get a new access token via POST /token with grant_type=refresh_token. User never sees this.

Why PKCE? On mobile, any app can register the same custom URL scheme as the legitimate app, so a malicious app on the device can receive the redirect and steal the authorization code. Without PKCE, it could then exchange that code for tokens. PKCE binds the code to a verifier that only the legitimate app knows, so a stolen code is useless without code_verifier.
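The verifier and challenge from the flow above can be generated with the Python standard library. This is a sketch under stated assumptions: the function names and the authorize endpoint passed to build_authorize_url are illustrative, not part of any provider SDK.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier() -> str:
    # 32 random bytes encode to 43 base64url characters (allowed range: 43-128).
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(code_verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(code_verifier)), padding stripped.
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, code_challenge: str, state: str) -> str:
    # The endpoint is a caller-supplied assumption; real providers publish
    # theirs in their OIDC discovery document.
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email",
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    verifier = make_code_verifier()          # stays on the device
    state = secrets.token_urlsafe(16)        # CSRF token, compared on return
    print(build_authorize_url("https://idp.example/authorize", "my-client",
                              "https://app.example/callback",
                              make_code_challenge(verifier), state))
```

Only the challenge travels through the browser redirect; the verifier is sent later, directly to the token endpoint, which is why an intercepted code cannot be redeemed.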

JWT Deep Dive

Base64URL encoded, NOT encrypted — never put secrets in the payload

Header

ALGORITHM + KEY ID

{
"alg": "RS256",
"typ": "JWT",
"kid": "key-id-1"
}

kid → selects which public key
to use from JWKS endpoint

Payload

CLAIMS — VERIFY ALL

{
"sub": "user_123",
"iss": "auth.co.com",
"aud": "api.co.com",
"exp": 1700000000,
"jti": "uuid-abc",
"roles": ["admin"]
}

Signature

RSA-SHA256 SIGNED

RSA_SHA256(
base64url(header)
+ "."
+ base64url(payload),
private_key
)

Verify with public key.
Only auth server can mint.
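To see why the payload must never carry secrets, the sketch below reads the claims of a token without any key. The helper names and the sample token are made up for illustration; the signature segment is a dummy because it is never consulted.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def peek_claims(token: str) -> dict:
    """Read the payload WITHOUT verifying anything. Inspection only."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Build a sample token shaped like header.payload.signature.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user_123", "roles": ["admin"]}).encode())
sample_token = f"{header}.{payload}.dummy-signature"

if __name__ == "__main__":
    print(peek_claims(sample_token))
```

Anyone holding the token can do the same, so the signature protects integrity only, not confidentiality.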

alg:none Attack

Attacker changes header to "alg":"none", strips the signature. Server accepts if it doesn't enforce algorithm whitelist.

Fix: whitelist allowed algorithms.
Reject any token with alg=none.
Never trust the header's alg claim blindly.

HS256 Key Confusion

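If the server accepts both RS256 and HS256, an attacker signs a token using HS256 with the server's RSA public key (which anyone can download) as the HMAC secret. A naive server passes that same public key to its verify call, the library treats it as an HMAC secret because the header says HS256, and the forged signature validates.

Fix: validate algorithm strictly.
Never allow algorithm negotiation.
In distributed systems, accept RS256 only.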

Missing exp Validation

If expiry is not checked, tokens work forever. A token issued 2 years ago for a deleted account still grants access.

Fix: always check exp > now().
Allow 5-min clock skew tolerance.
Keep access tokens short (15 min).

Sensitive Data in Payload

JWT payload is Base64URL encoded — it is NOT encrypted. Anyone who intercepts the token can read the payload. Passwords, SSNs, PII stored there are exposed.

Fix: put only non-sensitive claims.
Use JWE (encrypted JWT) if confidentiality needed.
Never put passwords, PII, or secrets.

RS256 vs HS256: why RS256 wins in distributed systems · ALGORITHM CHOICE

// Node.js sketch using the jsonwebtoken package; run inside an async function.
const crypto = require("crypto");
const jwt = require("jsonwebtoken");

// HS256 (HMAC-SHA256): SYMMETRIC
// Same key used to sign AND verify.
// Problem: every service that verifies must have the secret key.
// If any service is compromised → attacker can forge any token.
const hsToken = jwt.sign(payload, "shared-secret-key", { algorithm: "HS256" });
jwt.verify(hsToken, "shared-secret-key", { algorithms: ["HS256"] });  // every service needs this key

// RS256 (RSA-SHA256): ASYMMETRIC, PREFERRED
// Auth server signs with PRIVATE key (kept secret, only auth server has it).
// All services verify with PUBLIC key (published at JWKS endpoint).
// Compromise of any service → cannot forge tokens (no private key).
const rsToken = jwt.sign(payload, privateKey, { algorithm: "RS256", keyid: "key-1" });

// The JWKS endpoint returns a SET of keys; pick the one whose kid matches the token header.
const jwks = await (await fetch("https://auth.co.com/.well-known/jwks.json")).json();
const { kid } = jwt.decode(rsToken, { complete: true }).header;
const publicKey = crypto.createPublicKey({ key: jwks.keys.find((k) => k.kid === kid), format: "jwk" });
jwt.verify(rsToken, publicKey, {
  algorithms: ["RS256"],    // whitelist ONLY RS256
  audience: "api.co.com",   // verify aud claim
  issuer: "auth.co.com",    // verify iss claim
});

mTLS — Mutual TLS

Service-to-service authentication — both sides prove identity via certificates

mTLS handshake: what happens under the hood · TLS HANDSHAKE

// Regular TLS: only CLIENT verifies server certificate.
// mTLS: BOTH sides verify each other. Used for service-to-service auth.

Client (order-service)            Server (payment-service)
    |                                   |
    |--- ClientHello ------------------>|
    |<-- ServerHello + ServerCert ------|  ← server sends its cert
    |    (CN=payment-service,           |
    |     issued by internal-CA)        |
    |<-- CertificateRequest ------------|  ← server asks for a client cert (the "mutual" part)
    |                                   |
    |--- ClientCert + CertificateVerify>|  ← client sends its cert and proves it holds the private key
    |    (CN=order-service,             |
    |     issued by internal-CA)        |
    |                                   |
    // Both sides verify:
    // 1. Certificate signed by trusted internal CA?
    // 2. Certificate not expired?
    // 3. Certificate not revoked (CRL / OCSP)?
    // 4. Subject (CN or SAN) matches expected service name?
    |                                   |
    |=== Encrypted channel established =|
    |--- GET /charge (HTTP/1.1) ------->|  ← caller is now authenticated; authorization policy still applies

// Service mesh (Istio) automates all of this:
// Envoy sidecar handles mTLS transparently.
// Policy: "order-service → payment-service: ALLOW"
//         "frontend → payment-service: DENY"
// Your application code calls http://payment-service/charge
// Envoy sidecar upgrades to mTLS automatically.

// Cert rotation (SPIRE):
// SPIRE issues short-lived SVIDs (TTL is configurable; 24–72 hrs here) to every workload.
// Automatic rotation before expiry — zero manual cert management.

Why mTLS over API keys for service-to-service? API keys are static strings: once compromised, they stay valid until manually rotated. mTLS certificates are short-lived (24–72 hours in this setup), automatically rotated, cryptographically bound to a specific workload, and revocable via CRL. A stolen certificate and key stop working once the short TTL runs out, whereas a compromised API key may go undetected for months.
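For services that terminate TLS themselves rather than relying on a sidecar, the "both sides verify" rule maps onto a few settings in Python's ssl module. This is a minimal sketch: the function names are illustrative, and the certificate paths are supplied by the caller (for example, files written by a SPIRE agent or a mesh).

```python
import ssl


def require_client_certs(ctx: ssl.SSLContext) -> ssl.SSLContext:
    # CERT_REQUIRED on a server context aborts the handshake when the
    # client presents no certificate or one that fails verification.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def make_server_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # our identity
    ctx.load_verify_locations(cafile=ca_file)                   # internal CA we trust
    return require_client_certs(ctx)


def make_client_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    # PROTOCOL_TLS_CLIENT already verifies the server cert and hostname.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # presented on request
    ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The handshake only proves who the peer is. Deciding whether order-service may call /charge is still a separate policy check on the verified identity.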

Zero-Trust Architecture

Never trust, always verify — treat every request as if the attacker is already inside

🪪

Identity

MFA for users. mTLS + SPIFFE for services. Every principal verified.

💻

Device

Posture checks: patched, encrypted, MDM-enrolled before access.

🌐

Network

Microsegmentation. Explicit allow rules. No flat network trust.

⚙️

Application

RBAC at app layer. Fine-grained AuthZ. Scoped tokens per action.

🔐

Data

Encrypted at rest + in transit. Classification. Per-user KMS keys.

Microsegmentation: explicit allow vs flat network · K8S NETWORK POLICY

# Flat network (DEFAULT, DANGEROUS):
# Any pod can call any pod. Compromised frontend → calls payment DB directly.

# Microsegmented (ZERO-TRUST):
# All traffic denied by default. Explicit allows only.
# Pods selected by a NetworkPolicy reject any traffic the policy does not list.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-policy
spec:
  podSelector:
    matchLabels: { app: payment-service }
  policyTypes: [Ingress, Egress]
  ingress:
  - from:
    - podSelector:
        matchLabels: { app: order-service }  # ONLY order-service may call payment
    ports: [{ port: 8080 }]
  egress:
  - to:
    - podSelector:
        matchLabels: { app: payment-db }     # ONLY payment-db may be called
  # Note: this egress rule also blocks DNS lookups unless the cluster DNS is allowed explicitly.

# Blast radius comparison:
# Without Zero-Trust: compromised transcoding pod → access to payment DB, user DB, secrets
# With Zero-Trust: compromised transcoding pod → can only reach video-storage S3 bucket
#                  (its only explicit allow). Attacker is contained.
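The blast radius comparison can be made concrete with a toy model of allow rules. The service names and the rule table below are hypothetical and only mirror the examples in this section; a real cluster derives reachability from its NetworkPolicy objects.

```python
from typing import Dict, Optional, Set

SERVICES: Set[str] = {
    "frontend", "order-service", "payment-service",
    "payment-db", "user-db", "transcoder", "video-storage",
}

# Explicit allow rules: caller -> destinations it may open connections to.
ALLOW_RULES: Dict[str, Set[str]] = {
    "frontend": {"order-service"},
    "order-service": {"payment-service"},
    "payment-service": {"payment-db"},
    "transcoder": {"video-storage"},
}


def reachable(source: str, allow_rules: Optional[Dict[str, Set[str]]] = None) -> Set[str]:
    """Destinations a (possibly compromised) workload can connect to directly.

    allow_rules=None models a flat network where everything is reachable.
    Otherwise anything not explicitly allowed is denied.
    """
    if allow_rules is None:
        return SERVICES - {source}
    return set(allow_rules.get(source, set()))


if __name__ == "__main__":
    print("flat network:  ", sorted(reachable("transcoder")))
    print("microsegmented:", sorted(reachable("transcoder", ALLOW_RULES)))
```

With the rule table in place, a compromised transcoder sees one destination instead of six, which is the containment the policy is meant to buy.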

Secrets Management

Static long-lived secrets are a liability — dynamic short-lived credentials are the goal

✗

Hardcoded in source code — git history is permanent. Even if you delete the file, the secret is in every clone, fork, and CI log. Secret scanners (GitGuardian, truffleHog) find these instantly.

✗

In environment variables in Dockerfile / k8s YAML — anyone who can read the pod spec can read the secret. These files get committed, shared, and logged.

✗

Never rotated — "it's been working for 3 years." Long-lived static secrets accumulate risk. Rotate all secrets at fixed intervals and immediately on any suspected breach.

✗

Same secret across environments — a dev breach should never compromise production. Separate secrets per environment, separate KMS keys, separate rotation schedules.

HashiCorp Vault dynamic secrets: no long-lived credentials · VAULT

# STATIC secret (dangerous): long-lived DB password stored as a secret.
# If the secret leaks, it stays valid until someone rotates it manually.

# DYNAMIC secret (Vault): Vault generates credentials per request, with a TTL.
# Python sketch using the hvac client. The Vault address is a placeholder, and
# authenticating to Vault (token, AppRole, Kubernetes) is omitted.
import hvac

vault = hvac.Client(url="https://vault.internal:8200")

# App startup: request DB credentials from Vault
response = vault.secrets.database.generate_credentials(name="my-role")
creds = response["data"]
# Vault returns e.g. { lease_id: <id>, lease_duration: 3600,
#                      data: { username: "v-app-20250307-abc", password: "xyz" } }
# Vault creates this user in the DB with the permissions defined on the role (read-only here)
# After 1 hour, unless the lease is renewed, Vault AUTOMATICALLY revokes the DB user

db.connect(                      # db = the application's database client
  host="db.internal",
  user=creds["username"],        # temporary, unique to this app instance
  password=creds["password"],    # expires with the lease
)

# App renews lease before expiry:
vault.sys.renew_lease(lease_id=response["lease_id"], increment=3600)  # extend by another hour

# Benefits:
# • No long-lived credentials → breach window is bounded by the lease TTL (1 hour here)
# • Full audit log: who requested what credential, when
# • Automatic cleanup: no orphaned credentials accumulate
# • Unique per instance: compromise of one pod ≠ compromise of all pods

OWASP Top 10 (2021)

Know these cold — they appear in every security-conscious system design interview

A01 · BROKEN ACCESS CONTROL

IDOR: user accesses another user's data by changing an ID in the URL. Vertical escalation: user calls admin endpoints. Most common vuln in modern apps.

Fix: check authorization on every request. Never trust client-supplied resource IDs without verifying ownership.

A02 · CRYPTOGRAPHIC FAILURES

Passwords stored in plaintext or with MD5/SHA1 (broken). HTTP for sensitive data. Weak random number generation for tokens.

Fix: bcrypt/Argon2 for passwords. AES-256 for data at rest. TLS 1.2+ everywhere. CSPRNG for tokens.
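A salted, slow password hash and a CSPRNG token can both be sketched with the standard library. hashlib.scrypt is used here as a stand-in for bcrypt/Argon2, which need third-party packages; the cost parameters and function names are assumptions for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_password(password: str) -> Tuple[bytes, bytes]:
    # A fresh random salt per password defeats precomputed (rainbow) tables.
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


def new_session_token() -> str:
    # secrets draws from the OS CSPRNG; the random module must not be used here.
    return secrets.token_urlsafe(32)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("password123", salt, stored))
```

Store the salt next to the digest; it is not secret, it only needs to be unique per password.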

A03 · INJECTION (SQL, NoSQL, OS Commands)

User input concatenated into queries. Input "1 OR 1=1 --" returns all users. Can lead to full DB dump, data deletion, or OS command execution.

Fix: parameterized queries / prepared statements. NEVER concatenate user input into SQL strings.
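The difference between concatenation and a parameterized query shows up directly with the "1 OR 1=1 --" input. The sketch uses an in-memory SQLite database; the table and function names are made up for the demonstration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_id: str) -> list:
    # VULNERABLE: user input becomes part of the SQL text itself.
    return conn.execute(f"SELECT name FROM users WHERE id = {user_id}").fetchall()


def find_user_safe(conn: sqlite3.Connection, user_id: str) -> list:
    # SAFE: the driver sends the value separately; it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "1 OR 1=1 --"
    print("unsafe:", find_user_unsafe(conn, payload))  # every row leaks
    print("safe:  ", find_user_safe(conn, payload))    # no row matches
```

In the safe version the whole payload is compared to the id column as a single value, so it matches nothing.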

A05 · SECURITY MISCONFIGURATION

Default credentials, public S3 buckets, verbose error messages with stack traces, debug endpoints in production, unnecessary ports open.

Fix: IaC security scanning (tfsec, checkov). Hardened default configs. Regular misconfiguration audits.

A06 · VULNERABLE AND OUTDATED COMPONENTS

Log4Shell lived in log4j, which many applications pulled in only as a transitive dependency. Attackers scan for known CVEs in common libraries. Your app is only as secure as its weakest dependency.

Fix: Snyk, Dependabot, OWASP Dependency-Check in CI. Auto-PRs for security patches. SBOM (Software Bill of Materials).

A10 · SERVER-SIDE REQUEST FORGERY (SSRF)

App fetches a URL from user input. Attacker supplies http://169.254.169.254/latest/meta-data/ → AWS metadata endpoint → IAM credentials → full cloud account access.

Fix: whitelist allowed domains. Block internal IP ranges (169.254.0.0/16, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8). Use AWS IMDSv2 (requires a session token obtained via a PUT request, which blocks simple SSRF).
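A pre-flight check for user-supplied URLs might look like the sketch below. The allowed host set and function names are hypothetical. The check deliberately does no DNS resolution, so a production version would also need to resolve the hostname and re-check the resulting address to defend against DNS tricks.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts this service is permitted to fetch from.
ALLOWED_HOSTS = {"images.example.com"}


def is_internal_address(host: str) -> bool:
    """True when host is an IP literal in a range that must never be fetched."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; the allowlist decides
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_unspecified)


def is_url_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False                      # blocks file://, gopher://, etc.
    host = parts.hostname
    if host is None or is_internal_address(host):
        return False                      # blocks 169.254.169.254, 10.x, ::1
    return host in allowed_hosts          # default deny for everything else


if __name__ == "__main__":
    for candidate in ("https://images.example.com/cat.png",
                      "http://169.254.169.254/latest/meta-data/",
                      "http://10.0.0.5/admin"):
        print(candidate, "->", is_url_allowed(candidate))
```

The allowlist alone already rejects IP literals; the explicit range check is kept so the reason for a rejection stays visible if the allowlist is later widened.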

Rate Limiting for Security & DDoS Mitigation

Rate limiting prevents abuse — DDoS mitigation absorbs volumetric attacks

Security-focused rate limiting: brute force and credential stuffing prevention · REDIS

# BRUTE FORCE on login endpoint:
# Without protection, an attacker can guess passwords as fast as the endpoint responds.
import redis

r = redis.Redis()

class TooManyRequests(Exception):
    """Maps to HTTP 429."""

def hit(key, window_seconds):
    # INCR and EXPIRE run in one transaction, so a crash cannot leave a counter without a TTL.
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds, nx=True)  # NX (Redis 7.0+): set the TTL only if none exists
    count, _ = pipe.execute()
    return count

def check_login_rate_limit(ip, user_id):
    # Per-IP: 5 attempts per 15 minutes (stops one source hammering many accounts)
    if hit(f"login:ip:{ip}", 900) > 5:
        raise TooManyRequests

    # Per-user: 10 attempts per hour (stops a distributed multi-IP attack on one account)
    if hit(f"login:user:{user_id}", 3600) > 10:
        send_suspicious_activity_alert(user_id)  # alerting hook, defined elsewhere
        raise TooManyRequests

# CREDENTIAL STUFFING (breached credentials list from other sites):
# Attacker uses many different IPs → per-IP limits ineffective.
# Fix: rate limit per user_id (not just IP). CAPTCHA after 3 failures.
# HIBP (Have I Been Pwned) check: reject passwords in known breach datasets.
# Device fingerprinting: flag new device + failed login → MFA challenge.

# DDoS mitigation layers:
# L3/4 volumetric (millions of packets): ISP BGP blackholing, AWS Shield Standard
# L7 application (HTTP flood): Cloudflare WAF, AWS WAF, rate limiting, CAPTCHA
# Anycast: Cloudflare has 200+ PoPs, so the attack is absorbed at the edge and never reaches origin

L3/L4 VOLUMETRIC

Millions of packets/sec. Saturates bandwidth. BGP blackholing at ISP level. AWS Shield Standard (free) handles basic L3/4.

Tools: AWS Shield,
Cloudflare Magic Transit,
Arbor/NETSCOUT

L7 APPLICATION

HTTP flood, slowloris, GET flood. WAF blocks known attack patterns. Rate limiting + CAPTCHA + bot detection.

Tools: Cloudflare WAF,
AWS WAF, Akamai,
rate limiting

ANYCAST ABSORPTION

Cloudflare's 200+ PoPs share the same IP via anycast. Attack traffic is routed to nearest PoP and absorbed — never reaching your origin server.

Tools: Cloudflare,
Akamai, Fastly,
AWS CloudFront

OAuth2 Flow Design — Login with Google

~1.5 hrs

Draw the full Authorization Code + PKCE flow, step by step (8 steps). Label every HTTP request and response.
Where do you store the access token and refresh token in the browser? Why not localStorage? What attack does httpOnly cookie prevent?
The access token expires after 15 minutes. Walk through the silent refresh flow — what happens without the user noticing?
User clicks "Logout." What must you invalidate on the client side and on the server side? What happens if you only clear the cookie?
Your API server needs to call Google Drive on behalf of the user. How does the token flow differ from a user logging in? What scope do you request?

JWT Security Review — Find the Vulnerabilities

~1 hr

Review this code and find all security issues:

token = jwt.decode(request.headers["Authorization"],
                  algorithms=["HS256", "RS256", "none"])
if token["user_id"] == requested_user_id:
    return data

Identify every vulnerability (at least 4). Explain why each is dangerous.
Write the corrected implementation with all required validations.
A user's account is compromised at 2pm. Their JWT expires at 6pm. How do you invalidate it immediately? Give two approaches and their trade-offs.
Should the JWT payload contain each of these? Justify: user's email, user's role, user's SSN, user's account balance.

Zero-Trust for YouTube Microservices (B8)

~1.5 hrs

Map all service-to-service calls in the YouTube system. Which currently use shared long-lived credentials?
Design the mTLS policy matrix: which services are allowed to call which? Express as explicit allow rules.
The transcoding service needs read/write access to S3. Write the least-privilege IAM policy — which specific S3 actions on which specific bucket prefix?
A transcoding pod is compromised. With Zero-Trust microsegmentation in place: what can the attacker reach? Without it: what can they reach?
Design the secret rotation procedure for the S3 credentials: trigger, new version creation, zero-downtime migration, old version revocation.

★

Secure a Fintech Payments API (Stripe-like)

~3 hrs

Design complete security architecture for a payments API used by merchants to charge customers.

Authentication: how do merchant API keys work mechanically (creation, hashing, lookup)? How does OAuth2 work for user-authorized payments (like Stripe Connect)?
Authorization RBAC: a merchant can only access their own charges, customers, and refunds. Design the RBAC model — what roles, what resources, what permissions?
Storing API keys: should you hash them (like passwords) or encrypt them? What's the difference? What does Stripe actually do (sk_live_ keys)?
Top 3 OWASP risks for a payments API and their specific mitigations.
Rate limiting design: limits for unauthenticated endpoints (API key lookup), authenticated API calls (general), and the charge endpoint specifically. Include the Redis data structure.
Audit log schema: what events must be logged (at minimum)? What fields per event? What retention policy for PCI-DSS compliance?

MODULE C5 · SECURITY ARCHITECTURE

AuthN vs AuthZ — identity vs permissions, separate checks on every request

OAuth2 four grant types: auth code+PKCE, client credentials, implicit (deprecated), ROPC (deprecated)

OAuth2 four roles: resource owner, client, auth server, resource server

PKCE: code_verifier + code_challenge — prevents auth code interception

OIDC = OAuth2 + identity, ID token is a JWT with user claims

OIDC verification: signature, iss, aud, exp, iat, nonce — all six checks

JWT: header.payload.signature — Base64URL encoded, NOT encrypted

RS256 (asymmetric, preferred) vs HS256 (symmetric, avoid in distributed)

JWT vulnerabilities: alg:none, HS256 key confusion, missing exp, sensitive data

JWT revocation: short expiry + refresh token OR jti blocklist in Redis

mTLS: both sides verify certificates — used for service-to-service auth

SPIFFE/SPIRE: workload identity — short-lived SVIDs, auto-rotated

Zero-Trust: never trust, always verify — identity-based not network-based

Three Zero-Trust principles: verify explicitly, least privilege, assume breach

Microsegmentation: explicit allow rules — compromised pod is contained

Secrets anti-patterns: hardcoded, in Dockerfiles, never rotated, shared across envs

Vault dynamic secrets: TTL-scoped per-request credentials, audit log

OWASP #1 Broken Access Control: IDOR, vertical privilege escalation

OWASP #3 Injection: parameterized queries always — never concatenate user input

OWASP #10 SSRF: block internal IPs, whitelist allowed domains, IMDSv2

Rate limiting for security: per-IP + per-user for brute force prevention

Credential stuffing: rate limit by user_id + CAPTCHA + HIBP check

DDoS: L3/4 Shield, L7 WAF + rate limit, Anycast absorption at edge

✏️ Tasks 1–4 completed (OAuth2, JWT review, Zero-Trust, Stripe security)

🎓 COURSE COMPLETE

System Design Mastery — All Three Tracks

TRACK A · LOW-LEVEL DESIGN · 6 MODULES (A1–A6)
TRACK B · HIGH-LEVEL DESIGN · 12 MODULES (B1–B12)
TRACK C · ADVANCED TOPICS · 5 MODULES (C1–C5)

TOPICS MASTERED: OOP · SOLID · DESIGN PATTERNS · DATABASES · CACHING
MESSAGE QUEUES · URL SHORTENER · TWITTER · WHATSAPP · YOUTUBE
RATE LIMITER · CONSISTENT HASHING · ACID · SAGA · INTERVIEW FRAMEWORK
CONSENSUS / RAFT · GEO-DISTRIBUTION · CRDTS · ML SYSTEMS
OBSERVABILITY · SRE · OAUTH2 · JWT · ZERO-TRUST · SECURITY

RECOMMENDED NEXT STEPS:
MOCK INTERVIEWS · LEETCODE SYSTEM DESIGN · PRAMP · INTERVIEWING.IO
