Authentication vs Authorization
Two separate checks — conflating them is the root cause of access control bugs
Authentication
WHO ARE YOU? PROVE YOUR IDENTITY.
Validates that the caller is who they claim to be. Happens once per session or token issuance.
→ Passwords + MFA
→ OAuth2 tokens
→ mTLS certificates
→ "The user is logged in"
Authorization
WHAT CAN YOU DO? CHECK PERMISSIONS.
Validates that the authenticated caller has permission for this specific action on this specific resource. Happens on every request.
→ RBAC / ABAC / ACLs
→ OAuth2 scopes
→ "User can delete order #123"
→ Must check on EVERY endpoint
Common mistake: checking authentication but not authorization. A logged-in user can then call any endpoint with any resource ID, including resources that belong to other users. Always check: (1) is this request authenticated? (2) does this principal have permission for THIS specific resource? Missing step 2 causes IDOR (Insecure Direct Object Reference), a form of Broken Access Control, which is #1 in the OWASP Top 10 (2021).
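A minimal Python sketch of the two checks, assuming a hypothetical in-memory order store keyed by ID and an `admin` role that may act on any order. The class and exception names are illustrative, not from any framework.

```python
from dataclasses import dataclass


@dataclass
class Principal:
    # Hypothetical authenticated identity, as produced by token verification.
    user_id: str
    roles: frozenset = frozenset()


@dataclass
class Order:
    order_id: int
    owner_id: str


class Unauthenticated(Exception):
    """Step 1 failed: no verified identity (HTTP 401)."""


class Forbidden(Exception):
    """Step 2 failed: identity known, but no permission for this resource (HTTP 403)."""


def delete_order(principal, order_id, orders):
    # Step 1: authentication. Is there a verified caller at all?
    if principal is None:
        raise Unauthenticated()
    order = orders[order_id]  # KeyError if the order does not exist
    # Step 2: authorization. May THIS principal act on THIS order?
    # Skipping this check is exactly the IDOR bug.
    if order.owner_id != principal.user_id and "admin" not in principal.roles:
        raise Forbidden()
    del orders[order_id]
```

Some APIs return 404 instead of 403 for another user's resource so the response does not confirm that the ID exists.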
OAuth2 & OpenID Connect
OAuth2 delegates authorization — OIDC adds identity on top
// AUTHORIZATION CODE + PKCE FLOW: the recommended flow for web & mobile
1
User clicks "Login with Google." App generates a random code_verifier (43–128 chars), code_challenge = BASE64URL(SHA256(code_verifier)) without padding, and random state and nonce values.
2
App redirects browser to Google: /authorize?response_type=code&client_id=...&redirect_uri=...&scope=openid+email&state=xyz&nonce=...&code_challenge=...&code_challenge_method=S256
3
User authenticates at Google (password + MFA). User consents to requested scopes. Google redirects back to app's redirect_uri with a short-lived authorization_code and the original state.
4
App verifies state matches (CSRF protection). App POSTs to /token: { code, code_verifier, client_id, redirect_uri, grant_type=authorization_code }
5
Google checks that BASE64URL(SHA256(code_verifier)) equals the code_challenge it stored in step 2. Returns: access_token (short-lived, e.g. 15 min), refresh_token (long-lived, e.g. 14 days), id_token (JWT with user claims).
6
App validates id_token signature + claims (iss, aud, exp, nonce). Stores refresh_token in httpOnly cookie. Keeps access_token in memory only (not localStorage).
7
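App calls APIs with Authorization: Bearer <access_token>. When the access token is a JWT, the API verifies its signature locally with the issuer's public JWKS keys, so no round-trip to the issuer is needed. (Google's own access tokens are opaque and meant for Google APIs; your API should validate tokens issued by its own authorization server.)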
8
Access token expires. App uses refresh_token to silently get a new access token via POST /token with grant_type=refresh_token. User never sees this.
Why PKCE? On mobile, several apps can register the same custom URL scheme, so a malicious app on the device may receive the redirect. Without PKCE, it could steal the authorization code from the redirect and exchange it for tokens. PKCE binds the code to a verifier that only the legitimate app knows: a stolen code is useless without the code_verifier.
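A minimal Python sketch of the PKCE and state pieces from steps 1, 4 and 5, using only the standard library. The function names are hypothetical; a real client would normally rely on its OAuth library for this.

```python
import base64
import hashlib
import hmac
import secrets


def b64url(data: bytes) -> str:
    # Base64URL without '=' padding, as RFC 7636 requires for the S256 challenge.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_pkce_pair():
    # 32 random bytes encode to a 43-char verifier, the minimum length allowed.
    verifier = b64url(secrets.token_bytes(32))
    challenge = b64url(hashlib.sha256(verifier.encode("ascii")).digest())
    return verifier, challenge


def server_checks_verifier(stored_challenge: str, presented_verifier: str) -> bool:
    # Step 5, authorization-server side: recompute S256 and compare in constant time.
    recomputed = b64url(hashlib.sha256(presented_verifier.encode("ascii")).digest())
    return hmac.compare_digest(recomputed, stored_challenge)


def state_matches(sent_state: str, returned_state: str) -> bool:
    # Step 4, client side: the state echoed in the redirect must equal the one we sent.
    return hmac.compare_digest(sent_state, returned_state)
```

An attacker holding only the intercepted code cannot produce a verifier whose hash matches the stored challenge, so `server_checks_verifier` fails for them.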
JWT Deep Dive
Base64URL encoded, NOT encrypted — never put secrets in the payload
Header
ALGORITHM + KEY ID
{
"alg": "RS256",
"typ": "JWT",
"kid": "key-id-1"
}
kid → selects which public key
to use from JWKS endpoint
Payload
CLAIMS — VERIFY ALL
{
"sub": "user_123",
"iss": "auth.co.com",
"aud": "api.co.com",
"exp": 1700000000,
"jti": "uuid-abc",
"roles": ["admin"]
}
Signature
RSA-SHA256 SIGNED
RSA_SHA256(
base64url(header)
+ "."
+ base64url(payload),
private_key
)
Verify with public key.
Only auth server can mint.
alg:none Attack
Attacker changes the header to "alg":"none" and strips the signature. The server accepts the token if it doesn't enforce an algorithm whitelist.
Fix: whitelist allowed algorithms.
Reject any token with alg=none.
Never trust the header's alg claim blindly.
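A minimal Python sketch of the whitelist as a pre-check on the decoded header, with the allowed set fixed in server configuration (`ALLOWED_ALGS` is a hypothetical name). It does not verify the signature; after it passes, the signature must still be checked with the algorithm the server chose (in PyJWT, via the `algorithms=` argument to `jwt.decode`).

```python
import base64
import json

ALLOWED_ALGS = {"RS256"}  # set by the server, never read from the token


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop '=' padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_header_alg(token: str) -> dict:
    """Reject tokens whose header alg is not whitelisted, before any signature work."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed JWT")
    header = json.loads(b64url_decode(parts[0]))
    alg = header.get("alg")
    # "none", "None", "HS256" and anything else off the list all fail here.
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"algorithm {alg!r} not allowed")
    if not parts[2]:
        raise ValueError("missing signature")
    return header
```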
HS256 Key Confusion
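If the server accepts both RS256 and HS256 and takes the algorithm from the token header, an attacker signs a forged token with HS256, using the server's RSA public key as the HMAC secret. The server then computes the HMAC with that same public key, and the forged token verifies.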
Fix: validate algorithm strictly.
Never allow algorithm negotiation.
In distributed systems, accept RS256 only.
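A minimal Python sketch of why key confusion works and how pinning the algorithm stops it. `PUBLIC_KEY_PEM` is placeholder bytes, not a real key, and `rsa_verify` is a hypothetical callable standing in for a real RSA signature check; only the HMAC path is exercised.

```python
import base64
import hashlib
import hmac
import json

# Placeholder for the server's published RSA public key (not a real key).
# Public keys are public, so the attacker has these exact bytes too.
PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----\n"


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def forge_hs256(payload: dict, secret: bytes) -> str:
    # Attacker: HMAC-sign a token using the PUBLIC key bytes as the "secret".
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    signing_input = header + "." + b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def naive_verify(token: str, public_key_pem: bytes, rsa_verify) -> bool:
    # VULNERABLE: the header picks the algorithm, and the same key is reused for it.
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    if json.loads(b64url_decode(header_b64))["alg"] == "HS256":
        mac = hmac.new(public_key_pem, signing_input, hashlib.sha256).digest()
        return hmac.compare_digest(mac, b64url_decode(sig_b64))
    return rsa_verify(signing_input, b64url_decode(sig_b64))


def pinned_verify(token: str, rsa_verify) -> bool:
    # SAFE: the server fixes RS256; a header claiming anything else is rejected.
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "RS256":
        return False
    return rsa_verify(f"{header_b64}.{payload_b64}".encode("ascii"), b64url_decode(sig_b64))
```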
Missing exp Validation
If expiry is not checked, tokens work forever. A token issued 2 years ago for a deleted account still grants access.
Fix: always check exp > now().
Allow 5-min clock skew tolerance.
Keep access tokens short (15 min).
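A minimal Python sketch of the expiry check with the 5-minute skew tolerance from the text, applied to claims that have already passed signature verification. The function name is hypothetical; PyJWT exposes the same idea through its `leeway` option.

```python
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # the 5-minute clock skew tolerance from the text


def check_time_claims(claims: dict, now: Optional[float] = None,
                      leeway: int = MAX_SKEW_SECONDS) -> None:
    """Raise ValueError unless exp (and nbf, if present) are valid at `now` within `leeway`."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    # A missing exp is a hard failure, never "valid forever".
    if not isinstance(exp, (int, float)):
        raise ValueError("missing or non-numeric exp claim")
    # RFC 7519: the token must not be accepted on or after exp.
    if now >= exp + leeway:
        raise ValueError("token expired")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now < nbf - leeway:
        raise ValueError("token not yet valid")
```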
Sensitive Data in Payload
JWT payload is Base64URL encoded — it is NOT encrypted. Anyone who intercepts the token can read the payload. Passwords, SSNs, PII stored there are exposed.
Fix: put only non-sensitive claims.
Use JWE (encrypted JWT) if confidentiality needed.
Never put passwords, PII, or secrets.
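A short Python sketch showing that anyone holding a token can read its claims with no key at all. The demo token and its SSN value are made up, and the signature segment is a dummy string because reading the payload never touches it.

```python
import base64
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_payload_without_key(token: str) -> dict:
    # No key and no signature check: Base64URL is an encoding, not encryption.
    payload_b64 = token.split(".")[1]
    return json.loads(base64.urlsafe_b64decode(payload_b64 + "=" * (-len(payload_b64) % 4)))


# Demo token carrying a made-up SSN claim.
header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "user_123", "ssn": "000-00-0000"}).encode())
token = f"{header}.{payload}.signature-not-needed"
print(read_payload_without_key(token))  # prints {'sub': 'user_123', 'ssn': '000-00-0000'}
```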
RS256 vs HS256: why RS256 wins in distributed systems · ALGORITHM CHOICE
// HS256 (HMAC-SHA256): SYMMETRIC
// Same key used to sign AND verify.
// Problem: every service that verifies must hold the secret key.
// If any service is compromised → attacker can forge any token.
token = jwt.sign(payload, "shared-secret-key", { algorithm: "HS256" })
jwt.verify(token, "shared-secret-key", { algorithms: ["HS256"] })  // every service needs this key

// RS256 (RSA-SHA256): ASYMMETRIC, PREFERRED
// Auth server signs with the PRIVATE key (kept secret, only the auth server has it).
// All services verify with the PUBLIC key (published at the JWKS endpoint).
// Compromise of any verifying service → cannot forge tokens (no private key).
token = jwt.sign(payload, private_key, { algorithm: "RS256", keyid: "key-1" })

// The JWKS endpoint returns a SET of keys: select the one whose kid matches
// the token header and convert it to a public key (e.g. with a JWKS client library).
jwks = fetch("https://auth.co.com/.well-known/jwks.json")
public_key = key_for_kid(jwks, kid_from_token_header)  // hypothetical helper
jwt.verify(token, public_key, {
  algorithms: ["RS256"],   // whitelist ONLY RS256
  audience: "api.co.com",  // verify aud claim
  issuer: "auth.co.com"    // verify iss claim
})
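The verifier has to pick one key out of the JWKS by the token's `kid`. A minimal Python sketch of that selection step over an RFC 7517-style `{"keys": [...]}` document; `select_jwk` is a hypothetical name, and turning the selected `n`/`e` values into a usable RSA key is left to a library (PyJWT's `PyJWKClient`, for instance, handles selection and conversion).

```python
def select_jwk(jwks: dict, kid: str) -> dict:
    """Return the RS256 verification key named by kid from a JWKS document."""
    for jwk in jwks.get("keys", []):
        if jwk.get("kid") == kid:
            # Refuse a key whose type or declared alg doesn't match what we verify with,
            # so a header can't steer us toward an HMAC ("oct") key.
            if jwk.get("kty") != "RSA" or jwk.get("alg", "RS256") != "RS256":
                raise ValueError(f"key {kid!r} is not an RS256 key")
            return jwk
    # Unknown kid: keys may have rotated, so refetch the JWKS once, then fail.
    raise KeyError(kid)
```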
mTLS — Mutual TLS
Service-to-service authentication — both sides prove identity via certificates
mTLS handshake: what happens under the hood · TLS HANDSHAKE
// Regular TLS: only the CLIENT verifies the server certificate.
// mTLS: BOTH sides verify each other. Used for service-to-service auth.

Client (order-service)               Server (payment-service)
  |                                    |
  |--- ClientHello ------------------->|
  |<-- ServerHello + ServerCert -------|  ← server sends its cert
  |    (CN=payment-service,            |
  |     issued by internal-CA)         |
  |<-- CertificateRequest -------------|  ← server asks for a client cert
  |                                    |
  |--- ClientCert -------------------->|  ← client sends its cert
  |    (CN=order-service,              |
  |     issued by internal-CA)         |
  |--- CertificateVerify ------------->|  ← signature proving the client holds the private key
  |                                    |
// Both sides verify:
// 1. Certificate signed by trusted internal CA?
// 2. Certificate not expired?
// 3. Certificate not revoked (CRL / OCSP)?
// 4. Identity (CN, or a SAN such as a SPIFFE ID) matches the expected service?
  |                                    |
  |=== Encrypted channel established ==|
  |--- GET /charge (HTTP/1.1) -------->|  ← caller authenticated; policy decides if it is allowed

// Service mesh (Istio) automates all of this:
// Envoy sidecar handles mTLS transparently.
// Policy: "order-service → payment-service: ALLOW"
//         "frontend → payment-service: DENY"
// Your application code calls http://payment-service/charge
// Envoy sidecar upgrades to mTLS automatically.

// Cert rotation (SPIRE):
// SPIRE issues short-lived SVIDs (24–72 hrs) to every workload.
// Automatic rotation before expiry: zero manual cert management.
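A minimal Python sketch of the server side of mTLS with the standard `ssl` module, plus the identity check from step 4. It assumes SPIFFE-style identities carried in a URI SAN; the trust domain and path in `ALLOWED_CALLERS` are made up, and the function names are hypothetical.

```python
import ssl
from typing import Optional

# Hypothetical SPIFFE IDs allowed to call this service.
ALLOWED_CALLERS = {"spiffe://example.internal/ns/prod/sa/order-service"}


def make_mtls_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    # Trust ONLY the internal CA: passing cafile skips loading the system trust store.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # our own identity
    ctx.verify_mode = ssl.CERT_REQUIRED  # demand and verify a client cert: this makes it mutual
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def caller_identity(peer_cert: dict) -> Optional[str]:
    # peer_cert has the shape returned by SSLSocket.getpeercert() after a verified handshake.
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None


def is_allowed_caller(peer_cert: dict) -> bool:
    # The handshake proved WHO the caller is; this allow-list decides WHETHER it may call.
    return caller_identity(peer_cert) in ALLOWED_CALLERS
```

In use, the context would wrap a listening socket with `ctx.wrap_socket(sock, server_side=True)`, and after `accept()` the result of `conn.getpeercert()` is passed to `is_allowed_caller`.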
Why mTLS over API keys for service-to-service? API keys are static strings — once compromised they're valid until manually rotated. mTLS certificates are short-lived (24–72 hours), automatically rotated, cryptographically bound to a specific workload, and revocable via CRL. A compromised cert is useless after its short TTL. A compromised API key may go undetected for months.
Zero-Trust Architecture
Never trust, always verify — treat every request as if the attacker is already inside
Identity
MFA for users. mTLS + SPIFFE for services. Every principal verified.
Device
Posture checks: patched, encrypted, MDM-enrolled before access.
Network
Microsegmentation. Explicit allow rules. No flat network trust.
Application
RBAC at app layer. Fine-grained AuthZ. Scoped tokens per action.
Data
Encrypted at rest + in transit. Classification. Per-user KMS keys.
Microsegmentation: explicit allow vs flat network · K8S NETWORK POLICY
# Flat network (DEFAULT, DANGEROUS):
#   Any pod can call any pod. Compromised frontend → calls payment DB directly.
# Microsegmented (ZERO-TRUST):
#   Traffic not explicitly allowed is denied (a pod becomes default-deny once a policy selects it).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-policy
spec:
  podSelector:
    matchLabels: { app: payment-service }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: order-service }   # ONLY order-service may call payment
      ports: [{ port: 8080 }]
  egress:
    - to:
        - podSelector:
            matchLabels: { app: payment-db }      # ONLY payment-db may be called
    # With Egress listed, DNS lookups are blocked too unless you add an explicit
    # allow to the cluster DNS pods (typically port 53 UDP/TCP).

# Blast radius comparison:
#   Without Zero-Trust: compromised transcoding pod → access to payment DB, user DB, secrets
#   With Zero-Trust: compromised transcoding pod → can only reach the video-storage S3 bucket
#   (its only explicit allow, enforced by egress rules plus least-privilege IAM).
#   Attacker is contained.
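The same explicit-allow model is easy to check when reviewing a policy matrix. A minimal Python sketch, assuming a hypothetical set of (caller, callee) pairs that mirrors the policy above plus the transcoding example. `reachable_from` gives an upper bound on blast radius by assuming every reachable hop could also be compromised.

```python
# Explicit allow rules as (caller, callee) pairs; anything absent is denied.
# Service names are hypothetical, chosen to match the examples above.
ALLOW = {
    ("order-service", "payment-service"),
    ("payment-service", "payment-db"),
    ("transcoding-service", "video-storage"),
}


def is_allowed(src: str, dst: str) -> bool:
    # Default deny: no rule means no traffic.
    return (src, dst) in ALLOW


def reachable_from(start: str) -> set:
    # Upper bound on blast radius: follow allowed edges transitively.
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for src, dst in ALLOW:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
```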
Secrets Management
Static long-lived secrets are a liability — dynamic short-lived credentials are the goal
Hardcoded in source code — git history is permanent. Even if you delete the file, the secret is in every clone, fork, and CI log. Secret scanners (GitGuardian, truffleHog) find these instantly.
In environment variables in Dockerfile / k8s YAML — anyone who can read the pod spec can read the secret. These files get committed, shared, and logged.
Never rotated — "it's been working for 3 years." Long-lived static secrets accumulate risk. Rotate all secrets at fixed intervals and immediately on any suspected breach.
Same secret across environments — a dev breach should never compromise production. Separate secrets per environment, separate KMS keys, separate rotation schedules.
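Secret scanners work largely by pattern matching, backed by entropy heuristics. A minimal Python sketch with two well-known patterns, AWS access key IDs (`AKIA`/`ASIA` plus 16 characters) and PEM private-key headers; real tools ship far larger rule sets. Because deleted secrets survive in old commits, run it over the full history (for example the output of `git log -p`), not just the current tree.

```python
import re

# Two illustrative rules; real scanners use hundreds plus entropy checks.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan(text: str):
    """Yield (line_number, rule_name) for every line that looks like it holds a secret."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield lineno, name
```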
HashiCorp Vault dynamic secrets: no long-lived credentials · VAULT
# STATIC secret (dangerous): long-lived DB password stored as a secret.
#   If it leaks: valid forever until manually rotated.
# DYNAMIC secret (Vault): Vault generates credentials on request, each with a lease TTL.

# App startup: request DB credentials from Vault
response = vault.read("database/creds/my-role")
# Vault returns something like:
#   { lease_id: "database/creds/my-role/abc123", lease_duration: 3600,
#     data: { username: "v-app-20250307-abc", password: "xyz" } }
# Vault creates this user in the DB with the permissions defined by the role.
# When the lease expires (1 hour here), Vault AUTOMATICALLY revokes the DB user.

db.connect(
    host="db.internal",
    user=response.data.username,      # temporary, unique to this app instance
    password=response.data.password,  # expires with the lease
)

# App renews the lease before expiry (renewals are capped by the role's max_ttl):
vault.renew(response.lease_id)

# Benefits:
# • No long-lived credentials → a leaked credential works only until its lease expires or is revoked
# • Full audit log: who requested what credential, when
# • Automatic cleanup: no orphaned credentials accumulate
# • Unique per instance: compromise of one pod ≠ compromise of all pods
OWASP Top 10 (2021)
Know these cold — they appear in every security-conscious system design interview
1
BROKEN ACCESS CONTROL
IDOR: user accesses another user's data by changing an ID in the URL. Vertical escalation: user calls admin endpoints. Most common vuln in modern apps.
Fix: check authorization on every request. Never trust client-supplied resource IDs without verifying ownership.
2
CRYPTOGRAPHIC FAILURES
Passwords stored in plaintext or with fast hashes like MD5/SHA-1 (cheap to brute-force). HTTP for sensitive data. Weak, predictable random number generation for tokens.
Fix: bcrypt/Argon2 for passwords. AES-256 for data at rest. TLS 1.2+ everywhere. CSPRNG for tokens.
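Python's standard library has no bcrypt or Argon2, so this sketch swaps in `hashlib.scrypt`, another memory-hard password KDF, to show the shape: a random per-password salt, a slow hash, parameters stored alongside the hash, and a constant-time comparison. The `scrypt$N$r$p$salt$hash` storage format is an assumption, and the cost parameters are deliberately small so the sketch runs quickly.

```python
import hashlib
import hmac
import secrets

N, R, P = 2**14, 8, 1  # sketch-sized cost; raise N for production per current guidance


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # unique per password, from a CSPRNG
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return f"scrypt${N}${R}${P}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, n, r, p, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            n=int(n), r=int(r), p=int(p), dklen=32)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


def new_api_token() -> str:
    # CSPRNG-backed token; never use the random module for security tokens.
    return secrets.token_urlsafe(32)
```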
3
INJECTION (SQL, NoSQL, OS Commands)
User input concatenated into queries. With "SELECT * FROM users WHERE id = " + input, the input "1 OR 1=1 --" returns all users. Can lead to full DB dump, data deletion, or OS command execution.
Fix: parameterized queries / prepared statements. NEVER concatenate user input into SQL strings.
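A runnable Python sketch using `sqlite3` with an in-memory database and two made-up rows, showing the same input concatenated into the SQL text versus passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com")])

user_input = "1 OR 1=1 --"

# VULNERABLE: the input becomes SQL, so OR 1=1 matches every row.
unsafe_rows = conn.execute("SELECT email FROM users WHERE id = " + user_input).fetchall()

# SAFE: the ? placeholder binds the input as a value; it is never parsed as SQL.
safe_rows = conn.execute("SELECT email FROM users WHERE id = ?", (user_input,)).fetchall()
```

In the safe query the whole string is compared as a single value against `id`, so no row matches.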
5
SECURITY MISCONFIGURATION
Default credentials, public S3 buckets, verbose error messages with stack traces, debug endpoints in production, unnecessary ports open.
Fix: IaC security scanning (tfsec, checkov). Hardened default configs. Regular misconfiguration audits.
6
VULNERABLE COMPONENTS
Log4Shell (CVE-2021-44228) was in Log4j, often pulled in as a transitive dependency. Attackers scan for known CVEs in common libraries. Your app is only as secure as its weakest dependency.
Fix: Snyk, Dependabot, OWASP Dependency-Check in CI. Auto-PRs for security patches. SBOM (Software Bill of Materials).
10
SERVER-SIDE REQUEST FORGERY (SSRF)
App fetches a URL from user input. Attacker supplies http://169.254.169.254/latest/meta-data/ → AWS metadata endpoint → the instance role's IAM credentials → everything that role can access, potentially the whole cloud account.
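Fix: whitelist allowed domains. Block internal IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, and their IPv6 equivalents), checking the resolved IP rather than just the hostname. Use AWS IMDSv2 (it requires a session token obtained via PUT, which blocks simple GET-based SSRF).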
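A minimal Python sketch of the allowlist-plus-IP-check approach using only the standard library. `ALLOWED_HOSTS` and `check_outbound_url` are hypothetical names. The `ipaddress` properties cover the private, loopback and link-local ranges, including 169.254.169.254.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"images.example.com"}  # hypothetical allowlist


def is_blocked_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # 10/8, 172.16/12, 192.168/16, 127/8, 169.254/16 and IPv6 equivalents all land here.
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_multicast or addr.is_unspecified)


def check_outbound_url(url: str) -> list:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("scheme not allowed")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not on allowlist")
    # Resolve and check EVERY address the name maps to.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    ips = {info[4][0] for info in socket.getaddrinfo(parts.hostname, port)}
    if any(is_blocked_ip(ip) for ip in ips):
        raise ValueError("resolves to an internal address")
    return sorted(ips)
```

A check-then-fetch gap remains: DNS can answer differently when the HTTP client resolves the name again (DNS rebinding). Production code should connect to the IP it validated, or send fetches through an egress proxy that enforces the same rules.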
Rate Limiting for Security & DDoS Mitigation
Rate limiting prevents abuse — DDoS mitigation absorbs volumetric attacks
Security-focused rate limiting: brute force and credential stuffing prevention · REDIS
# BRUTE FORCE on login endpoint:
# Without protection: an attacker can try passwords as fast as the server answers.

def check_login_rate_limit(ip, user_id):
    # Per-IP: 5 attempts per 15 minutes (stops one IP hammering one or many accounts)
    ip_key = f"login:ip:{ip}"
    ip_count = redis.incr(ip_key)
    if ip_count == 1:
        redis.expire(ip_key, 900)     # 15 min window starts at the first attempt
    if ip_count > 5:
        raise TooManyRequests

    # Per-user: 10 attempts per hour (stops a distributed multi-IP attack on one account)
    user_key = f"login:user:{user_id}"
    user_count = redis.incr(user_key)
    if user_count == 1:
        redis.expire(user_key, 3600)  # 1 hour window
    if user_count > 10:
        send_suspicious_activity_alert(user_id)
        raise TooManyRequests
    # INCR then EXPIRE is two commands: if the process dies in between, the key never
    # expires. Make it atomic in production (a Lua script, or MULTI with EXPIRE ... NX on Redis 7+).

# CREDENTIAL STUFFING (breached credential lists from other sites):
#   Attacker uses many different IPs → per-IP limits ineffective.
#   Fix: rate limit per user_id (not just IP). CAPTCHA after 3 failures.
#   HIBP (Have I Been Pwned) check: reject passwords found in known breach datasets.
#   Device fingerprinting: new device + failed login → MFA challenge.

# DDoS mitigation layers:
#   L3/4 volumetric (millions of packets): ISP BGP blackholing, AWS Shield Standard
#   L7 application (HTTP flood): Cloudflare WAF, AWS WAF, rate limiting, CAPTCHA
#   Anycast: Cloudflare has 200+ PoPs; attack absorbed at the edge, never reaches origin
L3/L4 VOLUMETRIC
Millions of packets/sec. Saturates bandwidth. BGP blackholing at ISP level. AWS Shield Standard (free) handles basic L3/4.
Tools: AWS Shield, Cloudflare Magic Transit, Arbor/NETSCOUT
L7 APPLICATION
HTTP flood, slowloris, GET flood. WAF blocks known attack patterns. Rate limiting + CAPTCHA + bot detection.
Tools: Cloudflare WAF, AWS WAF, Akamai, rate limiting
ANYCAST ABSORPTION
Cloudflare's 200+ PoPs share the same IP via anycast. Attack traffic is routed to nearest PoP and absorbed — never reaching your origin server.
Tools: Cloudflare, Akamai, Fastly, AWS CloudFront
1
OAuth2 Flow Design — Login with Google
- Draw the full Authorization Code + PKCE flow, step by step (8 steps). Label every HTTP request and response.
- Where do you store the access token and refresh token in the browser? Why not localStorage? What attack does an httpOnly cookie prevent?
- The access token expires after 15 minutes. Walk through the silent refresh flow: what happens without the user noticing?
- User clicks "Logout." What must you invalidate on the client side and on the server side? What happens if you only clear the cookie?
- Your API server needs to call Google Drive on behalf of the user. How does the token flow differ from a user logging in? What scope do you request?
2
JWT Security Review — Find the Vulnerabilities
Review this code and find all security issues:
token = jwt.decode(request.headers["Authorization"],
algorithms=["HS256", "RS256", "none"])
if token["user_id"] == requested_user_id:
return data
- Identify every vulnerability (at least 4). Explain why each is dangerous.
- Write the corrected implementation with all required validations.
- A user's account is compromised at 2pm. Their JWT expires at 6pm. How do you invalidate it immediately? Give two approaches and their trade-offs.
- Should the JWT payload contain each of these? Justify: user's email, user's role, user's SSN, user's account balance.
3
Zero-Trust for YouTube Microservices (B8)
›
- Map all service-to-service calls in the YouTube system. Which currently use shared long-lived credentials?
- Design the mTLS policy matrix: which services are allowed to call which? Express as explicit allow rules.
- The transcoding service needs read/write access to S3. Write the least-privilege IAM policy — which specific S3 actions on which specific bucket prefix?
- A transcoding pod is compromised. With Zero-Trust microsegmentation in place: what can the attacker reach? Without it: what can they reach?
- Design the secret rotation procedure for the S3 credentials: trigger, new version creation, zero-downtime migration, old version revocation.
★
Secure a Fintech Payments API (Stripe-like)
Design complete security architecture for a payments API used by merchants to charge customers.
- Authentication: how do merchant API keys work mechanically (creation, hashing, lookup)? How does OAuth2 work for user-authorized payments (like Stripe Connect)?
- Authorization RBAC: a merchant can only access their own charges, customers, and refunds. Design the RBAC model — what roles, what resources, what permissions?
- Storing API keys: should you hash them (like passwords) or encrypt them? What's the difference? What does Stripe actually do (sk_live_ keys)?
- Top 3 OWASP risks for a payments API and their specific mitigations.
- Rate limiting design: limits for unauthenticated endpoints (API key lookup), authenticated API calls (general), and the charge endpoint specifically. Include the Redis data structure.
- Audit log schema: what events must be logged (at minimum)? What fields per event? What retention policy for PCI-DSS compliance?
MODULE C5 · SECURITY ARCHITECTURE
AuthN vs AuthZ — identity vs permissions, separate checks on every request
OAuth2 four grant types: auth code+PKCE, client credentials, implicit (deprecated), ROPC (deprecated)
OAuth2 four roles: resource owner, client, auth server, resource server
PKCE: code_verifier + code_challenge — prevents auth code interception
OIDC = OAuth2 + identity, ID token is a JWT with user claims
OIDC verification: signature, iss, aud, exp, iat, nonce — all six checks
JWT: header.payload.signature — Base64URL encoded, NOT encrypted
RS256 (asymmetric, preferred) vs HS256 (symmetric, avoid in distributed)
JWT vulnerabilities: alg:none, HS256 key confusion, missing exp, sensitive data
JWT revocation: short expiry + refresh token OR jti blocklist in Redis
mTLS: both sides verify certificates — used for service-to-service auth
SPIFFE/SPIRE: workload identity — short-lived SVIDs, auto-rotated
Zero-Trust: never trust, always verify — identity-based not network-based
Three Zero-Trust principles: verify explicitly, least privilege, assume breach
Microsegmentation: explicit allow rules — compromised pod is contained
Secrets anti-patterns: hardcoded, in Dockerfiles, never rotated, shared across envs
Vault dynamic secrets: TTL-scoped per-request credentials, audit log
OWASP #1 Broken Access Control: IDOR, vertical privilege escalation
OWASP #3 Injection: parameterized queries always — never concatenate user input
OWASP #10 SSRF: block internal IPs, whitelist allowed domains, IMDSv2
Rate limiting for security: per-IP + per-user for brute force prevention
Credential stuffing: rate limit by user_id + CAPTCHA + HIBP check
DDoS: L3/4 Shield, L7 WAF + rate limit, Anycast absorption at edge
✏️ Tasks 1–4 completed (OAuth2, JWT review, Zero-Trust, Stripe security)