
Backend Engineering Roadmap

A structured, hands-on path from web fundamentals to production-grade distributed systems — with C/C++ examples, concept checklists, and interactive progress tracking.

8 Phases · 80 Concepts · 8 Code Examples

Overall Progress: 0 of 80 concepts checked (0%)
Ph0
How the Web Works
Prerequisite No Prereqs 📄 M01 Notes 📄 M02 Notes
  • DNS resolution: recursive vs iterative queries, TTL, caching chain (browser → OS → recursive resolver → root nameserver → TLD → authoritative)
  • TCP three-way handshake: SYN → SYN-ACK → ACK; connection teardown FIN/FIN-ACK/ACK; RST for abrupt close
  • TLS 1.3 handshake: ClientHello (supported ciphers + key_share), ServerHello + Certificate + CertificateVerify, Finished; ECDHE forward secrecy
  • HTTP/1.1: persistent connections (Keep-Alive), pipelining, head-of-line blocking at TCP layer
  • HTTP/2: binary framing, multiplexing (multiple streams over single TCP), HPACK header compression, server push; still has TCP HOL blocking
  • HTTP/3 + QUIC: runs over UDP, built-in TLS 1.3, independent streams (no HOL blocking), 0-RTT resumption
  • Web server accept loop: listen socket + SO_REUSEADDR/SO_REUSEPORT, accept() blocks until client connects; thread-per-request (Apache) vs event loop (Nginx/epoll)
  • Backend request lifecycle: accept → parse HTTP → route to handler → middleware chain → business logic → DB query → serialize response → send
TCP/IP TLS 1.3 HTTP/1.1 HTTP/2 HTTP/3 QUIC DNS Wireshark
/* Minimal TCP server skeleton — illustrates accept loop */
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);

    int opt = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);
    addr.sin_addr.s_addr = INADDR_ANY;

    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, SOMAXCONN);   /* SOMAXCONN = OS backlog limit */

    while (1) {
        int client = accept(srv, NULL, NULL);  /* blocks until client connects */
        /* hand off: thread-per-request -> pthread_create()  */
        /*           event loop          -> epoll_ctl(ADD)   */
        close(client);
    }
}
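The request lifecycle bullet starts with "parse HTTP" right after accept. A minimal sketch of parsing the request line (the `parse_request_line` name and its buffer-based signature are illustrative, not from any library; a real parser must also handle headers, chunked bodies, and malformed input far more defensively):

```c
/* Minimal HTTP request-line parser: "GET /users/42 HTTP/1.1" */
#include <string.h>

/* Splits the first line of a request into method, path, version.
   Returns 1 on success, 0 on malformed input. Buffers are
   caller-provided; sizes are hard limits to avoid overflow. */
int parse_request_line(const char *line,
                       char *method,  size_t mlen,
                       char *path,    size_t plen,
                       char *version, size_t vlen)
{
    const char *sp1 = strchr(line, ' ');
    if (!sp1) return 0;
    const char *sp2 = strchr(sp1 + 1, ' ');
    if (!sp2) return 0;

    size_t m = (size_t)(sp1 - line);
    size_t p = (size_t)(sp2 - sp1 - 1);
    size_t v = strcspn(sp2 + 1, "\r\n");   /* strip trailing CRLF */
    if (m >= mlen || p >= plen || v >= vlen) return 0;

    memcpy(method,  line,    m); method[m]  = '\0';
    memcpy(path,    sp1 + 1, p); path[p]    = '\0';
    memcpy(version, sp2 + 1, v); version[v] = '\0';
    return 1;
}
```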
# | Concept | Category
1 | DNS resolution and caching chain (resolver → root → TLD → authoritative) | Network
2 | TCP 3-way handshake and connection teardown (FIN/RST) | Network
3 | TLS 1.3 handshake: ECDHE key exchange and forward secrecy | Security
4 | HTTP/1.1: persistent connections, pipelining, HOL blocking | HTTP
5 | HTTP/2: multiplexing, binary framing, HPACK compression | HTTP
6 | HTTP/3 + QUIC: UDP-based, independent streams, 0-RTT | HTTP
7 | Web server accept loop: thread-per-request vs event loop | Server
8 | Backend request lifecycle: accept → route → middleware → handler → DB → respond | Server
Ph1
API Design & Contracts
Foundational Requires Ph0 📄 M03 Notes 📄 M04 Notes 📄 M05 Notes
  • REST principles: resources as nouns (not verbs), stateless client-server, uniform interface, cacheable responses, layered system, optional HATEOAS
  • URL & versioning: plural nouns (/users not /user), nested resources (/users/42/orders), versioning strategies — URI prefix (/v1/), Accept header (application/vnd.api+json;version=1), query param (?version=1)
  • HTTP method semantics: GET/HEAD (safe + idempotent), PUT/DELETE (idempotent, not safe), POST (neither), PATCH (partial update; not idempotent per spec, though handlers are often written to be)
  • Status codes: 200 OK, 201 Created, 204 No Content, 301/302/304 redirects, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests, 500/502/503/504
  • Request/response shaping: direct payload vs envelope ({data, meta, links}), consistent error objects, snake_case vs camelCase field naming
  • Pagination: offset+limit (simple, but skips/duplicates on concurrent writes), cursor-based (stable, no skips), keyset pagination (most scalable); include total_count, next_cursor in response
  • Error standard: RFC 7807 Problem Details — type (URI), title, status, detail, instance fields; consistent error envelope across all endpoints
  • OpenAPI/Swagger: spec-first design philosophy, YAML schema, $ref for reusable components, code generation for servers (stub) and clients (SDK)
  • gRPC: Protocol Buffers IDL (syntax = "proto3"), service + rpc definitions, unary vs client-streaming vs server-streaming vs bidirectional-streaming; when to prefer (internal services, streaming, strong typing)
  • GraphQL: schema-first (SDL), resolvers, queries (read) vs mutations (write) vs subscriptions (realtime), N+1 problem (DataLoader batching solution)
REST gRPC GraphQL OpenAPI Protobuf Swagger RFC 7807
/* user.proto — gRPC service definition */
syntax = "proto3";

service UserService {
  rpc GetUser (GetUserRequest)  returns (UserResponse);          // unary
  rpc WatchUser (GetUserRequest) returns (stream UserResponse);  // server-streaming
}

message GetUserRequest { string user_id = 1; }

message UserResponse {
  string user_id    = 1;
  string username   = 2;
  string email      = 3;
  int64  created_at = 4;   // Unix timestamp
}

/* Minimal HTTP/1.1 response builder in C */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static const char *reason_phrase(int status) {
    switch (status) {
    case 200: return "OK";
    case 201: return "Created";
    case 204: return "No Content";
    case 400: return "Bad Request";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    default:  return "";   /* reason-phrase may be empty */
    }
}

void send_json(int fd, int status, const char *body) {
    char   header[512];
    size_t body_len = strlen(body);
    int n = snprintf(header, sizeof(header),
        "HTTP/1.1 %d %s\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: %zu\r\n"
        "Connection: close\r\n"
        "\r\n",
        status, reason_phrase(status), body_len);
    write(fd, header, (size_t)n);
    write(fd, body,   body_len);
}
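The error-standard bullet calls for RFC 7807 Problem Details on every endpoint; a minimal body builder sketch (the `problem_json` helper is hypothetical, and it assumes the inputs are already JSON-safe — no quote escaping is done here):

```c
/* Build an RFC 7807 Problem Details JSON body */
#include <stdio.h>
#include <string.h>

/* Writes {"type":...,"title":...,"status":N,"detail":...} into buf.
   Returns bytes written, or -1 if buf is too small or snprintf fails.
   Inputs must already be JSON-safe (no escaping performed). */
int problem_json(char *buf, size_t buflen,
                 const char *type, const char *title,
                 int status, const char *detail)
{
    int n = snprintf(buf, buflen,
        "{\"type\":\"%s\",\"title\":\"%s\","
        "\"status\":%d,\"detail\":\"%s\"}",
        type, title, status, detail);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The result can then be sent with a helper like `send_json` with `Content-Type: application/problem+json`.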
# | Concept | Category
1 | REST constraints: statelessness, uniform interface, resource naming | REST
2 | URL design: plural nouns, nesting, versioning strategies (/v1/, header, query) | REST
3 | HTTP method idempotency: GET/PUT/DELETE vs POST/PATCH semantics | REST
4 | HTTP status code families and when to use each (2xx/3xx/4xx/5xx) | REST
5 | Pagination: offset vs cursor vs keyset — tradeoffs for each | REST
6 | RFC 7807 Problem Details: type, title, status, detail, instance | API Design
7 | OpenAPI spec-first design and code generation workflow | API Design
8 | gRPC: Protobuf IDL, service definition, 4 streaming modes | gRPC
9 | When to choose gRPC over REST (internal, streaming, strong typing) | gRPC
10 | GraphQL: schema, resolvers, N+1 problem and DataLoader solution | GraphQL
Ph2
Databases & Storage
Core Requires Ph1 📄 M06 Notes 📄 M07 Notes
  • Relational schema design: normalization (1NF removes repeating groups, 2NF removes partial dependencies, 3NF removes transitive dependencies), ERD, foreign key constraints, check constraints
  • Indexes: B-tree (default, ordered, range queries), hash (equality only), composite (left-prefix rule), covering (index-only scan), partial/filtered; index selectivity; write amplification tradeoff
  • Query plans: EXPLAIN ANALYZE (actual rows, actual time), sequential scan vs index scan vs index-only scan, join algorithms (nested loop, hash join, merge join), planner statistics
  • Transactions & ACID: Atomicity (all-or-nothing), Consistency (invariants preserved), Isolation (concurrent txns don't interfere), Durability (committed = persisted to WAL/disk)
  • Isolation levels: Read Uncommitted (dirty reads), Read Committed (default PostgreSQL), Repeatable Read (no phantom in MySQL InnoDB via MVCC), Serializable (SSI in PostgreSQL); phenomena: dirty read, non-repeatable read, phantom read
  • Deadlocks: detection (wait-for graph cycle), prevention (lock ordering — always acquire locks in same order), lock timeout (lock_timeout = '2s' in PostgreSQL), SKIP LOCKED for queue patterns
  • NoSQL taxonomy: document (MongoDB — flexible schema, nested objects), key-value (Redis — sub-ms latency), wide-column (Cassandra — write-optimized, partitioned by key), time-series (InfluxDB), graph (Neo4j); choose by access pattern
  • Redis data structures: string (counters, cache), hash (object fields), list (queues, stacks), set (unique members), sorted set (leaderboards, rate limiting), stream (event log); each with O() complexity
  • Caching patterns: cache-aside/lazy loading (app reads cache first, on miss reads DB and populates cache), read-through (cache fetches from DB), write-through (write to cache + DB sync), write-behind (async DB write)
  • Redis advanced: persistence modes (RDB — snapshot at intervals, AOF — append-only log, both for durability), pub/sub (fire-and-forget), Lua scripting (atomic multi-command), rate limiting with INCR+EXPIRE
  • Connection pooling: why (TCP + auth handshake cost per connection), Little's Law (avg connections = arrival_rate × avg_latency), pgBouncer modes (session/transaction/statement), pool exhaustion and backpressure
  • Database migrations: versioned sequential scripts (Flyway/Liquibase pattern), forward-only vs rollback scripts, zero-downtime techniques: expand-contract (add column nullable → backfill → add NOT NULL → drop old column)
PostgreSQL MySQL Redis MongoDB Cassandra pgBouncer hiredis libpq
/* PostgreSQL via libpq — parameterized query (prevents SQL injection) */
#include <libpq-fe.h>
#include <stdio.h>

void fetch_user(PGconn *conn, const char *user_id) {
    const char *params[1] = { user_id };
    PGresult *res = PQexecParams(conn,
        "SELECT id, name, email FROM users WHERE id = $1",
        1,    /* nParams */
        NULL, /* paramTypes  (let server infer) */
        params, NULL, NULL,
        0     /* result format: text */
    );
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0) {
        printf("id=%-6s  name=%-20s  email=%s\n",
            PQgetvalue(res, 0, 0),
            PQgetvalue(res, 0, 1),
            PQgetvalue(res, 0, 2));
    }
    PQclear(res);
}

/* Redis cache-aside via hiredis */
#include <hiredis/hiredis.h>
#include <string.h>

/* Returns cached JSON string or NULL (caller must free reply) */
redisReply *get_user_cached(redisContext *rc, PGconn *pg,
                             const char *user_id)
{
    char key[64];
    snprintf(key, sizeof(key), "user:%s", user_id);

    redisReply *r = redisCommand(rc, "GET %s", key);
    if (r && r->type == REDIS_REPLY_STRING)
        return r;   /* cache HIT */

    freeReplyObject(r);

    /* cache MISS — query DB, then SET with 5-min TTL */
    /* fetch_user(pg, user_id) -> serialize to JSON ->  */
    /* redisCommand(rc, "SET %s %s EX 300", key, json_val) */
    return NULL;
}
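The connection-pooling bullet applies Little's Law (avg connections = arrival_rate × avg_latency); a tiny helper making the arithmetic concrete (the `suggest_pool_size` name and the headroom factor are illustrative, not a pgBouncer API):

```c
/* Pool sizing via Little's Law:
   connections_in_use = arrival_rate * avg_latency            */

/* rps:           requests per second hitting the DB
   avg_latency_s: mean query latency in seconds
   headroom:      burst multiplier (e.g. 1.5); use 1.0 for none
   Returns a suggested pool size, rounded up, minimum 1.      */
int suggest_pool_size(double rps, double avg_latency_s, double headroom) {
    double in_use = rps * avg_latency_s * headroom;
    int size = (int)in_use;
    if ((double)size < in_use) size++;   /* round up without libm */
    return size < 1 ? 1 : size;
}
```

For example, 100 rps at 250 ms average latency keeps ~25 connections busy; sizing the pool well below that starves requests, well above it wastes backend memory.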
# | Concept | Category
1 | Relational schema normalization: 1NF, 2NF, 3NF and when to denormalize | SQL
2 | Index types: B-tree, hash, composite, covering, partial; left-prefix rule | SQL
3 | Query plans: EXPLAIN ANALYZE, sequential scan vs index scan | SQL
4 | ACID properties and what each guarantees | Transactions
5 | Isolation levels: RC, RR, Serializable; dirty/phantom/non-repeatable reads | Transactions
6 | Deadlocks: detection, lock ordering prevention, SKIP LOCKED | Transactions
7 | NoSQL taxonomy: document, key-value, wide-column, time-series, graph — when to use | NoSQL
8 | Redis data structures and time complexity of each | Redis
9 | Caching patterns: cache-aside, read-through, write-through, write-behind | Caching
10 | Redis persistence: RDB vs AOF; pub/sub; Lua atomicity | Redis
11 | Connection pooling: Little's Law, pgBouncer modes, pool exhaustion | Performance
12 | Zero-downtime migrations: expand-contract pattern | Migrations
Ph3
Authentication & Authorization
Core Requires Ph1 📄 M09 Notes
  • Session-based auth: server stores session state (in-memory or Redis), session ID in HttpOnly+Secure cookie, session fixation attack (regenerate session ID on login), CSRF protection (SameSite=Strict or CSRF token)
  • JWT structure: three base64url-encoded sections — header (alg, typ), payload (claims), signature; standard claims: iss (issuer), sub (subject), aud (audience), exp (expiry), nbf (not before), iat (issued at), jti (JWT ID for revocation)
  • JWT signing algorithms: HS256 (HMAC-SHA256, shared secret — symmetric, all services need secret), RS256 (RSA — private key signs, public key verifies — asymmetric, safe to distribute public key), ES256 (ECDSA, smaller keys than RSA)
  • Access + refresh token pattern: short-lived access token (15min–1hr, stateless validation), long-lived refresh token (7–30 days, stored in DB, one-time-use rotation, allows revocation)
  • OAuth2 flows: Authorization Code + PKCE (for SPAs and mobile — code verifier/challenge prevents interception), Client Credentials (machine-to-machine, no user), Device Code (CLI/TV apps — user visits URL on phone)
  • API Keys: generation (crypto/rand CSPRNG → hex or base62 encoding), never store plaintext (store SHA-256 hash + prefix for lookup), scoping to specific resources/operations, key rotation strategy
  • RBAC vs ABAC: Role-Based (user has role, role has permissions — simple, coarse-grained), Attribute-Based (policy: ALLOW if subject.dept == resource.dept AND action == "read" — flexible, complex); hybrid (RBAC for coarse, ABAC for fine-grained)
  • Password storage: why fast hashes are wrong (MD5/SHA256: billions/sec on GPU), bcrypt (configurable cost factor, ~100ms target), Argon2id (OWASP recommended — memory-hard, time-hard, side-channel resistant), always timing-safe comparison (constant-time memcmp)
JWT OAuth2 OpenSSL bcrypt Argon2 Redis (sessions) PKCE
/* JWT HMAC-SHA256 signature verification (OpenSSL) */
#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <string.h>
#include <stdio.h>

/* Compare two byte arrays in constant time to prevent timing attacks */
static int const_time_cmp(const unsigned char *a,
                           const unsigned char *b, size_t len) {
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}

/* Verify HS256: header_payload = "base64url(hdr).base64url(payload)" */
int jwt_verify_hs256(const char    *header_payload,
                     const unsigned char *expected_sig, size_t sig_len,
                     const unsigned char *secret,       size_t secret_len)
{
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int  digest_len = 0;

    HMAC(EVP_sha256(),
         secret,   (int)secret_len,
         (const unsigned char *)header_payload, strlen(header_payload),
         digest,  &digest_len);

    if (digest_len != sig_len) return 0;
    return const_time_cmp(digest, expected_sig, digest_len);
}
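`jwt_verify_hs256` expects the raw signature bytes, but a JWT arrives with each section base64url-encoded without padding (RFC 4648 §5: `-` and `_` replace `+` and `/`). A minimal decoder sketch (`b64url_decode` is an illustrative helper, not an OpenSSL API; the linear `strchr` table lookup is simple rather than fast):

```c
/* base64url decoding (RFC 4648 §5, unpadded) — JWT segments use this */
#include <string.h>

/* Decodes base64url `in` (inlen chars) into `out`.
   Returns decoded byte count, or -1 on invalid input / short buffer. */
int b64url_decode(const char *in, size_t inlen,
                  unsigned char *out, size_t outlen)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    int acc = 0, bits = 0;
    size_t n = 0;

    for (size_t i = 0; i < inlen; i++) {
        const char *p = strchr(tbl, in[i]);
        if (!p || in[i] == '\0') return -1;     /* reject non-alphabet */
        acc = (acc << 6) | (int)(p - tbl);
        bits += 6;
        if (bits >= 8) {                        /* a full byte is ready */
            bits -= 8;
            if (n >= outlen) return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (int)n;
}
```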
# | Concept | Category
1 | Session-based auth: HttpOnly cookie, Redis-backed sessions, session fixation | Auth
2 | JWT structure: header.payload.signature, standard claims (iss, sub, exp, jti) | JWT
3 | JWT signing: HS256 vs RS256 vs ES256 — symmetric vs asymmetric tradeoffs | JWT
4 | Access + refresh token pattern: rotation, revocation, short-lived access tokens | JWT
5 | OAuth2 flows: Authorization Code + PKCE, Client Credentials, Device Code | OAuth2
6 | API Keys: CSPRNG generation, hashing at rest, scoping, rotation | API Security
7 | RBAC vs ABAC: coarse-grained roles vs attribute-based policy evaluation | Authorization
8 | Password storage: bcrypt cost factor, Argon2id memory-hardness, timing-safe compare | Security
Ph4
Concurrency & Performance
Intermediate Requires Ph0, Ph2 📄 M11 Notes
  • Threading models: thread-per-request (simple, but memory-heavy: default pthread stacks reserve ~8MB of virtual address space each, so 10K threads reserve ~80GB; even trimmed to 1MB via pthread_attr_setstacksize, 10K threads still need ~10GB), thread pool with bounded queue (Apache worker MPM), event loop + I/O multiplexing (Nginx, Node.js), green threads/goroutines (M:N userspace scheduling)
  • Synchronization primitives: mutex (exclusive lock, binary), RW lock (multiple concurrent readers OR single writer), semaphore (counting lock, rate limiting), condition variable (wait for predicate — always pair with mutex), spinlock (busy-wait, only for very short critical sections on multi-core)
  • Lock-free programming: compare-and-swap (CAS) — atomically: if (*ptr == expected) { *ptr = desired; return true; }, ABA problem (use versioned pointers), GCC __atomic builtins (__atomic_compare_exchange_n), C11 stdatomic.h
  • I/O multiplexing evolution: select (FD_SET bitmap, 1024 fd limit), poll (no fd limit, linear scan), epoll (Linux — O(1) notification, edge-triggered ET vs level-triggered LT, epoll_create1/epoll_ctl/epoll_wait), io_uring (Linux 5.1+ — async submit+complete ring buffers, zero-copy, no syscall per I/O)
  • C10K problem: 10,000 concurrent connections — why thread-per-request fails (OS scheduling overhead, stack memory), how epoll event loop solves it (single thread handles thousands of FDs)
  • In-process caching: LRU eviction (doubly-linked list + hash map = O(1) get/put), LFU (min-heap of frequency buckets), cache capacity planning (hot data << cold data)
  • Distributed caching with Redis: cache stampede (thundering herd when TTL expires simultaneously — mutex lock, probabilistic early expiry, background refresh), hotspot key sharding, client-side consistent hashing
  • Connection pool management: pool exhaustion (queue vs reject vs timeout), health checks on idle connections (keepalive probe or validation query), backpressure signals upstream
  • Load balancing algorithms: round-robin (uniform distribution), weighted round-robin (heterogeneous backends), least-connections (for variable request durations), IP hash (session stickiness, avoid with horizontal scaling), consistent hashing (minimal key redistribution when nodes added/removed)
  • Horizontal scaling design: stateless services (no server-side session), shared-nothing architecture, externalizing state (Redis, DB), idempotent operations (safe to retry), eventual consistency tradeoffs
epoll io_uring pthreads stdatomic.h Redis pgBouncer
/* epoll edge-triggered event loop skeleton (Linux) */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

#define MAX_EVENTS 128

static void set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

void run_event_loop(int listen_fd) {
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    struct epoll_event ev;
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];

    while (1) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1 /* block forever */);

        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                /* New connection */
                int client = accept(listen_fd, NULL, NULL);
                set_nonblocking(client);

                ev.events  = EPOLLIN | EPOLLET;  /* edge-triggered */
                ev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);
            } else {
                /* Data available — handle_client(events[i].data.fd) */
                /* With ET: must read until EAGAIN */
            }
        }
    }
}
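The in-process caching bullet claims O(1) get/put from a doubly-linked list plus hash map; a toy sketch makes the mechanics concrete (int keys, fixed demo capacity, chained hash buckets; all names are illustrative, and allocation failures are not handled):

```c
/* LRU cache sketch: hash map for O(1) lookup, doubly-linked list
   for O(1) recency updates. head = most recently used.           */
#include <stdlib.h>

#define LRU_CAP     4   /* demo capacity */
#define LRU_BUCKETS 8

typedef struct Node {
    int key, val;
    struct Node *prev, *next;   /* recency list */
    struct Node *hnext;         /* hash-bucket chain */
} Node;

typedef struct {
    Node *head, *tail;
    Node *buckets[LRU_BUCKETS];
    int   count;
} LRU;

static unsigned slot(int key) { return (unsigned)key % LRU_BUCKETS; }

static Node *find(LRU *c, int key) {
    for (Node *n = c->buckets[slot(key)]; n; n = n->hnext)
        if (n->key == key) return n;
    return NULL;
}

static void unlink_node(LRU *c, Node *n) {
    if (n->prev) n->prev->next = n->next; else c->head = n->next;
    if (n->next) n->next->prev = n->prev; else c->tail = n->prev;
}

static void push_front(LRU *c, Node *n) {
    n->prev = NULL; n->next = c->head;
    if (c->head) c->head->prev = n; else c->tail = n;
    c->head = n;
}

/* Returns 1 and writes *val on hit (promoting the entry), 0 on miss */
int lru_get(LRU *c, int key, int *val) {
    Node *n = find(c, key);
    if (!n) return 0;
    unlink_node(c, n); push_front(c, n);
    *val = n->val;
    return 1;
}

void lru_put(LRU *c, int key, int val) {
    Node *n = find(c, key);
    if (n) { n->val = val; unlink_node(c, n); push_front(c, n); return; }

    if (c->count == LRU_CAP) {            /* evict least recent (tail) */
        Node *victim = c->tail;
        unlink_node(c, victim);
        Node **pp = &c->buckets[slot(victim->key)];
        while (*pp != victim) pp = &(*pp)->hnext;
        *pp = victim->hnext;
        free(victim);
        c->count--;
    }
    n = calloc(1, sizeof *n);             /* sketch: no OOM handling */
    n->key = key; n->val = val;
    n->hnext = c->buckets[slot(key)];
    c->buckets[slot(key)] = n;
    push_front(c, n);
    c->count++;
}
```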
# | Concept | Category
1 | Threading models: thread-per-request vs thread pool vs event loop vs green threads | Concurrency
2 | Mutex, RW lock, semaphore, condition variable — when to use each | Concurrency
3 | Lock-free CAS: compare-and-swap, ABA problem, GCC __atomic builtins | Concurrency
4 | I/O multiplexing: select → poll → epoll (ET vs LT) → io_uring evolution | I/O
5 | C10K problem: why threads fail at scale, how epoll solves it | I/O
6 | In-process caching: LRU (linked list + hash map), LFU, eviction policies | Caching
7 | Cache stampede: thundering herd, mutex lock, probabilistic early expiry | Caching
8 | Connection pool exhaustion: queue vs reject, backpressure, health checks | Performance
9 | Load balancing algorithms: round-robin, least-conn, consistent hashing | Scaling
10 | Stateless design: externalizing state for horizontal scaling | Scaling
Ph5
Event-Driven Architecture
Intermediate Requires Ph2, Ph4 📄 M13 Notes
  • Why event-driven: temporal decoupling (producer/consumer run independently), fanout (one event → many consumers), audit log (full history replayable), reduces synchronous blocking chains, enables eventual consistency
  • Message queues vs event streams: RabbitMQ (work queue model — message consumed and deleted, at-most-once or at-least-once via acks, competing consumers, dead-letter exchange) vs Kafka (persistent log — messages retained, consumer groups replay from offset, unlimited retention)
  • Kafka internals: topic partitioned across brokers, each partition is an ordered immutable log; leader partition + replicas (In-Sync Replicas ISR); producer assigns partition (key hash or round-robin); consumer group — each partition consumed by exactly one consumer in group; offset committed by consumer
  • Kafka delivery semantics: at-most-once (acks=0, fire-and-forget), at-least-once (acks=all + retry — may duplicate), exactly-once (idempotent producer + transactions — enable.idempotence=true + transactional.id)
  • RabbitMQ patterns: direct exchange (routing key match), topic exchange (routing key pattern *.error), fanout exchange (broadcast to all bound queues), headers exchange; dead-letter exchange (DLX) for failed messages; message TTL; priority queues
  • Saga pattern: managing distributed transactions without 2PC; orchestration (central Saga Orchestrator sends commands, receives events, handles compensations), choreography (each service reacts to events and emits new events); compensating transactions roll back completed steps
  • Outbox pattern: write event to outbox table in same DB transaction as business data (atomicity), separate Relay/CDC process polls outbox and publishes to broker, mark as published; prevents lost events on crash between DB write and broker publish
  • CQRS (Command Query Responsibility Segregation): write side (commands mutate state, normalized DB optimized for writes), read side (queries return projections, denormalized read model optimized for reads); sync via domain events or CDC; eventual consistency between models
  • Event Sourcing: system state = ordered log of immutable domain events (not current state snapshot); reconstruct any past state by replaying events; snapshots for performance (don't replay full history); projections for derived read models; event schema versioning challenge
  • Idempotent consumers: natural idempotency (PUT/DELETE — repeated calls have same effect), deduplication table (store processed event IDs, reject duplicates), atomic check-and-process with DB transaction; combine with outbox for exactly-once end-to-end
Kafka RabbitMQ librdkafka Apache Pulsar NATS
/* Kafka producer using librdkafka (C client) */
#include <librdkafka/rdkafka.h>
#include <string.h>
#include <stdio.h>

static void delivery_cb(rd_kafka_t *rk, const rd_kafka_message_t *msg,
                         void *opaque) {
    (void)rk; (void)opaque;
    if (msg->err)
        fprintf(stderr, "Delivery failed: %s\n",
                rd_kafka_err2str(msg->err));
}

void produce_event(const char *brokers, const char *topic,
                   const char *key,    const char *value) {
    char errstr[512];

    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", brokers,
                      errstr, sizeof(errstr));
    rd_kafka_conf_set_dr_msg_cb(conf, delivery_cb);

    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                  errstr, sizeof(errstr));
    rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, topic, NULL);

retry:
    if (rd_kafka_produce(rkt,
            RD_KAFKA_PARTITION_UA,    /* auto-select partition by key hash */
            RD_KAFKA_MSG_F_COPY,      /* copy payload into rdkafka */
            (void *)value, strlen(value),
            key, strlen(key),
            NULL) == -1) {
        if (rd_kafka_last_error() == RD_KAFKA_RESP_ERR__QUEUE_FULL) {
            rd_kafka_poll(rk, 100);   /* drain delivery queue */
            goto retry;
        }
    }

    rd_kafka_flush(rk, 10000);        /* wait up to 10s for delivery */
    rd_kafka_topic_destroy(rkt);
    rd_kafka_destroy(rk);
}
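The idempotent-consumer bullet describes a dedup table of processed event IDs. An in-memory sketch of the check-and-mark step (the `check_and_mark` helper is illustrative; in production the seen-set lives in the database and is checked and inserted in the same transaction as the side effect, pairing with the outbox pattern for end-to-end exactly-once):

```c
/* Idempotent consumer sketch: dedup of processed event IDs.
   A real implementation persists this set transactionally.   */
#include <string.h>

#define MAX_SEEN 1024
#define ID_LEN   64

static char seen[MAX_SEEN][ID_LEN];
static int  seen_count = 0;

/* Returns 0 the first time an event_id is seen (caller processes it),
   1 on any redelivery (caller skips — the effect already happened). */
int check_and_mark(const char *event_id) {
    for (int i = 0; i < seen_count; i++)     /* sketch: linear scan */
        if (strcmp(seen[i], event_id) == 0)
            return 1;                        /* duplicate delivery */
    if (seen_count < MAX_SEEN) {
        strncpy(seen[seen_count], event_id, ID_LEN - 1);
        seen[seen_count][ID_LEN - 1] = '\0';
        seen_count++;
    }
    return 0;                                /* first delivery */
}
```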
# | Concept | Category
1 | Why events: temporal decoupling, audit log, fanout, replay capability | Architecture
2 | Message queue vs event stream: RabbitMQ (work queue, delete on consume) vs Kafka (log, retain+replay) | Architecture
3 | Kafka internals: topics, partitions, offsets, ISR, consumer group rebalancing | Kafka
4 | Delivery semantics: at-most-once, at-least-once, exactly-once (idempotent producer) | Kafka
5 | RabbitMQ patterns: exchange types, DLX, message TTL | Events
6 | Saga pattern: orchestration vs choreography, compensating transactions | Patterns
7 | Outbox pattern: atomic write to outbox table, relay publishes to broker | Patterns
8 | CQRS: separate write model (commands) from read model (projections) | Patterns
9 | Event Sourcing: state as event log, snapshots, projections, schema versioning | Patterns
10 | Idempotent consumers: dedup table, atomic check-and-process | Patterns
Ph6
Microservices & Infrastructure
Advanced Requires Ph3, Ph5 📄 M15 Notes
  • Monolith vs microservices decision: start with modular monolith, split when team topology demands it (Conway's Law), bounded contexts (DDD) define service boundaries; microservices add operational complexity — don't split prematurely
  • Strangler Fig pattern: incrementally replace monolith — route specific URL paths to new service at API gateway, coexist with monolith during migration, deprecate monolith module by module; avoids big-bang rewrite risk
  • Inter-service communication strategy: sync REST/gRPC (simple, tight coupling, propagates latency) vs async events (loose coupling, eventual consistency, harder to debug); request-reply over async via correlation ID in message header
  • API Gateway responsibilities: single entry point, path-based routing to backend services, authentication/authorization offload (validate JWT before forwarding), rate limiting and throttling, SSL termination, request aggregation (backend for frontend pattern), canary traffic splitting
  • Service discovery: client-side (service queries registry like Consul/Eureka, client chooses instance — more control), server-side (load balancer queries registry — simpler client), DNS-based (Kubernetes Services use kube-dns)
  • Circuit breaker: closed state (normal, count failures), open state (fail fast immediately — no calls to unhealthy service, prevents cascade), half-open state (allow probe requests to test recovery); bulkhead pattern (isolate resource pools per service)
  • Docker best practices for C/C++: multi-stage build (Stage 1: gcc:13 builder compiles binary, Stage 2: debian:slim runtime copies binary — minimal image size), non-root user (useradd -r), .dockerignore (exclude build artifacts), pin base image versions, ENTRYPOINT vs CMD
  • Kubernetes fundamentals: Pod (smallest deployable unit, co-located containers), Deployment (manages ReplicaSet, rolling updates, rollback), Service (stable DNS name + ClusterIP load balancing), Ingress (HTTP/S routing + TLS termination), ConfigMap (non-secret config), Secret (base64-encoded credentials), liveness probe (restart if unhealthy), readiness probe (remove from Service endpoints if not ready)
  • CI/CD pipeline stages: lint → unit test → integration test → build OCI image → push to registry → deploy to staging → smoke test → deploy to production; blue-green (two identical environments, instant cutover); canary (route 5% → 20% → 100% traffic to new version)
  • 12-Factor App: I-Codebase (one repo, many deploys), II-Dependencies (explicitly declared), III-Config (env vars, not hardcoded), IV-Backing services (attached resources, swap without code change), V-Build/release/run (strict separation), VI-Processes (stateless, share nothing), VII-Port binding, VIII-Concurrency (scale out via process model), IX-Disposability (fast startup, graceful shutdown), X-Dev/prod parity, XI-Logs (stdout, not files), XII-Admin processes
Docker Kubernetes Consul Nginx Helm GitHub Actions ArgoCD
# -- Stage 1: Build (fat image with full toolchain) --
FROM gcc:13 AS builder
WORKDIR /src

# Copy source and build system first (layer caching)
COPY Makefile ./
COPY src/     ./src/

# Build release binary (strip debug symbols)
RUN make release CFLAGS="-O2 -DNDEBUG" && strip bin/server

# -- Stage 2: Minimal runtime --
FROM debian:bookworm-slim
# curl is needed at runtime by the HEALTHCHECK below
RUN apt-get update \
 && apt-get install -y --no-install-recommends libpq5 ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*

# Non-root user for security
RUN useradd -r -u 1001 -s /sbin/nologin appuser
WORKDIR /app
COPY --from=builder /src/bin/server .
USER appuser

EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8080/health || exit 1

ENTRYPOINT ["./server"]
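The circuit-breaker states from the bullets (closed → open → half-open) fit a small state machine. A sketch in C, with timestamps injected rather than read from time() so the transitions are easy to test (all names are illustrative; a production breaker would also track a rolling error rate and be thread-safe):

```c
/* Circuit breaker sketch:
   CLOSED -> OPEN after `threshold` consecutive failures,
   OPEN -> HALF_OPEN after `cooldown_s`,
   HALF_OPEN -> CLOSED on success, back to OPEN on failure. */
typedef enum { CB_CLOSED, CB_OPEN, CB_HALF_OPEN } cb_state;

typedef struct {
    cb_state state;
    int      failures;    /* consecutive failures while CLOSED */
    int      threshold;   /* failures before tripping */
    long     opened_at;   /* when the breaker tripped (seconds) */
    long     cooldown_s;  /* how long to stay OPEN */
} breaker;

/* Should this request be attempted at time `now`? */
int cb_allow(breaker *b, long now) {
    if (b->state == CB_OPEN) {
        if (now - b->opened_at >= b->cooldown_s) {
            b->state = CB_HALF_OPEN;   /* let one probe through */
            return 1;
        }
        return 0;                      /* fail fast, protect the callee */
    }
    return 1;                          /* CLOSED or HALF_OPEN probe */
}

/* Record the outcome of an attempted call */
void cb_report(breaker *b, int success, long now) {
    if (success) {
        b->failures = 0;
        b->state = CB_CLOSED;
        return;
    }
    if (b->state == CB_HALF_OPEN || ++b->failures >= b->threshold) {
        b->state = CB_OPEN;
        b->failures = 0;
        b->opened_at = now;
    }
}
```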
# | Concept | Category
1 | Monolith vs microservices: Conway's Law, bounded contexts, modular monolith first | Architecture
2 | Strangler Fig: incremental migration via API gateway routing | Architecture
3 | Sync vs async inter-service communication: tradeoffs, correlation ID pattern | Communication
4 | API Gateway: routing, auth offload, rate limiting, BFF pattern | Infra
5 | Service discovery: client-side (Consul) vs server-side vs DNS-based (K8s) | Infra
6 | Circuit breaker: closed/open/half-open states, bulkhead pattern | Reliability
7 | Docker multi-stage build for C/C++: builder → slim runtime, non-root user | Docker
8 | Kubernetes: Pod, Deployment, Service, Ingress, liveness vs readiness probe | K8s
9 | CI/CD: pipeline stages, blue-green deployment, canary traffic splitting | CI/CD
10 | 12-Factor App: config via env, stateless processes, stdout logs | Best Practices
Ph7
Observability & Hardening
Production Requires Ph6 📄 M17 Notes
  • 3 pillars of observability: logs (discrete events — what happened), metrics (aggregated numeric data — how many/how fast), traces (causal chains across services — why it was slow); each answers different questions; together give full system visibility
  • Structured logging: JSON lines format (one JSON object per line), mandatory fields (timestamp ISO-8601, level, service, trace_id, span_id, message), log levels (DEBUG verbose, INFO normal, WARN degraded, ERROR unexpected failure, FATAL unrecoverable), never log secrets or PII, use correlation/trace IDs to link logs across services
  • Metrics types: counter (monotonically increasing, e.g., http_requests_total — use rate()), gauge (point-in-time value, e.g., memory_usage_bytes, active_connections), histogram (bucketed distribution, e.g., request_duration_seconds — use histogram_quantile() for p99)
  • RED method (for services): Rate (requests/second), Errors (error rate %), Duration (latency percentiles p50/p95/p99); USE method (for resources): Utilization (% time busy), Saturation (queue depth, wait time), Errors (device error rate)
  • Prometheus: pull-based scraping (Prometheus polls /metrics endpoint on services), exposition format (# HELP, # TYPE, metric_name{labels} value timestamp), PromQL (rate(http_requests_total[5m]), histogram_quantile(0.99, ...), by(service)), AlertManager for alerting rules and routing
  • Distributed tracing: trace (end-to-end request chain, unique trace_id), span (single operation within trace, span_id + parent_span_id), W3C traceparent header for cross-service propagation, OpenTelemetry SDK (language-agnostic instrumentation, OTLP export to Jaeger/Tempo/Zipkin)
  • Health check endpoints: GET /health/live — liveness probe (is process alive? if fails, Kubernetes restarts container), GET /health/ready — readiness probe (is service ready to serve traffic? if fails, removed from Service endpoints); startup probe for slow-starting containers
  • Rate limiting algorithms: token bucket (bucket refills at rate r, allow bursts up to capacity b — bursty traffic ok), sliding window log (store timestamps of all requests, exact but memory O(requests)), sliding window counter (approximate, memory O(1), compromise); implement at API gateway (global) and per-service (defense in depth)
  • OWASP Top 10 for backends: SQL injection (parameterized queries only — never string concat), command injection (avoid shell=True / system(), use execv), SSRF (Server-Side Request Forgery — allowlist outbound URLs), broken access control (check authorization on every request, not just auth), security misconfiguration (disable debug endpoints in prod, no default credentials), insecure deserialization (validate and sanitize all deserialized input)
  • Input validation: allowlist over denylist (define what is allowed, reject everything else), validate at trust boundaries only (never trust client input), size limits (prevent DoS via large payloads — max body size), type checking, sanitize before SQL/shell/HTML context
  • Secrets management: never in source code or Docker images (scan with truffleHog/gitleaks), environment variables (basic, visible in /proc/PID/environ — acceptable for containers), HashiCorp Vault (dynamic secrets with TTL + auto-rotation, audit log, fine-grained policies), AWS Secrets Manager / GCP Secret Manager; secret rotation strategy
  • Graceful shutdown: catch SIGTERM (Kubernetes sends this before SIGKILL after terminationGracePeriodSeconds), stop accepting new connections (close listen socket or remove from load balancer), drain in-flight requests (atomic counter), close DB connection pool, deregister from service discovery, log completion; target: shutdown in < terminationGracePeriodSeconds (default 30s)
Prometheus Grafana Jaeger OpenTelemetry HashiCorp Vault AlertManager
/* Graceful shutdown via SIGTERM — C implementation */
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int  in_flight        = 0;
static atomic_bool shutdown_req     = false;

static void handle_sigterm(int sig) {
    (void)sig;
    atomic_store(&shutdown_req, true);
}

void register_signals(void) {
    struct sigaction sa = { .sa_handler = handle_sigterm };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGINT,  &sa, NULL);   /* also handle Ctrl-C */
}

/* Called at start of each request handler */
void request_begin(void) { atomic_fetch_add(&in_flight, 1); }

/* Called at end of each request handler */
void request_end(void)   { atomic_fetch_sub(&in_flight, 1); }

int main(void) {
    register_signals();
    /* ... start server, accept connections ... */

    /* Main loop — stop accepting when shutdown requested */
    while (!atomic_load(&shutdown_req)) {
        /* accept() new connections */
    }

    /* Drain: wait for all in-flight requests to complete */
    fprintf(stderr, "[shutdown] draining %d in-flight requests\n",
            atomic_load(&in_flight));
    while (atomic_load(&in_flight) > 0)
        usleep(5000);   /* poll every 5ms */

    /* Close DB pools, deregister from service discovery */
    fprintf(stderr, "[shutdown] clean exit\n");
    return 0;
}
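The token-bucket algorithm from the rate-limiting bullet takes only a few lines: refill proportionally to elapsed time, cap at capacity, spend one token per request. A sketch with injected timestamps for testability (names are illustrative; a real limiter would guard the bucket with a mutex or run it atomically in Redis via Lua):

```c
/* Token bucket sketch: refills at `rate` tokens/sec up to `capacity`;
   each request consumes one token. Bursts up to `capacity` are allowed. */
typedef struct {
    double tokens;        /* current fill level */
    double capacity;      /* burst ceiling */
    double rate;          /* refill rate, tokens per second */
    double last_refill;   /* timestamp of last refill (seconds) */
} bucket;

/* Returns 1 if the request is allowed at time `now`, 0 if rate-limited */
int bucket_allow(bucket *b, double now) {
    double elapsed = now - b->last_refill;
    b->tokens += elapsed * b->rate;                  /* continuous refill */
    if (b->tokens > b->capacity) b->tokens = b->capacity;
    b->last_refill = now;

    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;
        return 1;
    }
    return 0;
}
```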
# | Concept | Category
1 | 3 pillars: logs (events), metrics (aggregates), traces (causal chains) — what each answers | Observability
2 | Structured logging: JSON lines, mandatory fields, log levels, trace_id correlation | Observability
3 | Metric types: counter (rate()), gauge, histogram (histogram_quantile p99) | Metrics
4 | RED method (Rate, Errors, Duration) and USE method (Utilization, Saturation, Errors) | Metrics
5 | Prometheus: pull-based scraping, exposition format, PromQL, AlertManager | Metrics
6 | Distributed tracing: trace/span model, W3C traceparent header, OpenTelemetry | Tracing
7 | Health checks: liveness (restart) vs readiness (remove from LB) vs startup probe | Reliability
8 | Rate limiting: token bucket, sliding window log, sliding window counter | Performance
9 | OWASP Top 10: SQL injection, SSRF, broken access control, security misconfiguration | Security
10 | Input validation: allowlist, trust boundaries, size limits, sanitization | Security
11 | Secrets management: Vault dynamic secrets, never in code/images, rotation | Security
12 | Graceful shutdown: SIGTERM handler, drain in-flight, close pools, deregister | Reliability