M02 — HTTP Deep Dive & Web Server Internals
Phase 0
HTTP request parsing · Header internals · epoll event loop · Middleware pipeline · Trie-based routing · Content negotiation · Chunked transfer · Keep-alive & connection pooling
🔧 What This Module Covers
M01 covered the transport layer (TCP, TLS) and protocol overview. M02 goes deeper into how HTTP actually works at the byte level and how a production web server processes requests — the request pipeline every backend engineer must understand to debug performance issues, write efficient servers, and understand frameworks like Nginx, Express, or Actix.
📋 HTTP/1.1 Request Format — Byte by Byte
An HTTP/1.1 request is plain text over a TCP stream:
POST /orders HTTP/1.1\r\n ← request line: method SP request-target SP HTTP-version CRLF
Host: api.example.com\r\n ← required in HTTP/1.1
Content-Type: application/json\r\n
Content-Length: 47\r\n ← exact byte count of body
Authorization: Bearer eyJhbGci...\r\n
Connection: keep-alive\r\n
\r\n ← blank line = end of headers
{"customer_id":"c-42","items":[{"id":"p-7"}]} ← body (47 bytes)
Key structural rules:
- The request line and each header line end with \r\n (CRLF)
- The header section ends with an empty line (\r\n\r\n)
- Header names are case-insensitive; values are case-sensitive (mostly)
- Body length is determined by Content-Length or Transfer-Encoding: chunked
📋 HTTP/1.1 Response Format
HTTP/1.1 201 Created\r\n ← status line: version SP status-code SP reason CRLF
Content-Type: application/json\r\n
Content-Length: 33\r\n
Location: /orders/ord-9821\r\n ← URL of new resource (201 response)
Cache-Control: no-store\r\n
\r\n
{"id":"ord-9821","status":"pending"}
HTTP is a text protocol — both request and response are human-readable ASCII (headers). The body can be binary. This is why HTTP/2 moved to binary framing — text parsing is slower and more fragile.
⚙️ HTTP/1.1 Request Parser State Machine
A production HTTP parser is a state machine. It reads data from the TCP stream in chunks (not necessarily aligned to request boundaries) and must handle:
- Partial reads — CRLF split across two recv() calls
- Pipelining — multiple requests in one TCP read buffer
- Slowloris attack — client sends headers one byte at a time
- Header injection — values containing CRLF sequences
PARSE_REQUEST_LINE → reads until first CRLF
extract method, URI, HTTP-version
validate method ∈ {GET,POST,PUT,DELETE,PATCH,HEAD,OPTIONS}
↓
PARSE_HEADERS → reads until CRLFCRLF
for each line: split on first ': '
normalize header name to lowercase
check for Content-Length or Transfer-Encoding
enforce max_header_count (prevent DoS)
enforce max_header_value_length
↓
PARSE_BODY
if Content-Length:
read exactly N bytes
elif Transfer-Encoding: chunked:
read chunk-size CRLF chunk-data CRLF, repeat until 0 CRLF
else:
no body (GET, HEAD, DELETE)
↓
REQUEST_COMPLETE → dispatch to router
🧩 Chunked Transfer Encoding
Used when the total body size is unknown at send time (streaming, compression). Each chunk is prefixed with its size in hex:
Transfer-Encoding: chunked
\r\n
1a\r\n ← chunk-size: 26 bytes (hex)
abcdefghijklmnopqrstuvwxyz\r\n ← chunk data
5\r\n
hello\r\n
0\r\n ← final chunk: size = 0 signals end
\r\n ← trailing CRLF after final chunk
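The chunk-size line can be decoded with a few lines of C. A minimal standalone sketch (function name and error convention are illustrative): parse the hex digits, skip any chunk extension after `;`, and require the terminating CRLF:

```c
#include <stddef.h>

/* Parse a chunk-size line like "1a\r\n". Hex digits; an optional
 * chunk extension after ';' is skipped. Returns the size, or -1 on
 * malformed input. *consumed = bytes eaten including the CRLF. */
long parse_chunk_size(const char *buf, size_t len, size_t *consumed) {
    long size = 0;
    size_t i = 0;
    int digits = 0;
    for (; i < len; i++) {
        char c = buf[i];
        long d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else break;
        if (size > (0x7fffffff - d) / 16) return -1; /* overflow guard */
        size = size * 16 + d;
        digits++;
    }
    if (!digits) return -1;
    while (i < len && buf[i] != '\r') i++;  /* skip chunk extension */
    if (i + 1 >= len || buf[i] != '\r' || buf[i + 1] != '\n') return -1;
    *consumed = i + 2;
    return size;
}
```

Note the overflow guard: a malicious chunk size like `ffffffffffffffff` must be rejected, not wrapped.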
Never trust Content-Length if Transfer-Encoding: chunked is also set. Per RFC 7230, chunked wins and Content-Length must be removed. A mismatch is a potential HTTP request smuggling vector.
🔒 HTTP Request Smuggling
When a frontend proxy (CDN, load balancer) and backend server disagree on where one HTTP request ends and the next begins, an attacker can "smuggle" a prefix of a subsequent request past security controls.
CL.TE attack: frontend uses Content-Length, backend uses Transfer-Encoding:
POST / HTTP/1.1\r\n
Content-Length: 13\r\n ← frontend reads 13 bytes: "0\r\n\r\nGET /admin"
Transfer-Encoding: chunked\r\n ← backend: reads chunk 0 → end, then starts "GET /admin" as new request
\r\n
0\r\n
\r\n
GET /admin
Prevention: normalize all requests at the proxy; reject ambiguous requests; use HTTP/2 end-to-end (binary framing eliminates this class).
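The ambiguity check a hardened proxy performs can be sketched as a standalone helper (names are illustrative; assumes the parser has already lowercased header names):

```c
#include <string.h>
#include <strings.h>

/* Return 1 if the header set is ambiguous about body framing and the
 * request should be rejected outright (per RFC 7230 §3.3.3 guidance). */
int is_smuggling_ambiguous(const char *names[], const char *values[], int n) {
    int has_cl = 0, has_te = 0, te_chunked = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(names[i], "content-length") == 0) {
            if (has_cl) return 1;          /* duplicate Content-Length */
            has_cl = 1;
        } else if (strcmp(names[i], "transfer-encoding") == 0) {
            has_te = 1;
            if (strcasecmp(values[i], "chunked") == 0) te_chunked = 1;
        }
    }
    if (has_cl && has_te) return 1;        /* classic CL.TE / TE.CL setup */
    if (has_te && !te_chunked) return 1;   /* unknown final transfer coding */
    return 0;
}
```

Rejecting rather than "fixing" ambiguous requests is the safe default — the frontend and backend can never disagree on a request that neither accepts.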
📏 Parser Security: Limits to Enforce
| Limit | Nginx Default | Why |
|---|---|---|
| Request line max length | 8KB | Prevent buffer overflow in URI parsing |
| Max header count | ~100 | Prevent CPU DoS from O(N) header processing |
| Max single header value | 8KB | Prevent memory DoS |
| Max body size | 1MB | Prevent disk/memory exhaustion |
| Header read timeout | 60s | Prevent Slowloris (slow header attack) |
| Body read timeout | 60s | Prevent slow POST attacks |
📋 Essential Request Headers
| Header | Purpose | Notes |
|---|---|---|
| Host | Virtual hosting — which domain is being requested | Required in HTTP/1.1; enables multiple sites on one IP |
| Accept | Content types client accepts: application/json, text/html;q=0.9 | q= is quality factor (0–1). Server picks best match. |
| Accept-Encoding | Compression algorithms: gzip, br, deflate | Server compresses response body if supported |
| Accept-Language | Preferred languages: en-US,en;q=0.8 | Used for i18n |
| Content-Type | Body media type: application/json; charset=utf-8 | Required when body is present |
| Authorization | Credentials: Bearer {token}, Basic {b64} | Never in URL (logged by proxies) |
| If-None-Match | Conditional GET — send ETag from previous response | Server returns 304 if unchanged → saves bandwidth |
| If-Modified-Since | Conditional GET by date | Weaker than ETag (1-second granularity) |
| X-Forwarded-For | Original client IP behind a proxy/LB | Spoofable by clients — walk from the right and take the first IP not belonging to a trusted proxy |
| X-Request-Id | Request correlation ID | Generate at edge; propagate through all services |
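The rightmost-untrusted rule for X-Forwarded-For can be sketched like this (simplified exact-string comparison; a real implementation would parse addresses and match CIDR ranges):

```c
#include <stdio.h>
#include <string.h>

/* Pick the client IP from an X-Forwarded-For value: walk from the
 * right and return the first entry that is not one of our trusted
 * proxies. Modifies xff in place (strtok). Returns NULL if every
 * hop was trusted. */
const char *client_ip_from_xff(char *xff, const char *trusted[],
                               int n_trusted, char *out, size_t outsz) {
    char *entries[16];
    int n = 0;
    for (char *tok = strtok(xff, ","); tok && n < 16; tok = strtok(NULL, ","))
        entries[n++] = tok;
    for (int i = n - 1; i >= 0; i--) {
        char *e = entries[i];
        while (*e == ' ') e++;             /* trim leading space */
        int trusted_hop = 0;
        for (int t = 0; t < n_trusted; t++)
            if (strcmp(e, trusted[t]) == 0) trusted_hop = 1;
        if (!trusted_hop) {
            snprintf(out, outsz, "%s", e);
            return out;
        }
    }
    return NULL;
}
```

Walking from the right matters because anything left of your own proxies' entries was supplied by the client and can be forged freely.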
🗜️ Content Negotiation
Server selects the best response format based on the Accept header:
/* Client request */
Accept: application/json;q=1.0,
application/xml;q=0.8,
text/html;q=0.5
/* Server algorithm */
for each supported_type in server_types:
find matching accept entry
score = q * specificity
pick highest score
→ response: application/json (q=1.0 wins)
If no match: return 406 Not Acceptable.
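The scoring algorithm above can be sketched in a few lines of C. This is a simplified negotiator (exact media-type matches only, no `*/*` wildcards; function name is illustrative):

```c
#include <string.h>
#include <stdlib.h>

/* Pick the best server-supported media type for an Accept header.
 * q defaults to 1.0 when absent. Returns an index into `supported`,
 * or -1 if nothing matches (caller responds 406 Not Acceptable). */
int negotiate(const char *accept, const char *supported[], int n_supported) {
    char buf[512];
    strncpy(buf, accept, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    double best_q = 0.0;
    int best = -1;
    for (char *entry = strtok(buf, ","); entry; entry = strtok(NULL, ",")) {
        while (*entry == ' ') entry++;         /* trim leading spaces */
        double q = 1.0;
        char *semi = strchr(entry, ';');
        if (semi) {                            /* parse ";q=0.8" parameter */
            *semi = '\0';
            char *qp = strstr(semi + 1, "q=");
            if (qp) q = atof(qp + 2);
        }
        for (int i = 0; i < n_supported; i++)
            if (strcmp(entry, supported[i]) == 0 && q > best_q) {
                best_q = q;
                best = i;
            }
    }
    return best;
}
```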
📦 Response Caching Headers
| Header | Meaning |
|---|---|
| Cache-Control: max-age=3600 | Cache for 1 hour |
| Cache-Control: no-cache | Revalidate before using cached copy |
| Cache-Control: no-store | Never cache (auth, sensitive) |
| Cache-Control: private | Browser-only, not CDN |
| ETag: "abc123" | Version token for conditional GET |
| Last-Modified | Date-based conditional GET |
| Vary: Accept-Encoding | Cache by encoding variant |
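The ETag revalidation path can be sketched as a tiny status chooser (a simplified sketch: weak validators and `*` handled minimally, single-ETag If-None-Match only):

```c
#include <string.h>
#include <stddef.h>

/* Conditional GET: return 304 when the client's If-None-Match matches
 * the resource's current ETag, else 200. */
int conditional_status(const char *if_none_match, const char *etag) {
    if (!if_none_match) return 200;             /* unconditional request */
    if (strcmp(if_none_match, "*") == 0) return 304;
    /* strip a weak-validator prefix on either side for comparison */
    if (strncmp(if_none_match, "W/", 2) == 0) if_none_match += 2;
    if (strncmp(etag, "W/", 2) == 0) etag += 2;
    return strcmp(if_none_match, etag) == 0 ? 304 : 200;
}
```

On a 304 the server sends no body, which is where the bandwidth saving comes from — the client reuses its cached copy.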
🔄 Keep-Alive & Connection Pooling
HTTP/1.1 defaults to persistent connections (Connection: keep-alive). The TCP connection is reused for multiple requests:
Without Keep-Alive: TCP handshake + TLS handshake per request (~100ms overhead)
GET /a [TCP 3-way] [TLS handshake] → response → [TCP FIN]
GET /b [TCP 3-way] [TLS handshake] → response → [TCP FIN]
With Keep-Alive: single TCP+TLS handshake for N requests
[TCP 3-way] [TLS handshake]
GET /a → response
GET /b → response
GET /c → response
[TCP FIN when idle timeout or max-requests reached]
Server-side controls:
- keepalive_timeout 65s — close idle connection after 65s
- keepalive_requests 1000 — max requests per connection (prevent memory leak)
- Connection: close — explicitly close after this response
HTTP/2 solves keep-alive inefficiency better — it multiplexes all requests over one connection without head-of-line blocking between requests.
📏 CORS — Cross-Origin Resource Sharing
Browsers block cross-origin requests by default (same-origin policy). CORS headers tell the browser which cross-origin requests are allowed.
| Header | Example Value | Meaning |
|---|---|---|
| Access-Control-Allow-Origin | https://app.example.com | Allowed origin (or * for public APIs) |
| Access-Control-Allow-Methods | GET,POST,PUT,DELETE | Allowed HTTP methods |
| Access-Control-Allow-Headers | Authorization,Content-Type | Allowed request headers |
| Access-Control-Max-Age | 86400 | Cache preflight result for 24h |
| Access-Control-Allow-Credentials | true | Allow cookies/auth headers cross-origin |
Preflight: browsers send an OPTIONS request before any non-simple cross-origin request. Your server must respond to OPTIONS with CORS headers, or the actual request is blocked. Never set Access-Control-Allow-Origin: * with Allow-Credentials: true — that's a security mistake browsers reject.
⚡ epoll vs Thread-Per-Request: The C10K Problem
In 1999, Dan Kegel posed the "C10K problem" — can a single server handle 10,000 simultaneous connections? Thread-per-request breaks at scale because:
- Each thread reserves an 8MB stack by default (mostly virtual memory) → 10K threads = 80GB of address space just for stacks
- Context switch overhead between 10K threads is CPU-expensive
- Most threads are blocked on I/O — wasted resources
epoll — one thread, thousands of connections, only active on I/O events.
🔄 epoll Edge-Triggered Event Loop
| syscall | Purpose |
|---|---|
| epoll_create1(0) | Create epoll instance, returns fd |
| epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event) | Register fd to watch |
| epoll_wait(epfd, events, MAX, timeout_ms) | Block until events ready, returns count |
Level-triggered (LT) vs Edge-triggered (ET):
- LT (default): epoll_wait returns event repeatedly while data available — simpler, forgiving of partial reads
- ET: epoll_wait returns event once per state change — you must read until EAGAIN or data is lost. Higher performance, requires non-blocking sockets + retry loops.
Event Loop
1. Create listen socket + set O_NONBLOCK
2. Create epoll fd
3. Register listen socket: EPOLLIN (new connection)
4. Loop:
n = epoll_wait(epfd, events, MAX_EVENTS, -1)
for i in 0..n:
if events[i].fd == listen_fd:
accept() → new client_fd
fcntl(client_fd, F_SETFL, O_NONBLOCK)
epoll_ctl(EPOLL_CTL_ADD, client_fd, EPOLLIN|EPOLLET)
else:
parse_http(events[i].fd)
if request complete:
route_and_dispatch(request)
send_response(events[i].fd)
if !keep_alive: epoll_ctl(DEL); close(fd)
Worker threads: for CPU-bound work, use a thread pool
event loop enqueues work → thread pool executes → posts result back
🏗️ Nginx Architecture: Master + Workers
Nginx uses a master process + N worker processes (one per CPU core):
- Master process: reads config, manages worker lifecycle, handles signals, performs zero-downtime reload (nginx -s reload)
- Worker process: single-threaded epoll event loop; handles all connections assigned to it
- SO_REUSEPORT: each worker binds to the same port independently; kernel load-balances accept() calls across workers — eliminates accept mutex contention
A single Nginx worker can handle ~10,000+ simultaneous connections because all I/O is non-blocking and the worker never sleeps waiting for one connection's data while others are ready.
🔗 Middleware Pipeline Pattern
A middleware pipeline is an ordered chain of functions where each function can: process the request, modify it, call the next middleware, or short-circuit (return a response without calling next).
Request → [Logger] → [Rate Limiter] → [Auth] → [CORS] → [Body Parser] → [Handler]
↓
Response ← [Logger] ← [Compression] ←─────────────────────────────────── handler result
Each middleware: (ctx, next) → { pre-logic; next(ctx); post-logic }
Short circuit: Rate Limiter returns 429 without calling next()
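The (ctx, next) pattern can be sketched in C with function pointers. Everything here — type names, the two sample middlewares — is illustrative, not from any framework:

```c
#include <stddef.h>

/* Each middleware receives the context and a continuation; it may run
 * pre-logic, call next(), run post-logic, or short-circuit by returning
 * without calling next(). */
typedef struct ctx {
    const char *path;
    int status;
    int authed;
} ctx_t;

typedef struct chain chain_t;
typedef void (*middleware_fn)(ctx_t *, chain_t *);
struct chain { middleware_fn *fns; int i, n; };

static void next(ctx_t *c, chain_t *ch) {
    if (ch->i < ch->n) ch->fns[ch->i++](c, ch);
}

/* Auth middleware: short-circuits with 401 when not authenticated */
static void auth_mw(ctx_t *c, chain_t *ch) {
    if (!c->authed) { c->status = 401; return; }
    next(c, ch);
}

/* Final handler, installed as the last "middleware" in the chain */
static void handler_mw(ctx_t *c, chain_t *ch) {
    (void)ch;
    c->status = 200;
}

int run_pipeline(ctx_t *c, middleware_fn *fns, int n) {
    chain_t ch = { fns, 0, n };
    next(c, &ch);
    return c->status;
}
```

Usage: `middleware_fn fns[] = {auth_mw, handler_mw};` then `run_pipeline(&ctx, fns, 2)`. Post-logic (logging latency, compression) would go after the `next()` call inside a middleware.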
🌲 Trie-Based HTTP Router
A naive router compares each route pattern sequentially — O(N) in the number of routes. A trie (radix tree) routes in O(path_depth), which is effectively constant for typical APIs:
Routes registered:
GET /users
GET /users/:id
POST /users
GET /users/:id/orders
GET /orders/:id
Radix tree:
/ ─┬─ users ─┬─ (GET → list_users_handler)
│ ├─ (POST → create_user_handler)
│ └─ /:id ─┬─ (GET → get_user_handler)
│ └─ /orders (GET → get_user_orders_handler)
└─ orders ─ /:id (GET → get_order_handler)
Path: GET /users/42/orders
match /users → match /:id (capture "42") → match /orders → handler
Path parameters: captured values (id=42) are extracted during trie traversal and placed in the request context.
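The traversal above can be sketched as a path-segment trie in C. This is a teaching sketch under stated simplifications: one segment per node (no radix compression), method dispatch omitted, only the last :param captured, static storage:

```c
#include <string.h>

#define MAX_NODES 64
typedef struct {
    char seg[32];              /* literal segment, or ":" for a parameter */
    int child[8], n_child;
    const char *handler;       /* handler name; NULL if not an endpoint */
} node_t;

static node_t nodes[MAX_NODES];
static int n_nodes = 1;        /* nodes[0] is the root "/" */

void route_add(const char *path, const char *handler) {
    int cur = 0;
    char tmp[128]; strncpy(tmp, path, 127); tmp[127] = '\0';
    for (char *seg = strtok(tmp, "/"); seg; seg = strtok(NULL, "/")) {
        const char *key = (seg[0] == ':') ? ":" : seg;
        int found = -1;
        for (int i = 0; i < nodes[cur].n_child; i++)
            if (strcmp(nodes[nodes[cur].child[i]].seg, key) == 0)
                found = nodes[cur].child[i];
        if (found < 0) {                      /* create missing node */
            found = n_nodes++;
            strncpy(nodes[found].seg, key, 31);
            nodes[cur].child[nodes[cur].n_child++] = found;
        }
        cur = found;
    }
    nodes[cur].handler = handler;
}

/* Returns handler name or NULL; captures the last :param into param_out. */
const char *route_match(const char *path, char *param_out) {
    int cur = 0;
    char tmp[128]; strncpy(tmp, path, 127); tmp[127] = '\0';
    for (char *seg = strtok(tmp, "/"); seg; seg = strtok(NULL, "/")) {
        int nxt = -1, wild = -1;
        for (int i = 0; i < nodes[cur].n_child; i++) {
            node_t *ch = &nodes[nodes[cur].child[i]];
            if (strcmp(ch->seg, seg) == 0) nxt = nodes[cur].child[i];
            else if (ch->seg[0] == ':') wild = nodes[cur].child[i];
        }
        if (nxt < 0 && wild >= 0) { nxt = wild; strcpy(param_out, seg); }
        if (nxt < 0) return NULL;
        cur = nxt;
    }
    return nodes[cur].handler;
}
```

Note the precedence rule: a literal child always wins over a `:param` child at the same depth, matching how production routers disambiguate /users/me from /users/:id.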
🔌 Common Middleware Implementations
| Middleware | Responsibilities | Notes |
|---|---|---|
| Logger | Log method, path, status, latency, request_id | Run first (pre) and last (post) to capture full latency |
| Request ID | Generate/propagate X-Request-Id | Set before logger so all logs carry the ID |
| Auth | Validate JWT/session; attach user to context | Short-circuit with 401 if invalid |
| Rate Limiter | Check token bucket; return 429 if over limit | After auth (rate limit by user ID, not IP) |
| CORS | Add Access-Control-* headers; handle OPTIONS preflight | Must run before auth for OPTIONS to pass without credentials |
| Body Parser | Read body bytes; parse JSON/form; attach to context | Enforce size limits here |
| Compression | Gzip/br response if Accept-Encoding matches | Post-handler; skip for small responses (<1KB) |
| Panic Recovery | Catch panics/crashes; return 500 | Always the outermost middleware |
🔢 HTTP/2 Binary Framing Layer
HTTP/2 replaces the text-based HTTP/1.1 format with a binary framing layer. All communication happens through frames sent over streams within a single TCP connection.
Frame structure (9-byte fixed header):
HTTP/2 Frame Format (9 bytes fixed header + variable payload)
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
┌───────────────────────────────┐
│ Length (24 bits) │ payload length (0–16384 default)
├───────────────┬───────────────┤
│ Type (8 bits)│ Flags (8) │
├─┬─────────────────────────────┤
│R│ Stream ID (31 bits) │ R = reserved
└─┴─────────────────────────────┘
│ Frame Payload │
└───────────────────────────────┘
Frame Types:
DATA (0x0) → request/response body chunks
HEADERS (0x1) → compressed headers (HPACK)
PRIORITY (0x2) → stream dependency weight
RST_STREAM (0x3) → abort a stream
SETTINGS (0x4) → connection parameters
PUSH_PROMISE(0x5)→ server push announcement
PING (0x6) → keep-alive / RTT measurement
GOAWAY (0x7) → graceful connection close
WINDOW_UPDATE(0x8)→ flow control
CONTINUATION(0x9)→ continuation of HEADERS
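Decoding the 9-byte header is a straightforward big-endian unpack; a minimal sketch (struct and function names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Decode the 9-byte HTTP/2 frame header (RFC 7540 §4.1).
 * All multi-byte fields are big-endian (network byte order). */
typedef struct {
    uint32_t length;    /* 24-bit payload length */
    uint8_t  type;
    uint8_t  flags;
    uint32_t stream_id; /* 31 bits; the high bit is reserved */
} h2_frame_hdr_t;

int h2_parse_frame_header(const uint8_t *buf, size_t len, h2_frame_hdr_t *h) {
    if (len < 9) return -1; /* need the full fixed header */
    h->length    = ((uint32_t)buf[0] << 16) | ((uint32_t)buf[1] << 8) | buf[2];
    h->type      = buf[3];
    h->flags     = buf[4];
    h->stream_id = (((uint32_t)buf[5] << 24) | ((uint32_t)buf[6] << 16) |
                    ((uint32_t)buf[7] << 8)  |  buf[8]) & 0x7fffffffu;
    return 0;
}
```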
🌊 Streams & Multiplexing
A stream is a bidirectional sequence of frames with an integer ID. Multiple streams are interleaved over one TCP connection:
- Client-initiated streams: odd IDs (1, 3, 5, …)
- Server push streams: even IDs (2, 4, 6, …)
- Stream 0: connection-level control (SETTINGS, PING)
- Max concurrent streams: negotiated via SETTINGS_MAX_CONCURRENT_STREAMS
- Frames from different streams freely interleaved → no HOL blocking between requests
📦 HPACK Header Compression
HTTP/1.1 resends all headers on every request (~500B overhead). HPACK maintains two tables:
- Static table: 61 common headers predefined (e.g., :method: GET = index 2)
- Dynamic table: headers added during the session; referenced by index on repeat
:method GET takes 1 byte (index reference) instead of 12 bytes. Repeated headers across requests are nearly free.
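Underneath the table references, HPACK encodes integers with an N-bit prefix (RFC 7541 §5.1): values that fit in the prefix take no extra bytes; larger values continue in 7-bit groups. A decoding sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode an HPACK integer with an N-bit prefix (RFC 7541 §5.1).
 * Returns the value and sets *consumed; -1 on truncated input. */
long hpack_decode_int(const uint8_t *buf, size_t len, int prefix_bits,
                      size_t *consumed) {
    if (len == 0) return -1;
    uint8_t mask = (uint8_t)((1u << prefix_bits) - 1);
    long value = buf[0] & mask;
    size_t i = 1;
    if (value < mask) { *consumed = 1; return value; }  /* fits in prefix */
    int shift = 0;
    while (i < len) {
        uint8_t b = buf[i++];
        value += (long)(b & 0x7f) << shift;   /* 7 payload bits per byte */
        shift += 7;
        if (!(b & 0x80)) { *consumed = i; return value; }
    }
    return -1; /* ran out of bytes mid-integer */
}
```

The two test values below (10 and 1337 with a 5-bit prefix) are the worked examples from RFC 7541 §C.1.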
CRIME attack: compressing secret data (cookies) alongside attacker-controlled data allows compression oracle attacks. HTTPS only — never compress sensitive headers over plaintext.
🚰 HTTP/2 Flow Control
HTTP/2 has two levels of flow control to prevent a fast sender from overwhelming a slow receiver:
- Connection-level: total bytes in flight across all streams
- Stream-level: bytes in flight per individual stream
The receiver sends WINDOW_UPDATE to grant more capacity after processing data.
HTTP/2 flow control is independent of TCP flow control. A receiver can throttle a single stream without blocking others — unlike HTTP/1.1 where slow reading of one response blocks the entire connection.
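The two-level window accounting can be sketched in a few lines (a simplified model; real implementations also enforce the 2^31-1 window ceiling and per-stream state):

```c
/* A sender may only emit min(stream_window, connection_window) bytes
 * of DATA; every DATA frame debits both windows, and WINDOW_UPDATE
 * replenishes one of them. */
typedef struct {
    long conn_window;     /* connection-level send window */
    long stream_window;   /* per-stream send window */
} fc_t;

long fc_can_send(const fc_t *fc) {
    return fc->conn_window < fc->stream_window ? fc->conn_window
                                               : fc->stream_window;
}

void fc_on_send(fc_t *fc, long n) {
    fc->conn_window   -= n;
    fc->stream_window -= n;
}

void fc_on_window_update(long *window, long increment) {
    *window += increment;
}
```

This is why one slow stream can hit zero stream-window and stall while other streams on the same connection keep flowing against the remaining connection window.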
── Implementation 1 — HTTP/1.1 Request Parser ──
🔧 HTTP/1.1 Parser in C (State Machine)
/* http_parser.c — minimal HTTP/1.1 request parser */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#define MAX_HEADERS 64
#define MAX_URI_LEN 8192
#define MAX_HDR_LEN 8192
typedef struct {
char method[16];
char uri[MAX_URI_LEN];
char version[16];
struct { char name[128]; char value[MAX_HDR_LEN]; } headers[MAX_HEADERS];
int header_count;
char *body;
int body_len;
int content_length;
} http_request_t;
/* Parse request line: "METHOD URI HTTP/1.x\r\n" */
static int parse_request_line(http_request_t *req, char *line, int len) {
(void)len;
/* sscanf is acceptable for bounded inputs with fixed format */
if (sscanf(line, "%15s %8191s %15s",
req->method, req->uri, req->version) != 3)
return -1;
/* Validate method (allowlist) */
const char *valid[] = {"GET","POST","PUT","DELETE","PATCH","HEAD","OPTIONS",NULL};
for (int i = 0; valid[i]; i++)
if (strcmp(req->method, valid[i]) == 0) return 0;
return -1; /* unknown method */
}
/* Parse one header line: "Name: value\r\n" */
static int parse_header_line(http_request_t *req, char *line) {
if (req->header_count >= MAX_HEADERS) return -1;
char *colon = strchr(line, ':');
if (!colon) return -1;
*colon = '\0';
char *value = colon + 1;
while (*value == ' ') value++; /* strip leading whitespace */
int i = req->header_count++;
/* Normalize name to lowercase */
strncpy(req->headers[i].name, line, 127);
for (char *p = req->headers[i].name; *p; p++) *p = tolower(*p);
strncpy(req->headers[i].value, value, MAX_HDR_LEN - 1);
/* Track content-length for body parsing; reject negative/garbage values */
if (strcmp(req->headers[i].name, "content-length") == 0) {
long cl = strtol(req->headers[i].value, NULL, 10);
if (cl < 0) return -1;
req->content_length = (int)cl;
}
return 0;
}
/* Parse full HTTP/1.1 request from buffer */
int http_parse_request(http_request_t *req, char *buf, int len) {
memset(req, 0, sizeof(*req));
/* Find end of headers: \r\n\r\n */
char *header_end = strstr(buf, "\r\n\r\n");
if (!header_end) return -1; /* incomplete */
*header_end = '\0';
char *line = buf;
char *nl;
int first_line = 1;
while ((nl = strstr(line, "\r\n")) != NULL) {
*nl = '\0';
if (first_line) {
if (parse_request_line(req, line, nl - line) < 0) return -1;
first_line = 0;
} else if (nl > line) {
if (parse_header_line(req, line) < 0) return -1;
}
line = nl + 2;
}
/* Body follows the \r\n\r\n separator */
req->body = header_end + 4;
req->body_len = req->content_length;
return 0;
}
── Implementation 2 — epoll HTTP Server ──
🔄 Non-Blocking epoll HTTP Server with Middleware
/* epoll_server.c — minimal non-blocking HTTP/1.1 server with middleware */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MAX_EVENTS 1024
#define BUF_SIZE 65536
#define PORT 8080
typedef struct {
int fd;
char rbuf[BUF_SIZE];
int rlen;
char wbuf[BUF_SIZE];
int wlen;
int woff;
} conn_t;
static void set_nonblocking(int fd) {
int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
/* HTTP response builder */
static int build_response(char *buf, int sz,
int status, const char *body) {
const char *reason =
status == 200 ? "OK" :
status == 201 ? "Created" :
status == 404 ? "Not Found" :
status == 405 ? "Method Not Allowed" : "Internal Server Error";
return snprintf(buf, sz,
"HTTP/1.1 %d %s\r\n"
"Content-Type: application/json\r\n"
"Content-Length: %zu\r\n"
"Connection: keep-alive\r\n"
"\r\n"
"%s",
status, reason, strlen(body), body);
}
/* Route handler — returns 0 on success, -1 on unknown route */
static int handle_request(conn_t *conn,
const char *method, const char *uri) {
if (strcmp(method, "GET") == 0 && strcmp(uri, "/") == 0) {
conn->wlen = build_response(conn->wbuf, BUF_SIZE,
200, "{\"status\":\"ok\"}");
return 0;
}
if (strncmp(uri, "/orders", 7) == 0 && strcmp(method, "GET") == 0) {
conn->wlen = build_response(conn->wbuf, BUF_SIZE,
200, "{\"orders\":[]}");
return 0;
}
conn->wlen = build_response(conn->wbuf, BUF_SIZE,
404, "{\"error\":\"NOT_FOUND\"}");
return -1;
}
int main(void) {
/* Create listen socket */
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
set_nonblocking(listen_fd);
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(PORT),
.sin_addr.s_addr = INADDR_ANY
};
bind(listen_fd, (struct sockaddr*)&addr, sizeof(addr));
listen(listen_fd, SOMAXCONN);
/* Create epoll instance */
int epfd = epoll_create1(0);
struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
struct epoll_event events[MAX_EVENTS];
conn_t *conns[65536] = {0}; /* indexed by fd (simplified) */
fprintf(stdout, "Server listening on :%d\n", PORT);
for (;;) {
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
for (int i = 0; i < n; i++) {
int fd = events[i].data.fd;
if (fd == listen_fd) {
/* Accept new connections */
for (;;) {
int cfd = accept(listen_fd, NULL, NULL);
if (cfd < 0) break; /* EAGAIN: no more pending connections */
set_nonblocking(cfd);
conn_t *c = calloc(1, sizeof(conn_t));
c->fd = cfd;
conns[cfd] = c;
struct epoll_event cev = {
.events = EPOLLIN | EPOLLET,
.data.fd = cfd
};
epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &cev);
}
} else if (events[i].events & EPOLLIN) {
conn_t *c = conns[fd];
int peer_closed = 0;
/* EPOLLET: must drain the socket until recv would block —
a single recv can leave data that generates no further events */
for (;;) {
ssize_t nr = recv(fd, c->rbuf + c->rlen,
BUF_SIZE - c->rlen - 1, 0);
if (nr > 0) { c->rlen += nr; continue; }
if (nr == 0) peer_closed = 1; /* orderly shutdown (or buffer full) */
break; /* nr < 0: EAGAIN (drained) or error */
}
if (peer_closed) {
epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
close(fd); free(c); conns[fd] = NULL;
continue;
}
c->rbuf[c->rlen] = '\0';
/* Check if full request received */
if (strstr(c->rbuf, "\r\n\r\n")) {
char method[16], uri[256];
sscanf(c->rbuf, "%15s %255s", method, uri);
handle_request(c, method, uri);
/* Switch to write mode */
struct epoll_event wev = {
.events = EPOLLOUT | EPOLLET,
.data.fd = fd
};
epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &wev);
}
} else if (events[i].events & EPOLLOUT) {
conn_t *c = conns[fd];
ssize_t nw = send(fd, c->wbuf + c->woff,
c->wlen - c->woff, 0);
if (nw > 0) c->woff += nw;
if (c->woff >= c->wlen) {
/* Done writing — reset for next request (keep-alive) */
c->rlen = c->wlen = c->woff = 0;
struct epoll_event rev = {
.events = EPOLLIN | EPOLLET,
.data.fd = fd
};
epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &rev);
}
}
}
}
}
🔬 Lab 1 — Build & Benchmark an epoll HTTP Server
1. Compile and run the epoll server from Tab 7. Test with curl -v http://localhost:8080/ — verify headers and body.
2. Benchmark with wrk -t4 -c1000 -d30s http://localhost:8080/. Record req/sec and latency p99.
3. Compare: modify the server to use thread-per-request (one pthread_create per accept). Re-benchmark at c=1000. Compare req/sec and memory usage (valgrind --tool=massif).
4. Add a keep-alive test: wrk --connections 100 --threads 4 --duration 30s --pipeline 10. Observe connection reuse in server logs.
🔬 Lab 2 — HTTP Parser Fuzzing
1. Compile the parser from Tab 7 with AddressSanitizer: gcc -fsanitize=address,undefined -g http_parser.c -o parser_test
2. Write a test harness that feeds malformed inputs: missing CRLF, header without colon, zero Content-Length with body, negative Content-Length. Verify no crashes or buffer overflows.
3. Test HTTP request smuggling input: body with both Content-Length and Transfer-Encoding: chunked. Verify your parser handles it per RFC (chunked wins).
4. Bonus: use libFuzzer: clang -fsanitize=fuzzer,address -o fuzz_parser fuzz_parser.c http_parser.c. Run for 60 seconds and inspect the corpus.
🔬 Lab 3 — HTTP Headers & Content Negotiation
1. Add content negotiation to your server: if Accept: application/xml is requested, return XML; if Accept: application/json, return JSON. For unsupported types, return 406.
2. Implement ETag caching: generate a simple ETag (e.g., SHA-1 of the response body). On an If-None-Match match, return 304 Not Modified with an empty body.
3. Add gzip compression: if Accept-Encoding: gzip is present, compress the response body with zlib. Add a Content-Encoding: gzip header. Verify with curl --compressed.
4. Implement CORS middleware: add Access-Control-Allow-Origin and handle the OPTIONS preflight. Test with a browser fetch() from a different origin.
── Phase 0 Batch 2 Checklist ──
- Parse an HTTP/1.1 request at the byte level (request line, headers, body)
- Explain chunked transfer encoding and when it's used
- Describe HTTP request smuggling (CL.TE) and prevention
- Implement a non-blocking epoll server with keep-alive
- Describe HTTP/2 frame format and the 10 frame types
- Explain HPACK compression and the static/dynamic table
- Implement a middleware pipeline with short-circuit semantics
- Explain CORS preflight and which headers are required