SQL is the Most Underestimated Backend Skill
POSTGRESQL FOCUS
Most backend developers know SQL syntax. Very few understand what happens inside the database when a query runs. That gap is where 90% of production performance problems hide.
This module covers how PostgreSQL executes queries, how indexes work at the data-structure level, how to read execution plans, and what the ACID properties and isolation levels actually guarantee — with enough depth to diagnose real production incidents.
SQL Query Execution Pipeline
1. Parse — SQL text is lexed and parsed into an Abstract Syntax Tree (AST). Syntax errors are caught here. Output: parse tree.
2. Rewrite — The rule system rewrites the parse tree. Views are expanded inline; `ON INSERT DO INSTEAD` rules are applied. Output: rewritten parse tree.
3. Plan / Optimise — The query planner generates candidate execution plans, estimates the cost of each using table statistics (`pg_statistic`), and picks the cheapest plan. This is where index decisions are made. Output: plan tree.
4. Execute — The executor walks the plan tree, pulling rows from leaf nodes (seq scan, index scan) up through joins, aggregates, and sorts. Output: result rows.
📚 Analogy: The planner is like a GPS. It doesn't know the actual traffic (data) — it estimates travel time based on historical averages (statistics). Outdated statistics = bad route choice = slow query. ANALYZE updates the statistics. VACUUM ANALYZE also reclaims dead tuple space.
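Refreshing and inspecting those statistics is a one-liner each in psql — a sketch, using `orders` as an illustrative table name:

```sql
ANALYZE orders;          -- refresh pg_statistic for one table
VACUUM ANALYZE orders;   -- also reclaim dead-tuple space first

-- Human-readable view over pg_statistic
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders';
```

Autovacuum runs ANALYZE automatically, but after a bulk load or mass delete an explicit ANALYZE avoids a window of bad plans.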
Key PostgreSQL Concepts: Quick Reference
| Concept | What it is | Why it matters |
|---|---|---|
| MVCC | Multi-Version Concurrency Control — each transaction sees a snapshot; old row versions kept until vacuumed | Readers never block writers; enables repeatable reads without locks |
| pg_statistic | Per-column statistics: most-common values, histogram, null fraction, distinct count | Planner cost estimates depend on this; stale stats → bad plans |
| VACUUM | Reclaims space from dead tuples (rows deleted/updated but still stored for MVCC visibility) | Without autovacuum, tables bloat; XID wraparound causes downtime |
| TOAST | The Oversized-Attribute Storage Technique — large values (text, bytea, jsonb) stored out-of-line | Transparent; but SELECT * on TOAST columns adds I/O you may not expect |
| WAL | Write-Ahead Log — changes written to WAL before data pages; enables crash recovery and replication | Understanding WAL is essential for replication, point-in-time recovery |
What an Index Actually Is
DATA STRUCTURE
An index is a separate data structure that stores a subset of column values alongside physical pointers (tuple IDs / ctids) to the heap rows. The database maintains it automatically on insert/update/delete. Every index you add speeds up reads but slows writes and consumes disk + RAM.
PostgreSQL index types: B-tree (default), Hash, GiST, SP-GiST, GIN, BRIN. For backend engineering, B-tree covers 95% of use cases.
B-tree Internals
```
                        Root page
      ┌─────────────────────────────────────────────┐
      │        [50]      [100]      [200]           │
      │      ↓      ↓          ↓          ↓         │
      └─────────────────────────────────────────────┘
         │        │          │          │
       Inner    Inner      Inner      Inner    ← branch pages (pointers only)
       pages    pages      pages      pages
                   │
      ┌─────────────────────────────────────────────────────┐
      │ Leaf page (doubly linked list  ←→)                  │
      │ [42, ctid(0,5)]  [43, ctid(2,1)]  [44, ctid(1,8)]   │
      │        → heap page 0, tuple 5                       │
      └─────────────────────────────────────────────────────┘
```
1. Root → inner pages → leaf pages: the B-tree is balanced — every leaf is at the same depth. Lookup is O(log N). For 1 million rows with 100 entries per page, depth ≈ 3. That's 3 page reads for any lookup.
2. Leaf pages are doubly linked — range scans (`WHERE id BETWEEN 100 AND 200`) walk the linked list after finding the start, without re-traversing from the root.
3. Each leaf entry stores: key value + ctid (physical location: page number, tuple offset within the page). The executor fetches the heap page using the ctid.
4. Index-only scan: if all needed columns are in the index (a covering index), the heap page fetch is skipped entirely. Check: `EXPLAIN` shows "Index Only Scan".
Index Types Reference
| Type | Operators supported | Best for | Notes |
|---|---|---|---|
| B-tree | =, <, <=, >, >=, BETWEEN, LIKE 'foo%' | Most use cases; equality and range queries | Default. Supports ORDER BY without sort node. |
| Hash | = only | Pure equality on large text keys | Faster than B-tree for equality only. No range. Not WAL-logged before PG10 (avoid pre-10). |
| GIN | @>, <@, &&, @@ | JSONB containment, full-text search, array overlap | Inverted index. Fast reads, slow updates. Use gin_pending_list_limit. |
| GiST | Geometric, range types, full-text | PostGIS (geography), range overlaps, nearest-neighbour | Extensible. Slower build than B-tree. |
| BRIN | Range queries on physically ordered data | Time-series, append-only tables (created_at) | Very small index (stores min/max per block range). Useless if data not correlated with physical order. |
Composite Indexes: Column Order Matters
The Left-Prefix Rule
CRITICAL
A composite index (a, b, c) can satisfy queries that filter on: a, (a, b), or (a, b, c). It cannot be used for queries that only filter on b or c alone, because the B-tree is sorted by a first.
```sql
-- Index: (user_id, status, created_at)

-- ✅ Uses index (full prefix)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;

-- ✅ Uses the user_id prefix; created_at is filtered within the index
--    (status is skipped, so created_at cannot drive the traversal)
SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2026-01-01';

-- ❌ Cannot use this index (no leading column)
SELECT * FROM orders
WHERE status = 'pending';   -- needs a separate index on (status)
```
Rule of thumb: Put the most selective column first (highest cardinality — most distinct values). Equality columns before range columns. The range column should be last — once you hit a range condition, the remaining columns in the index cannot be used for filtering.
Partial Indexes
Index Only the Rows You Query
PERFORMANCE
A partial index includes only rows matching a WHERE clause. Smaller, faster to maintain, and can enforce constraints on subsets of data.
```sql
-- Index only unprocessed jobs (99% of queries target this subset)
CREATE INDEX idx_jobs_pending ON jobs(created_at)
WHERE status = 'pending';

-- Partial unique constraint: only one active record per user
CREATE UNIQUE INDEX idx_subscriptions_active_user ON subscriptions(user_id)
WHERE cancelled_at IS NULL;

-- A query MUST include the partial index predicate to use it
SELECT * FROM jobs
WHERE status = 'pending'
  AND created_at < now() - interval '5 minutes';
```
Expression Indexes
Indexing the Result of a Function
FUNCTIONAL INDEXES
When queries apply a function to a column in WHERE, a plain column index is useless — the function result isn't in the index. Expression indexes solve this.
```sql
-- ❌ Index on email NOT used — function applied to the column
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- ✅ Create an expression index to match the query
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- Now the query above uses the index.
-- The index stores lower(email) values, not raw email values.

-- Other common examples:
CREATE INDEX ON events(date_trunc('day', created_at));
CREATE INDEX ON products((metadata->>'sku'));   -- JSONB field extraction
```
The Real Cost of Indexes
| Operation | Impact |
|---|---|
| INSERT | Every index on the table gets an entry inserted. 5 indexes = 5 B-tree insertions + potential page splits. |
| UPDATE | If the indexed column changes: old entry deleted + new entry inserted in every affected index. Heap-only tuple (HOT) update avoids this if the page has space and no indexed column changes. |
| DELETE | Index entries marked dead; not removed until VACUUM runs. |
| Disk space | A B-tree index on a 100M-row integer column ≈ 2–3 GB. JSONB GIN indexes can be 2–5× the table size. |
| Cache pollution | Indexes compete with table data for shared_buffers. Unused indexes evict useful data. |
⚠️ Find and drop unused indexes: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND schemaname = 'public'; — Zero scans since last stats reset means the index is dead weight. Drop it.
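Disk footprint is also easy to measure directly with the built-in size functions — a sketch; the index and table names are illustrative:

```sql
SELECT pg_size_pretty(pg_relation_size('idx_orders_user_status'));  -- one index
SELECT pg_size_pretty(pg_indexes_size('orders'));                   -- all indexes on the table
SELECT pg_size_pretty(pg_total_relation_size('orders'));            -- heap + indexes + TOAST
```

Comparing `pg_indexes_size` to `pg_relation_size` of the heap quickly shows tables where index overhead exceeds the data itself.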
Reading Query Plans
EXPLAIN ANALYZE
EXPLAIN shows the plan the planner chose with estimated costs. EXPLAIN ANALYZE executes the query and shows actual runtime. Always use EXPLAIN (ANALYZE, BUFFERS) in production diagnosis — BUFFERS reveals cache hits vs disk reads.
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, count(o.id) AS total_orders
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name
ORDER BY total_orders DESC
LIMIT 10;
```
Anatomy of an EXPLAIN Output
```
Limit  (cost=1240.50..1240.53 rows=10 width=40) (actual time=18.432..18.435 rows=10 loops=1)
  ->  Sort  (cost=1240.50..1242.00 rows=600 width=40) (actual time=18.428..18.430 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1200.00..1215.00 rows=600 width=40) (actual time=17.900..18.100 rows=621 loops=1)
              Group Key: u.name
              ->  Hash Join  (cost=85.00..1125.00 rows=15000 width=16) (actual time=1.200..14.800 rows=14832 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..890.00 rows=50000 width=8) (actual time=0.050..6.400 rows=50000 loops=1)
                    ->  Hash  (cost=75.00..75.00 rows=800 width=8) (actual time=0.900..0.900 rows=823 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 42kB
                          ->  Index Scan on users u  (cost=0.29..75.00 rows=800 width=8) (actual time=0.060..0.800 rows=823 loops=1)
                                Index Cond: (status = 'active')
Planning Time: 0.842 ms
Execution Time: 18.521 ms
```
How to Read It
| Field | Meaning | What to watch for |
|---|---|---|
| cost=X..Y | X = startup cost (first row), Y = total cost (arbitrary planner units). Lower is better. Used for plan comparison only — not ms. | Very high total cost vs actual time can indicate stale statistics. |
| rows=N | Estimated (in cost line) vs actual (in actual time line) row count. | Large estimate vs actual mismatch → run ANALYZE. Bad estimates cascade into bad join/sort choices. |
| loops=N | How many times this node executed. Nested loops execute inner node once per outer row. | High loops × high rows = N+1 problem at the SQL level. |
Seq Scan | Full table scan — reads every page. | Expected on small tables or when fetching most rows. Bad on large tables with selective WHERE clause. |
Index Scan | B-tree traversal + heap fetches for each found row. | Good. If many rows fetched: consider Bitmap Index Scan which batches heap fetches. |
Bitmap Heap Scan | Builds bitmap of matching heap pages, then fetches them in order (reduces random I/O). | Usually good for medium selectivity (5–20% of rows). |
Hash Join | Build hash table from smaller relation, probe with larger. | Good for large joins. Needs work_mem; if batches > 1 it spills to disk. |
Nested Loop | For each outer row, scan inner. O(N×M) worst case. | Good when outer is small + inner has an index. Bad on large tables without index. |
Warning Signs in Plans
Red Flags
- Seq Scan on a large table with low actual rows → missing index
- Estimated rows ≫ actual rows → stale statistics; run `ANALYZE`
- Nested Loop with loops > 1000 → N+1 pattern inside the SQL
- Sort with "external merge" → `work_mem` too low for the sort
- Hash Batches > 1 → hash join spilling to disk
- "Rows Removed by Filter" ≫ 0 after an Index Scan → over-fetching; index not selective enough
Good Signs
- Index Only Scan → all columns in the index, heap not touched
- Bitmap Index Scan + Bitmap Heap Scan → efficient bulk access
- Hash Join with Batches=1 → fits in memory
- Estimated ≈ actual rows → planner has accurate stats
- Planning time < 1ms → query is not overly complex
- Consistent execution time → data fits in `shared_buffers` (Buffers: hit)
Use explain.dalibo.com to paste EXPLAIN (FORMAT JSON) output and get a visual, colour-coded plan tree. Far easier to navigate than text format for complex queries.
The N+1 Query Problem
N+1: The Silent Performance Killer
ANTI-PATTERN
N+1 occurs when you fetch a list of N items then issue a separate query for each item. The result is N+1 round trips to the database instead of 1 or 2.
```sql
-- ❌ N+1: fetch 100 users, then 1 query per user for their posts
users = SELECT * FROM users LIMIT 100;                    -- 1 query
FOR u IN users:
    posts = SELECT * FROM posts WHERE user_id = u.id;     -- 100 queries
-- Total: 101 queries. At 50ms RTT: ~5 seconds.

-- ✅ Solution 1: JOIN
SELECT u.*, p.title, p.created_at
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
LIMIT 100;                                                -- 1 query

-- ✅ Solution 2: Batch load (IN clause)
user_ids = [1, 2, 3, ..., 100];
SELECT * FROM posts WHERE user_id = ANY($1);              -- 2 total queries
```
JOIN Types
| JOIN Type | Returns | Use when |
|---|---|---|
| INNER JOIN | Rows with matching keys in both tables | Both sides must exist. Most common. |
| LEFT JOIN | All rows from left + matching from right (NULLs where no match) | Optional relationship: user may or may not have a profile. |
| RIGHT JOIN | All rows from right + matching from left | Rare; rewrite as LEFT JOIN with tables swapped. |
| FULL OUTER JOIN | All rows from both, NULLs where no match | Reconciliation queries, finding unmatched rows on either side. |
| CROSS JOIN | Cartesian product (M × N rows) | Generating test data, pairing every item with every other. Dangerous on large tables. |
| LATERAL JOIN | Subquery that can reference outer query's columns | "For each user, get their 3 most recent orders" — correlated subquery without N+1. |
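The LATERAL row above, spelled out — a sketch of "top 3 most recent orders per user", assuming the users/orders tables used elsewhere in this module:

```sql
SELECT u.name, recent.id AS order_id, recent.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT o.id, o.created_at
    FROM orders o
    WHERE o.user_id = u.id        -- correlates with the outer row
    ORDER BY o.created_at DESC
    LIMIT 3                       -- per user, not globally
) AS recent;
```

Use `LEFT JOIN LATERAL (...) ON true` instead of `CROSS JOIN LATERAL` if users with zero orders should still appear in the result.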
CTEs vs Subqueries
Common Table Expressions
WITH CLAUSE

```sql
-- Subquery (inline, can be optimised away by the planner)
SELECT u.name, o.total
FROM users u
JOIN (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
) o ON o.user_id = u.id;

-- CTE (WITH) — same query, more readable
WITH order_totals AS (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT u.name, ot.total
FROM users u
JOIN order_totals ot ON ot.user_id = u.id;

-- Recursive CTE: walk a tree/graph (org hierarchy, comment thread)
WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL                   -- anchor
    UNION ALL
    SELECT e.id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id   -- recursive
)
SELECT * FROM org_tree ORDER BY depth;
```
PostgreSQL ≥ 12: CTEs are inlined by default (treated as subqueries, planner can optimise through them). Use WITH ... AS MATERIALIZED to force the old behaviour — the CTE is executed once and cached, which can be faster when referenced multiple times.
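Forcing materialisation is one keyword — a sketch reusing the order_totals CTE from above, where two references share one cached computation:

```sql
-- Executed once, cached, and read by both references below
WITH order_totals AS MATERIALIZED (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT 'max' AS metric, max(total) AS value FROM order_totals
UNION ALL
SELECT 'min', min(total) FROM order_totals;
```

Without MATERIALIZED, the planner may inline the CTE into each reference and compute the aggregation twice.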
Window Functions
Aggregates Without Collapsing Rows
OVER CLAUSE

```sql
-- Rank users by order count within each country
SELECT name, country, order_count,
       rank() OVER (
           PARTITION BY country
           ORDER BY order_count DESC
       ) AS country_rank
FROM users;

-- Running total (cumulative sum)
SELECT created_at::date, revenue,
       sum(revenue) OVER (ORDER BY created_at::date) AS running_total
FROM daily_sales;

-- Row number for cursor pagination (stable ordering)
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM events
WHERE id > :last_cursor
LIMIT 50;

-- Lag/Lead: compare a row with the previous/next one
SELECT date, price,
       lag(price, 1) OVER (ORDER BY date) AS prev_price,
       price - lag(price, 1) OVER (ORDER BY date) AS change
FROM stock_prices;
```
UPSERT and Conflict Handling
```sql
-- Insert, or update on conflict (upsert)
INSERT INTO user_stats (user_id, login_count, last_seen)
VALUES ($1, 1, now())
ON CONFLICT (user_id) DO UPDATE
SET login_count = user_stats.login_count + 1,
    last_seen   = EXCLUDED.last_seen;   -- EXCLUDED = the row proposed for insertion

-- Insert and ignore duplicates
INSERT INTO events (id, payload)
VALUES ($1, $2)
ON CONFLICT (id) DO NOTHING;

-- Return the affected rows (useful for getting auto-generated IDs)
INSERT INTO users (name, email)
VALUES ($1, $2)
RETURNING id, created_at;
```
Normalization: Eliminating Redundancy
DATA MODELLING
Normalization is the process of structuring tables to minimise data redundancy and prevent update anomalies. Each normal form adds a constraint. In practice, design to 3NF/BCNF and deliberately denormalize specific hot paths with explicit justification.
Normal Forms
First Normal Form — Atomic Values
Rule: Each column holds a single, indivisible value. No repeating groups (multiple phone numbers in one column). Each row is uniquely identifiable (primary key exists).
Violation: users(id, name, phones="555-1234,555-5678") — comma-delimited phones in one column.
Fix: Separate user_phones(user_id, phone_number) table with one row per phone.
Second Normal Form — No Partial Dependencies
Rule: Must be 1NF. Every non-key column must depend on the whole primary key, not just part of it. Only relevant for tables with composite primary keys.
Violation: order_items(order_id, product_id, quantity, product_name) — product_name depends only on product_id, not the full (order_id, product_id) key.
Fix: Move product_name to the products table. order_items keeps only quantity.
Third Normal Form — No Transitive Dependencies
Rule: Must be 2NF. No non-key column depends on another non-key column (transitive dependency).
Violation: employees(id, dept_id, dept_name, salary) — dept_name depends on dept_id, not directly on id.
Fix: Extract departments(dept_id, dept_name). employees keeps only dept_id as a foreign key.
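As DDL, the 3NF fix looks like this — a sketch with assumed column types:

```sql
CREATE TABLE departments (
    dept_id   int  PRIMARY KEY,
    dept_name text NOT NULL
);

CREATE TABLE employees (
    id      int     PRIMARY KEY,
    dept_id int     NOT NULL REFERENCES departments(dept_id),
    salary  numeric NOT NULL
);
-- dept_name now lives in exactly one place: renaming a department
-- is a single-row UPDATE instead of a mass update on employees.
```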
Boyce-Codd Normal Form — Every Determinant Is a Candidate Key
Rule: Stricter than 3NF. For every functional dependency X → Y, X must be a superkey (uniquely identifies rows). Fixes edge cases in 3NF with overlapping composite candidate keys.
Violation: course_teachers(student, course, teacher) where a teacher teaches only one course but a course can have multiple teachers. The dependency teacher → course exists, but teacher is not a key.
Fix: Decompose into teacher_courses(teacher, course) and student_courses(student, course).
When to Denormalize
Deliberate Denormalization
TRADE-OFF
Normalization optimises for write integrity. Denormalization trades integrity for read performance. Denormalize only when you have a measured performance problem and accept the consistency maintenance burden.
| Pattern | What it does | Maintenance cost |
|---|---|---|
| Cached count | Store posts.comment_count alongside the post; increment/decrement via trigger or application logic | Must update on every insert/delete to comments |
| Materialized view | Pre-computed JOIN result stored as a table (CREATE MATERIALIZED VIEW); refreshed on schedule or trigger | REFRESH MATERIALIZED VIEW CONCURRENTLY — doesn't block reads |
| Duplicated columns | Copy user.name into orders.user_name to avoid JOIN on every order list | Must sync on user name change (rare) |
| JSONB for variable attributes | Store flexible/sparse attributes in a JSONB column instead of an EAV table | No schema enforcement; use GIN index for querying |
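The materialized-view row above, spelled out — a sketch; the view name and columns are illustrative:

```sql
CREATE MATERIALIZED VIEW user_order_totals AS
SELECT user_id, count(*) AS order_count, sum(amount) AS total
FROM orders
GROUP BY user_id;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON user_order_totals (user_id);

-- Refresh without blocking readers (run on a schedule)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_totals;
```

Between refreshes the view serves stale data — acceptable for dashboards, not for balances.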
ACID: Four Guarantees
CORRECTNESS
ACID is the contract a database makes about what happens to your data, even in the face of crashes, errors, and concurrent access. Understanding each property tells you what you can and cannot rely on.
- A — Atomicity: A transaction is all-or-nothing. If any statement within a transaction fails, all changes are rolled back. There is no partial transaction. Implemented via the WAL: on crash, incomplete transactions are rolled back during recovery.
- C — Consistency: The database moves from one valid state to another. Constraints (FK, NOT NULL, CHECK, UNIQUE) are enforced by the database (by default per statement; deferrable constraints at commit). The application is responsible for domain-level consistency (business rules).
- I — Isolation: Concurrent transactions behave as if they ran serially. The degree of isolation is configurable (see isolation levels below). Full isolation is expensive; databases offer weaker but faster levels.
- D — Durability: Committed transactions survive crashes. Achieved by flushing the WAL to disk before acknowledging the commit. `synchronous_commit = off` trades durability for latency (risk: the last few milliseconds of commits are lost on a crash).
Read Phenomena (What Can Go Wrong)
| Phenomenon | Description | Example |
|---|---|---|
| Dirty Read | Read uncommitted data from another transaction that may roll back | T1 reads a balance modified by T2 before T2 commits. T2 rolls back — T1 read data that was never committed. |
| Non-repeatable Read | Re-reading the same row within a transaction returns different data | T1 reads price=10. T2 updates price=20 and commits. T1 re-reads: price=20. Same query, different result. |
| Phantom Read | Re-executing a range query returns different rows | T1 counts orders WHERE status='pending': 5 rows. T2 inserts a new pending order. T1 re-counts: 6 rows. |
| Serialisation Anomaly | Result is inconsistent with any serial order of the transactions | Write skew: two doctors both read "at least 1 on call", both go off-call — result: zero doctors on call. |
Isolation Levels
| Level | Dirty Read | Non-Repeatable | Phantom | Serialisation Anomaly | PostgreSQL default? |
|---|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Possible | No (PG treats as Read Committed) |
| Read Committed | Not possible | Possible | Possible | Possible | ✅ Default |
| Repeatable Read | Not possible | Not possible | Not possible* | Possible | No |
| Serializable | Not possible | Not possible | Not possible | Not possible | No (highest cost) |
*PostgreSQL's Repeatable Read uses snapshot isolation which prevents phantoms too — stronger than the SQL standard requires. PG's Serializable uses Serializable Snapshot Isolation (SSI), a lock-free algorithm that detects and aborts transactions that would create anomalies.
Deadlocks
Deadlock Detection and Prevention
CONCURRENCY
A deadlock occurs when two (or more) transactions each hold a lock the other needs, creating a circular wait.
```
-- T1                                  -- T2
BEGIN;                                 BEGIN;
UPDATE accounts SET bal=bal-100        UPDATE accounts SET bal=bal+100
  WHERE id=1;  -- locks row 1            WHERE id=2;  -- locks row 2
UPDATE accounts SET bal=bal+100        UPDATE accounts SET bal=bal-100
  WHERE id=2;  -- waits for row 2        WHERE id=1;  -- waits for row 1  ← DEADLOCK
```
PostgreSQL detects deadlocks automatically (within deadlock_timeout, default 1s) and aborts one transaction with: ERROR: deadlock detected. The other continues normally.
Prevention Strategies
- Consistent lock ordering: always lock resources in the same order (e.g., always lock the lower ID first). Eliminates the circular-wait condition.
- `SELECT ... FOR UPDATE SKIP LOCKED`: skip rows locked by other transactions instead of waiting. Useful for job queues.
- `NOWAIT`: fail immediately if the lock cannot be acquired: `SELECT ... FOR UPDATE NOWAIT` returns an error, never waits.
- Short transactions: the shorter a transaction, the shorter the lock hold time, the lower the probability of deadlock.
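Consistent ordering can be enforced in SQL itself by taking both row locks in primary-key order before updating — a sketch against the accounts table from the example above:

```sql
BEGIN;
-- Both directions of a transfer acquire locks in the same (id ASC)
-- order, so a circular wait can never form between two transfers.
SELECT id FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;

UPDATE accounts SET bal = bal - 100 WHERE id = 1;
UPDATE accounts SET bal = bal + 100 WHERE id = 2;
COMMIT;
```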
Advisory Locks
Application-Level Mutex via PostgreSQL
DISTRIBUTED LOCKING
Advisory locks are arbitrary integer locks managed by the application — PostgreSQL doesn't tie them to any table row. Useful for distributed mutual exclusion (e.g., ensuring only one instance runs a cron job).
```sql
-- Acquire an exclusive advisory lock (blocks if held by another session)
SELECT pg_advisory_lock(12345);

-- Non-blocking try: returns true/false
SELECT pg_try_advisory_lock(12345);

-- Transaction-scoped (auto-released on commit/rollback)
SELECT pg_advisory_xact_lock(12345);

-- Release a session-scoped lock explicitly
SELECT pg_advisory_unlock(12345);
```
Connecting to PostgreSQL with libpq
```c
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile:  gcc db.c -lpq -o db
   Requires: libpq-dev package (apt install libpq-dev) */

PGconn *db_connect(const char *connstr) {
    PGconn *conn = PQconnectdb(connstr);
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s\n", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }
    return conn;
}

int main(void) {
    /* Connection string: can also use env vars PGHOST, PGPORT, PGUSER, etc. */
    PGconn *conn = db_connect(
        "host=localhost port=5432 dbname=myapp user=myuser password=secret"
    );
    if (!conn) return 1;

    printf("Connected to PostgreSQL %s\n",
           PQparameterStatus(conn, "server_version"));
    PQfinish(conn);
    return 0;
}
```
Prepared Statements (Parameterised Queries)
```c
/* Always use prepared statements — never string-concatenate user input */
int get_user_by_id(PGconn *conn, int user_id) {
    /* Prepare once, execute many times */
    PGresult *prep = PQprepare(conn,
        "get_user",                                        /* statement name */
        "SELECT id, name, email FROM users WHERE id = $1",
        1,                                                 /* number of params */
        NULL                                               /* param types (NULL = infer) */
    );
    if (PQresultStatus(prep) != PGRES_COMMAND_OK) {
        fprintf(stderr, "Prepare failed: %s\n", PQresultErrorMessage(prep));
        PQclear(prep);
        return -1;
    }
    PQclear(prep);

    /* Execute: pass the parameter as a string (libpq converts) */
    char id_str[16];
    snprintf(id_str, sizeof(id_str), "%d", user_id);
    const char *params[] = { id_str };

    PGresult *res = PQexecPrepared(conn,
        "get_user",   /* statement name */
        1,            /* nParams */
        params,       /* paramValues */
        NULL,         /* paramLengths (NULL = text mode, use strlen) */
        NULL,         /* paramFormats (NULL = all text) */
        0             /* resultFormat: 0=text, 1=binary */
    );
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query failed: %s\n", PQresultErrorMessage(res));
        PQclear(res);
        return -1;
    }

    int rows = PQntuples(res);
    int cols = PQnfields(res);
    printf("Got %d rows, %d cols\n", rows, cols);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            printf("  %s = %s\n", PQfname(res, c),
                   PQgetisnull(res, r, c) ? "(null)" : PQgetvalue(res, r, c));
        }
    }
    PQclear(res);
    return rows;
}
```
Transactions in C
```c
#include <libpq-fe.h>
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strcmp */

int transfer_funds(PGconn *conn, int from_id, int to_id, int amount_cents) {
    /* BEGIN */
    PGresult *r = PQexec(conn, "BEGIN");
    if (PQresultStatus(r) != PGRES_COMMAND_OK) { PQclear(r); return -1; }
    PQclear(r);

    /* Debit — the UPDATE itself takes the row lock; the balance >= $1
       predicate makes check-and-debit a single atomic step */
    char from_str[16], to_str[16], amt_str[16];
    snprintf(from_str, sizeof(from_str), "%d", from_id);
    snprintf(to_str, sizeof(to_str), "%d", to_id);
    snprintf(amt_str, sizeof(amt_str), "%d", amount_cents);

    const char *debit_params[] = { amt_str, from_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance - $1 "
        "WHERE id = $2 AND balance >= $1",
        2, NULL, debit_params, NULL, NULL, 0);
    if (PQresultStatus(r) != PGRES_COMMAND_OK ||
        strcmp(PQcmdTuples(r), "1") != 0) {
        /* 0 rows affected = insufficient balance */
        PQclear(r);
        PQclear(PQexec(conn, "ROLLBACK"));
        return -1;
    }
    PQclear(r);

    /* Credit */
    const char *credit_params[] = { amt_str, to_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
        2, NULL, credit_params, NULL, NULL, 0);
    if (PQresultStatus(r) != PGRES_COMMAND_OK) {
        PQclear(r);
        PQclear(PQexec(conn, "ROLLBACK"));
        return -1;
    }
    PQclear(r);

    /* COMMIT */
    r = PQexec(conn, "COMMIT");
    int ok = PQresultStatus(r) == PGRES_COMMAND_OK;
    PQclear(r);
    return ok ? 0 : -1;
}
```
Minimal Connection Pool in C
```c
/* Production uses PgBouncer. This illustrates the concept. */
#include <libpq-fe.h>
#include <pthread.h>

#define POOL_SIZE 8

typedef struct {
    PGconn *conn;
    int in_use;
} conn_slot_t;

static conn_slot_t pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

void pool_init(const char *connstr) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].conn = PQconnectdb(connstr);
        pool[i].in_use = 0;
    }
}

PGconn *pool_acquire(void) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use && PQstatus(pool[i].conn) == CONNECTION_OK) {
            pool[i].in_use = 1;
            pthread_mutex_unlock(&pool_lock);
            return pool[i].conn;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return NULL;   /* pool exhausted */
}

void pool_release(PGconn *conn) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].conn == conn) { pool[i].in_use = 0; break; }
    }
    pthread_mutex_unlock(&pool_lock);
}
```
In production: use PgBouncer (transaction-mode pooling) in front of PostgreSQL. It handles thousands of client connections multiplexed over a small number of server connections. Never create a new PGconn per request in a high-throughput server.
🔬 Lab 1 — Index Experiments on Real Data
TOOLS: PostgreSQL · psql · pgbench
Goal: Generate a large dataset and observe the before/after performance impact of indexes using EXPLAIN ANALYZE.
1. `createdb perflab`
2. `psql perflab -c "CREATE TABLE orders (id serial PRIMARY KEY, user_id int, status text, amount int, created_at timestamptz DEFAULT now());"`
3. `psql perflab -c "INSERT INTO orders (user_id, status, amount) SELECT (random()*10000)::int, (ARRAY['pending','complete','cancelled'])[ceil(random()*3)::int], (random()*10000)::int FROM generate_series(1,1000000);"`
4. `EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';` — note: execution time, "Seq Scan", rows removed by filter.
5. `CREATE INDEX idx_orders_user_status ON orders(user_id, status);` — re-run the EXPLAIN. Note: execution time, node type change (Index Scan or Bitmap Index Scan), rows fetched.
6. `CREATE INDEX idx_orders_status_user ON orders(status, user_id);` — run `WHERE status = 'pending' AND user_id = 42`. Which index does the planner choose? Why?
7. `CREATE INDEX idx_orders_pending ON orders(user_id) WHERE status = 'pending';` — compare the plan for the pending-only query vs a query across all statuses.
8. `SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE schemaname='public' ORDER BY idx_scan;` — drop any index with 0 scans.

🔬 Lab 2 — Isolation Levels in Action
TOOLS: PostgreSQL · two psql sessions
Goal: Observe the difference between Read Committed and Repeatable Read isolation levels by running concurrent transactions in two terminal sessions.
Setup: `CREATE TABLE balances (id int PRIMARY KEY, amount int); INSERT INTO balances VALUES (1, 1000), (2, 500);`

Read Committed (default):
1. Session A: `BEGIN; SELECT amount FROM balances WHERE id=1;` → 1000
2. Session B: `UPDATE balances SET amount=2000 WHERE id=1; COMMIT;`
3. Session A: `SELECT amount FROM balances WHERE id=1;` → 2000 (changed!)
4. Session A: `ROLLBACK;`

Repeatable Read:
5. Session A: `BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT amount FROM balances WHERE id=1;` → 1000
6. Session B: `UPDATE balances SET amount=3000 WHERE id=1; COMMIT;`
7. Session A: `SELECT amount FROM balances WHERE id=1;` → still 1000 (snapshot!)
8. Session A: `COMMIT;`

Deadlock:
9. Session A: `BEGIN; UPDATE balances SET amount=amount-100 WHERE id=1;`
10. Session B: `BEGIN; UPDATE balances SET amount=amount-100 WHERE id=2;`
11. Session A: `UPDATE balances SET amount=amount+100 WHERE id=2;` → waits
12. Session B: `UPDATE balances SET amount=amount+100 WHERE id=1;` → deadlock! One session gets: `ERROR: deadlock detected`

🔬 Lab 3 — libpq CRUD Application in C
TOOLS: gcc · libpq-dev · PostgreSQL
Goal: Build a command-line user management tool using libpq with prepared statements and transactions.
1. Create the schema: `CREATE TABLE users (id serial PRIMARY KEY, name text NOT NULL, email text UNIQUE NOT NULL, created_at timestamptz DEFAULT now());`
2. Write `users.c` using the libpq code from the Implementation tab. Implement: `user_create(conn, name, email)` → returns the new id, `user_find(conn, id)` → prints the user, `user_list(conn)` → prints all users.
3. Use `PQprepare` for all queries. Verify that SQL injection is not possible: try passing `'; DROP TABLE users; --` as the name parameter. The prepared statement should store it literally.
4. Write `user_transfer_email(conn, from_id, to_id)` which copies the email from one user to another in a single transaction, verifying both users exist. Use the BEGIN / COMMIT / ROLLBACK pattern.
5. Check `PQresultStatus` on every result. Handle `PGRES_FATAL_ERROR` (constraint violations like a duplicate email) gracefully — print the error message, don't crash.
6. Stress-test with `pthread_create`. Verify correctness under concurrent inserts.

🔬 Lab 4 — Normalization Design Exercise
TOOLS: pen & paper · psql
Goal: Take a denormalized flat table and normalize it to 3NF, then measure query performance before and after.
1. Start from the flat table: `CREATE TABLE flat_orders (order_id int, order_date date, customer_id int, customer_name text, customer_email text, customer_city text, product_id int, product_name text, product_category text, quantity int, unit_price numeric);`
2. Identify the functional dependencies: what depends on `order_id`? On `customer_id`? On `product_id`? Map the violations of 2NF and 3NF.
3. Decompose into `customers`, `products`, `orders`, `order_items`. Write the CREATE TABLE statements with appropriate primary keys and foreign keys.
4. Write a JOIN over the normalized tables that reproduces `SELECT * FROM flat_orders`. Compare execution plans and times.
5. Denormalize deliberately: add a `total_amount` column to `orders` (computed as the SUM of quantity × unit_price). Update it via a trigger. Measure query time for "get orders with total > 1000" with vs without the denormalized column.

Module Mastery Checklist
M06 COMPLETE
You have mastered this module when you can check off every item below without referring to notes.
SQL Internals
- Describe the 4 stages of SQL query execution in PostgreSQL (parse → rewrite → plan → execute)
- Explain what `pg_statistic` contains and why stale statistics cause bad query plans
- Explain MVCC: why readers don't block writers, and what VACUUM does
Indexes
- Describe the B-tree internal structure: root, inner pages, leaf pages, the leaf-page linked list, ctid pointers
- Explain why a B-tree lookup is O(log N) and state the typical depth for 1M rows
- Apply the left-prefix rule: given an index `(a, b, c)`, state which WHERE clauses can use it
- Explain why an equality column should come before a range column in a composite index
- Create a partial index and a functional (expression) index; state when each is appropriate
- State the write overhead of indexes and how to find unused indexes using `pg_stat_user_indexes`
- Distinguish B-tree, Hash, GIN, and BRIN — name the use case each is best suited for
EXPLAIN
- Interpret an EXPLAIN output: identify seq scan vs index scan vs bitmap scan, explain the cost fields, identify row-count mismatches
- State 4 red flags in a query plan (seq scan on a large table, estimate vs actual mismatch, nested loop with high loop count, sort spilling to disk)
- Explain what `EXPLAIN (ANALYZE, BUFFERS)` adds over plain `EXPLAIN`
Query Patterns
- Define the N+1 problem and write two solutions (JOIN and batched IN clause)
- State the difference between INNER, LEFT, FULL OUTER, and LATERAL JOIN
- Write a recursive CTE to traverse a parent-child hierarchy
- Use a window function to rank rows within a partition and compute a running total
- Write an UPSERT using `ON CONFLICT DO UPDATE`
Normalization
- State the rules for 1NF, 2NF, 3NF, and BCNF; identify which normal form a given table violates
- Normalize a denormalized flat table to 3NF, producing correct FK relationships
- Name 3 deliberate denormalization patterns and the maintenance cost of each
ACID & Isolation
- Define Atomicity, Consistency, Isolation, Durability — and state which mechanism implements each in PostgreSQL
- Define dirty read, non-repeatable read, phantom read, and serialisation anomaly
- State which read phenomena each isolation level prevents; name PostgreSQL's default isolation level
- Describe a deadlock scenario, how PostgreSQL detects it, and two prevention strategies
- Write a fund transfer in C using libpq with correct BEGIN/COMMIT/ROLLBACK error handling
- Explain advisory locks and give one production use case
C / libpq
- Write a libpq connection, prepared statement, and parameterised execution in C
- Handle `PQresultStatus` correctly for all result types (TUPLES_OK, COMMAND_OK, FATAL_ERROR)
- Explain why PgBouncer is needed in production rather than creating a new PGconn per request
Next in Phase 2: M07 covers Transactions & MVCC in depth (savepoints, advisory locks, SKIP LOCKED job queues) and M08 covers NoSQL & Redis (when to reach for a non-relational store and how to use it alongside PostgreSQL).