SQL is the Most Underestimated Backend Skill
POSTGRESQL FOCUS
Most backend developers know SQL syntax. Very few understand what happens inside the database when a query runs. That gap is where 90% of production performance problems hide.
This module covers how PostgreSQL executes queries, how indexes work at the data-structure level, how to read execution plans, and what the ACID properties and isolation levels actually guarantee — with enough depth to diagnose real production incidents.
SQL Query Execution Pipeline
1. Parse — SQL text is lexed and parsed into an Abstract Syntax Tree (AST). Syntax errors are caught here. Output: parse tree.
2. Rewrite — The rule system rewrites the parse tree. Views are expanded inline; `ON INSERT DO INSTEAD` rules are applied. Output: rewritten parse tree.
3. Plan / Optimise — The query planner generates candidate execution plans, estimates the cost of each using table statistics (`pg_statistic`), and picks the cheapest plan. This is where index decisions are made. Output: plan tree.
4. Execute — The executor walks the plan tree, pulling rows from leaf nodes (seq scan, index scan) up through joins, aggregates, and sorts. Output: result rows.
📚 Analogy: The planner is like a GPS. It doesn't know the actual traffic (data) — it estimates travel time based on historical averages (statistics). Outdated statistics = bad route choice = slow query. ANALYZE updates the statistics. VACUUM ANALYZE also reclaims dead tuple space.
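Refreshing and inspecting those statistics is a one-liner each in psql — a sketch, using `orders` as an illustrative table name:

```sql
ANALYZE orders;          -- refresh pg_statistic for one table
VACUUM ANALYZE orders;   -- also reclaim dead-tuple space first

-- Human-readable view over pg_statistic
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders';
```

Autovacuum runs ANALYZE automatically, but after a bulk load or mass delete an explicit ANALYZE avoids a window of bad plans.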
Key PostgreSQL Concepts: Quick Reference
| Concept | What it is | Why it matters |
|---|---|---|
| MVCC | Multi-Version Concurrency Control — each transaction sees a snapshot; old row versions kept until vacuumed | Readers never block writers; enables repeatable reads without locks |
| pg_statistic | Per-column statistics: most-common values, histogram, null fraction, distinct count | Planner cost estimates depend on this; stale stats → bad plans |
| VACUUM | Reclaims space from dead tuples (rows deleted/updated but still stored for MVCC visibility) | Without autovacuum, tables bloat; XID wraparound causes downtime |
| TOAST | The Oversized-Attribute Storage Technique — large values (text, bytea, jsonb) stored out-of-line | Transparent; but SELECT * on TOAST columns adds I/O you may not expect |
| WAL | Write-Ahead Log — changes written to WAL before data pages; enables crash recovery and replication | Understanding WAL is essential for replication, point-in-time recovery |
What an Index Actually Is
DATA STRUCTURE
An index is a separate data structure that stores a subset of column values alongside physical pointers (tuple IDs / ctids) to the heap rows. The database maintains it automatically on insert/update/delete. Every index you add speeds up reads but slows writes and consumes disk + RAM.
PostgreSQL index types: B-tree (default), Hash, GiST, SP-GiST, GIN, BRIN. For backend engineering, B-tree covers 95% of use cases.
B-tree Internals
```
                        Root page
      ┌─────────────────────────────────────────────┐
      │        [50]      [100]      [200]           │
      │      ↓      ↓          ↓          ↓         │
      └─────────────────────────────────────────────┘
         │        │          │          │
       Inner    Inner      Inner      Inner    ← branch pages (pointers only)
       pages    pages      pages      pages
                   │
      ┌─────────────────────────────────────────────────────┐
      │ Leaf page (doubly linked list  ←→)                  │
      │ [42, ctid(0,5)]  [43, ctid(2,1)]  [44, ctid(1,8)]   │
      │        → heap page 0, tuple 5                       │
      └─────────────────────────────────────────────────────┘
```
1. Root → inner pages → leaf pages: the B-tree is balanced — every leaf is at the same depth. Lookup is O(log N). For 1 million rows with 100 entries per page, depth ≈ 3. That's 3 page reads for any lookup.
2. Leaf pages are doubly linked — range scans (`WHERE id BETWEEN 100 AND 200`) walk the linked list after finding the start, without re-traversing from the root.
3. Each leaf entry stores: key value + ctid (physical location: page number, tuple offset within the page). The executor fetches the heap page using the ctid.
4. Index-only scan: if all needed columns are in the index (a covering index), the heap page fetch is skipped entirely. Check: `EXPLAIN` shows "Index Only Scan".
Index Types Reference
| Type | Operators supported | Best for | Notes |
|---|---|---|---|
| B-tree | =, <, <=, >, >=, BETWEEN, LIKE 'foo%' | Most use cases; equality and range queries | Default. Supports ORDER BY without sort node. |
| Hash | = only | Pure equality on large text keys | Faster than B-tree for equality only. No range. Not WAL-logged before PG10 (avoid pre-10). |
| GIN | @>, <@, &&, @@ | JSONB containment, full-text search, array overlap | Inverted index. Fast reads, slow updates. Use gin_pending_list_limit. |
| GiST | Geometric, range types, full-text | PostGIS (geography), range overlaps, nearest-neighbour | Extensible. Slower build than B-tree. |
| BRIN | Range queries on physically ordered data | Time-series, append-only tables (created_at) | Very small index (stores min/max per block range). Useless if data not correlated with physical order. |
Composite Indexes: Column Order Matters
The Left-Prefix Rule
CRITICAL
A composite index (a, b, c) can satisfy queries that filter on: a, (a, b), or (a, b, c). It cannot be used for queries that only filter on b or c alone, because the B-tree is sorted by a first.
```sql
-- Index: (user_id, status, created_at)

-- ✅ Uses index (full prefix)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;

-- ✅ Uses the user_id prefix; created_at is filtered within the index
--    (status is skipped, so created_at cannot drive the traversal)
SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2026-01-01';

-- ❌ Cannot use this index (no leading column)
SELECT * FROM orders
WHERE status = 'pending';   -- needs a separate index on (status)
```
Rule of thumb: Put the most selective column first (highest cardinality — most distinct values). Equality columns before range columns. The range column should be last — once you hit a range condition, the remaining columns in the index cannot be used for filtering.
Partial Indexes
Index Only the Rows You Query
PERFORMANCE
A partial index includes only rows matching a WHERE clause. Smaller, faster to maintain, and can enforce constraints on subsets of data.
```sql
-- Index only unprocessed jobs (99% of queries target this subset)
CREATE INDEX idx_jobs_pending ON jobs(created_at)
WHERE status = 'pending';

-- Partial unique constraint: only one active record per user
CREATE UNIQUE INDEX idx_subscriptions_active_user ON subscriptions(user_id)
WHERE cancelled_at IS NULL;

-- A query MUST include the partial index predicate to use it
SELECT * FROM jobs
WHERE status = 'pending'
  AND created_at < now() - interval '5 minutes';
```
Expression Indexes
Indexing the Result of a Function
FUNCTIONAL INDEXES
When queries apply a function to a column in WHERE, a plain column index is useless — the function result isn't in the index. Expression indexes solve this.
```sql
-- ❌ Index on email NOT used — function applied to the column
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- ✅ Create an expression index to match the query
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- Now the query above uses the index.
-- The index stores lower(email) values, not raw email values.

-- Other common examples:
CREATE INDEX ON events(date_trunc('day', created_at));
CREATE INDEX ON products((metadata->>'sku'));   -- JSONB field extraction
```
The Real Cost of Indexes
| Operation | Impact |
|---|---|
| INSERT | Every index on the table gets an entry inserted. 5 indexes = 5 B-tree insertions + potential page splits. |
| UPDATE | If the indexed column changes: old entry deleted + new entry inserted in every affected index. Heap-only tuple (HOT) update avoids this if the page has space and no indexed column changes. |
| DELETE | Index entries marked dead; not removed until VACUUM runs. |
| Disk space | A B-tree index on a 100M-row integer column ≈ 2–3 GB. JSONB GIN indexes can be 2–5× the table size. |
| Cache pollution | Indexes compete with table data for shared_buffers. Unused indexes evict useful data. |
⚠️ Find and drop unused indexes: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND schemaname = 'public'; — Zero scans since last stats reset means the index is dead weight. Drop it.
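Disk footprint is also easy to measure directly with the built-in size functions — a sketch; the index and table names are illustrative:

```sql
SELECT pg_size_pretty(pg_relation_size('idx_orders_user_status'));  -- one index
SELECT pg_size_pretty(pg_indexes_size('orders'));                   -- all indexes on the table
SELECT pg_size_pretty(pg_total_relation_size('orders'));            -- heap + indexes + TOAST
```

Comparing `pg_indexes_size` to `pg_relation_size` of the heap quickly shows tables where index overhead exceeds the data itself.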
Reading Query Plans
EXPLAIN ANALYZE
EXPLAIN shows the plan the planner chose with estimated costs. EXPLAIN ANALYZE executes the query and shows actual runtime. Always use EXPLAIN (ANALYZE, BUFFERS) in production diagnosis — BUFFERS reveals cache hits vs disk reads.
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, count(o.id) AS total_orders
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name
ORDER BY total_orders DESC
LIMIT 10;
```
Anatomy of an EXPLAIN Output
```
Limit  (cost=1240.50..1240.53 rows=10 width=40) (actual time=18.432..18.435 rows=10 loops=1)
  ->  Sort  (cost=1240.50..1242.00 rows=600 width=40) (actual time=18.428..18.430 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1200.00..1215.00 rows=600 width=40) (actual time=17.900..18.100 rows=621 loops=1)
              Group Key: u.name
              ->  Hash Join  (cost=85.00..1125.00 rows=15000 width=16) (actual time=1.200..14.800 rows=14832 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..890.00 rows=50000 width=8) (actual time=0.050..6.400 rows=50000 loops=1)
                    ->  Hash  (cost=75.00..75.00 rows=800 width=8) (actual time=0.900..0.900 rows=823 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 42kB
                          ->  Index Scan on users u  (cost=0.29..75.00 rows=800 width=8) (actual time=0.060..0.800 rows=823 loops=1)
                                Index Cond: (status = 'active')
Planning Time: 0.842 ms
Execution Time: 18.521 ms
```
How to Read It
| Field | Meaning | What to watch for |
|---|---|---|
| cost=X..Y | X = startup cost (first row), Y = total cost (arbitrary planner units). Lower is better. Used for plan comparison only — not ms. | Very high total cost vs actual time can indicate stale statistics. |
| rows=N | Estimated (in cost line) vs actual (in actual time line) row count. | Large estimate vs actual mismatch → run ANALYZE. Bad estimates cascade into bad join/sort choices. |
| loops=N | How many times this node executed. Nested loops execute inner node once per outer row. | High loops × high rows = N+1 problem at the SQL level. |
Seq Scan | Full table scan — reads every page. | Expected on small tables or when fetching most rows. Bad on large tables with selective WHERE clause. |
Index Scan | B-tree traversal + heap fetches for each found row. | Good. If many rows fetched: consider Bitmap Index Scan which batches heap fetches. |
Bitmap Heap Scan | Builds bitmap of matching heap pages, then fetches them in order (reduces random I/O). | Usually good for medium selectivity (5–20% of rows). |
Hash Join | Build hash table from smaller relation, probe with larger. | Good for large joins. Needs work_mem; if batches > 1 it spills to disk. |
Nested Loop | For each outer row, scan inner. O(N×M) worst case. | Good when outer is small + inner has an index. Bad on large tables without index. |
Warning Signs in Plans
Red Flags
- Seq Scan on a large table with low actual rows → missing index
- Estimated rows ≫ actual rows → stale statistics; run `ANALYZE`
- Nested Loop with loops > 1000 → N+1 pattern inside the SQL
- Sort with "external merge" → `work_mem` too low for the sort
- Hash Batches > 1 → hash join spilling to disk
- "Rows Removed by Filter" ≫ 0 after an Index Scan → over-fetching; index not selective enough
Good Signs
- Index Only Scan → all columns in the index, heap not touched
- Bitmap Index Scan + Bitmap Heap Scan → efficient bulk access
- Hash Join with Batches=1 → fits in memory
- Estimated ≈ actual rows → planner has accurate stats
- Planning time < 1ms → query is not overly complex
- Consistent execution time → data fits in `shared_buffers` (Buffers: hit)
Use explain.dalibo.com to paste EXPLAIN (FORMAT JSON) output and get a visual, colour-coded plan tree. Far easier to navigate than text format for complex queries.
The N+1 Query Problem
N+1: The Silent Performance Killer
ANTI-PATTERN
N+1 occurs when you fetch a list of N items then issue a separate query for each item. The result is N+1 round trips to the database instead of 1 or 2.
```sql
-- ❌ N+1: fetch 100 users, then 1 query per user for their posts
users = SELECT * FROM users LIMIT 100;                    -- 1 query
FOR u IN users:
    posts = SELECT * FROM posts WHERE user_id = u.id;     -- 100 queries
-- Total: 101 queries. At 50ms RTT: ~5 seconds.

-- ✅ Solution 1: JOIN
SELECT u.*, p.title, p.created_at
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
LIMIT 100;                                                -- 1 query

-- ✅ Solution 2: Batch load (IN clause)
user_ids = [1, 2, 3, ..., 100];
SELECT * FROM posts WHERE user_id = ANY($1);              -- 2 total queries
```
JOIN Types
| JOIN Type | Returns | Use when |
|---|---|---|
| INNER JOIN | Rows with matching keys in both tables | Both sides must exist. Most common. |
| LEFT JOIN | All rows from left + matching from right (NULLs where no match) | Optional relationship: user may or may not have a profile. |
| RIGHT JOIN | All rows from right + matching from left | Rare; rewrite as LEFT JOIN with tables swapped. |
| FULL OUTER JOIN | All rows from both, NULLs where no match | Reconciliation queries, finding unmatched rows on either side. |
| CROSS JOIN | Cartesian product (M × N rows) | Generating test data, pairing every item with every other. Dangerous on large tables. |
| LATERAL JOIN | Subquery that can reference outer query's columns | "For each user, get their 3 most recent orders" — correlated subquery without N+1. |
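The LATERAL row above, spelled out — a sketch of "top 3 most recent orders per user", assuming the users/orders tables used elsewhere in this module:

```sql
SELECT u.name, recent.id AS order_id, recent.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT o.id, o.created_at
    FROM orders o
    WHERE o.user_id = u.id        -- correlates with the outer row
    ORDER BY o.created_at DESC
    LIMIT 3                       -- per user, not globally
) AS recent;
```

Use `LEFT JOIN LATERAL (...) ON true` instead of `CROSS JOIN LATERAL` if users with zero orders should still appear in the result.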
CTEs vs Subqueries
Common Table Expressions
WITH CLAUSE

```sql
-- Subquery (inline, can be optimised away by the planner)
SELECT u.name, o.total
FROM users u
JOIN (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
) o ON o.user_id = u.id;

-- CTE (WITH) — same query, more readable
WITH order_totals AS (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT u.name, ot.total
FROM users u
JOIN order_totals ot ON ot.user_id = u.id;

-- Recursive CTE: walk a tree/graph (org hierarchy, comment thread)
WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL                   -- anchor
    UNION ALL
    SELECT e.id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id   -- recursive
)
SELECT * FROM org_tree ORDER BY depth;
```
PostgreSQL ≥ 12: CTEs are inlined by default (treated as subqueries, planner can optimise through them). Use WITH ... AS MATERIALIZED to force the old behaviour — the CTE is executed once and cached, which can be faster when referenced multiple times.
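Forcing materialisation is one keyword — a sketch reusing the order_totals CTE from above, where two references share one cached computation:

```sql
-- Executed once, cached, and read by both references below
WITH order_totals AS MATERIALIZED (
    SELECT user_id, sum(amount) AS total
    FROM orders
    GROUP BY user_id
)
SELECT 'max' AS metric, max(total) AS value FROM order_totals
UNION ALL
SELECT 'min', min(total) FROM order_totals;
```

Without MATERIALIZED, the planner may inline the CTE into each reference and compute the aggregation twice.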
Window Functions
Aggregates Without Collapsing Rows
OVER CLAUSE

```sql
-- Rank users by order count within each country
SELECT name, country, order_count,
       rank() OVER (
           PARTITION BY country
           ORDER BY order_count DESC
       ) AS country_rank
FROM users;

-- Running total (cumulative sum)
SELECT created_at::date, revenue,
       sum(revenue) OVER (ORDER BY created_at::date) AS running_total
FROM daily_sales;

-- Row number for cursor pagination (stable ordering)
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM events
WHERE id > :last_cursor
LIMIT 50;

-- Lag/Lead: compare a row with the previous/next one
SELECT date, price,
       lag(price, 1) OVER (ORDER BY date) AS prev_price,
       price - lag(price, 1) OVER (ORDER BY date) AS change
FROM stock_prices;
```
UPSERT and Conflict Handling
```sql
-- Insert, or update on conflict (upsert)
INSERT INTO user_stats (user_id, login_count, last_seen)
VALUES ($1, 1, now())
ON CONFLICT (user_id) DO UPDATE
SET login_count = user_stats.login_count + 1,
    last_seen   = EXCLUDED.last_seen;   -- EXCLUDED = the row proposed for insertion

-- Insert and ignore duplicates
INSERT INTO events (id, payload)
VALUES ($1, $2)
ON CONFLICT (id) DO NOTHING;

-- Return the affected rows (useful for getting auto-generated IDs)
INSERT INTO users (name, email)
VALUES ($1, $2)
RETURNING id, created_at;
```
Normalization: Eliminating Redundancy
DATA MODELLING
Normalization is the process of structuring tables to minimise data redundancy and prevent update anomalies. Each normal form adds a constraint. In practice, design to 3NF/BCNF and deliberately denormalize specific hot paths with explicit justification.
Normal Forms
First Normal Form — Atomic Values
Rule: Each column holds a single, indivisible value. No repeating groups (multiple phone numbers in one column). Each row is uniquely identifiable (primary key exists).
Violation: users(id, name, phones="555-1234,555-5678") — comma-delimited phones in one column.
Fix: Separate user_phones(user_id, phone_number) table with one row per phone.
Second Normal Form — No Partial Dependencies
Rule: Must be 1NF. Every non-key column must depend on the whole primary key, not just part of it. Only relevant for tables with composite primary keys.
Violation: order_items(order_id, product_id, quantity, product_name) — product_name depends only on product_id, not the full (order_id, product_id) key.
Fix: Move product_name to the products table. order_items keeps only quantity.
Third Normal Form — No Transitive Dependencies
Rule: Must be 2NF. No non-key column depends on another non-key column (transitive dependency).
Violation: employees(id, dept_id, dept_name, salary) — dept_name depends on dept_id, not directly on id.
Fix: Extract departments(dept_id, dept_name). employees keeps only dept_id as a foreign key.
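As DDL, the 3NF fix looks like this — a sketch with assumed column types:

```sql
CREATE TABLE departments (
    dept_id   int  PRIMARY KEY,
    dept_name text NOT NULL
);

CREATE TABLE employees (
    id      int     PRIMARY KEY,
    dept_id int     NOT NULL REFERENCES departments(dept_id),
    salary  numeric NOT NULL
);
-- dept_name now lives in exactly one place: renaming a department
-- is a single-row UPDATE instead of a mass update on employees.
```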
Boyce-Codd Normal Form — Every Determinant Is a Candidate Key
Rule: Stricter than 3NF. For every functional dependency X → Y, X must be a superkey (uniquely identifies rows). Fixes edge cases in 3NF with overlapping composite candidate keys.
Violation: course_teachers(student, course, teacher) where a teacher teaches only one course but a course can have multiple teachers. The dependency teacher → course exists, but teacher is not a key.
Fix: Decompose into teacher_courses(teacher, course) and student_courses(student, course).
When to Denormalize
Deliberate Denormalization
TRADE-OFF
Normalization optimises for write integrity. Denormalization trades integrity for read performance. Denormalize only when you have a measured performance problem and accept the consistency maintenance burden.
| Pattern | What it does | Maintenance cost |
|---|---|---|
| Cached count | Store posts.comment_count alongside the post; increment/decrement via trigger or application logic | Must update on every insert/delete to comments |
| Materialized view | Pre-computed JOIN result stored as a table (CREATE MATERIALIZED VIEW); refreshed on schedule or trigger | REFRESH MATERIALIZED VIEW CONCURRENTLY — doesn't block reads |
| Duplicated columns | Copy user.name into orders.user_name to avoid JOIN on every order list | Must sync on user name change (rare) |
| JSONB for variable attributes | Store flexible/sparse attributes in a JSONB column instead of an EAV table | No schema enforcement; use GIN index for querying |
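The materialized-view row above, spelled out — a sketch; the view name and columns are illustrative:

```sql
CREATE MATERIALIZED VIEW user_order_totals AS
SELECT user_id, count(*) AS order_count, sum(amount) AS total
FROM orders
GROUP BY user_id;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON user_order_totals (user_id);

-- Refresh without blocking readers (run on a schedule)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_totals;
```

Between refreshes the view serves stale data — acceptable for dashboards, not for balances.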
ACID: Four Guarantees
CORRECTNESS
ACID is the contract a database makes about what happens to your data, even in the face of crashes, errors, and concurrent access. Understanding each property tells you what you can and cannot rely on.
- A — Atomicity: A transaction is all-or-nothing. If any statement within a transaction fails, all changes are rolled back. There is no partial transaction. Implemented via the WAL: on crash, incomplete transactions are rolled back during recovery.
- C — Consistency: The database moves from one valid state to another. Constraints (FK, NOT NULL, CHECK, UNIQUE) are enforced by the database (by default per statement; deferrable constraints at commit). The application is responsible for domain-level consistency (business rules).
- I — Isolation: Concurrent transactions behave as if they ran serially. The degree of isolation is configurable (see isolation levels below). Full isolation is expensive; databases offer weaker but faster levels.
- D — Durability: Committed transactions survive crashes. Achieved by flushing the WAL to disk before acknowledging the commit. `synchronous_commit = off` trades durability for latency (risk: the last few milliseconds of commits are lost on a crash).
Read Phenomena (What Can Go Wrong)
| Phenomenon | Description | Example |
|---|---|---|
| Dirty Read | Read uncommitted data from another transaction that may roll back | T1 reads a balance modified by T2 before T2 commits. T2 rolls back — T1 read data that was never committed. |
| Non-repeatable Read | Re-reading the same row within a transaction returns different data | T1 reads price=10. T2 updates price=20 and commits. T1 re-reads: price=20. Same query, different result. |
| Phantom Read | Re-executing a range query returns different rows | T1 counts orders WHERE status='pending': 5 rows. T2 inserts a new pending order. T1 re-counts: 6 rows. |
| Serialisation Anomaly | Result is inconsistent with any serial order of the transactions | Write skew: two doctors both read "at least 1 on call", both go off-call — result: zero doctors on call. |
Isolation Levels
| Level | Dirty Read | Non-Repeatable | Phantom | Serialisation Anomaly | PostgreSQL default? |
|---|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Possible | No (PG treats as Read Committed) |
| Read Committed | Not possible | Possible | Possible | Possible | ✅ Default |
| Repeatable Read | Not possible | Not possible | Not possible* | Possible | No |
| Serializable | Not possible | Not possible | Not possible | Not possible | No (highest cost) |
*PostgreSQL's Repeatable Read uses snapshot isolation which prevents phantoms too — stronger than the SQL standard requires. PG's Serializable uses Serializable Snapshot Isolation (SSI), a lock-free algorithm that detects and aborts transactions that would create anomalies.
Deadlocks
Deadlock Detection and Prevention
CONCURRENCY
A deadlock occurs when two (or more) transactions each hold a lock the other needs, creating a circular wait.
```
-- T1                                  -- T2
BEGIN;                                 BEGIN;
UPDATE accounts SET bal=bal-100        UPDATE accounts SET bal=bal+100
  WHERE id=1;  -- locks row 1            WHERE id=2;  -- locks row 2
UPDATE accounts SET bal=bal+100        UPDATE accounts SET bal=bal-100
  WHERE id=2;  -- waits for row 2        WHERE id=1;  -- waits for row 1  ← DEADLOCK
```
PostgreSQL detects deadlocks automatically (within deadlock_timeout, default 1s) and aborts one transaction with: ERROR: deadlock detected. The other continues normally.
Prevention Strategies
- Consistent lock ordering: always lock resources in the same order (e.g., always lock the lower ID first). Eliminates the circular-wait condition.
- `SELECT ... FOR UPDATE SKIP LOCKED`: skip rows locked by other transactions instead of waiting. Useful for job queues.
- `NOWAIT`: fail immediately if the lock cannot be acquired: `SELECT ... FOR UPDATE NOWAIT` returns an error, never waits.
- Short transactions: the shorter a transaction, the shorter the lock hold time, the lower the probability of deadlock.
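Consistent ordering can be enforced in SQL itself by taking both row locks in primary-key order before updating — a sketch against the accounts table from the example above:

```sql
BEGIN;
-- Both directions of a transfer acquire locks in the same (id ASC)
-- order, so a circular wait can never form between two transfers.
SELECT id FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;

UPDATE accounts SET bal = bal - 100 WHERE id = 1;
UPDATE accounts SET bal = bal + 100 WHERE id = 2;
COMMIT;
```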
Advisory Locks
Application-Level Mutex via PostgreSQL
DISTRIBUTED LOCKING
Advisory locks are arbitrary integer locks managed by the application — PostgreSQL doesn't tie them to any table row. Useful for distributed mutual exclusion (e.g., ensuring only one instance runs a cron job).
```sql
-- Acquire an exclusive advisory lock (blocks if held by another session)
SELECT pg_advisory_lock(12345);

-- Non-blocking try: returns true/false
SELECT pg_try_advisory_lock(12345);

-- Transaction-scoped (auto-released on commit/rollback)
SELECT pg_advisory_xact_lock(12345);

-- Release a session-scoped lock explicitly
SELECT pg_advisory_unlock(12345);
```
Connecting to PostgreSQL with libpq
```c
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile:  gcc db.c -lpq -o db
   Requires: libpq-dev package (apt install libpq-dev) */

PGconn *db_connect(const char *connstr) {
    PGconn *conn = PQconnectdb(connstr);
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s\n", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }
    return conn;
}

int main(void) {
    /* Connection string: can also use env vars PGHOST, PGPORT, PGUSER, etc. */
    PGconn *conn = db_connect(
        "host=localhost port=5432 dbname=myapp user=myuser password=secret"
    );
    if (!conn) return 1;

    printf("Connected to PostgreSQL %s\n",
           PQparameterStatus(conn, "server_version"));
    PQfinish(conn);
    return 0;
}
```
Prepared Statements (Parameterised Queries)
```c
/* Always use prepared statements — never string-concatenate user input */
int get_user_by_id(PGconn *conn, int user_id) {
    /* Prepare once, execute many times */
    PGresult *prep = PQprepare(conn,
        "get_user",                                        /* statement name */
        "SELECT id, name, email FROM users WHERE id = $1",
        1,                                                 /* number of params */
        NULL                                               /* param types (NULL = infer) */
    );
    if (PQresultStatus(prep) != PGRES_COMMAND_OK) {
        fprintf(stderr, "Prepare failed: %s\n", PQresultErrorMessage(prep));
        PQclear(prep);
        return -1;
    }
    PQclear(prep);

    /* Execute: pass the parameter as a string (libpq converts) */
    char id_str[16];
    snprintf(id_str, sizeof(id_str), "%d", user_id);
    const char *params[] = { id_str };

    PGresult *res = PQexecPrepared(conn,
        "get_user",   /* statement name */
        1,            /* nParams */
        params,       /* paramValues */
        NULL,         /* paramLengths (NULL = text mode, use strlen) */
        NULL,         /* paramFormats (NULL = all text) */
        0             /* resultFormat: 0=text, 1=binary */
    );
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query failed: %s\n", PQresultErrorMessage(res));
        PQclear(res);
        return -1;
    }

    int rows = PQntuples(res);
    int cols = PQnfields(res);
    printf("Got %d rows, %d cols\n", rows, cols);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            printf("  %s = %s\n", PQfname(res, c),
                   PQgetisnull(res, r, c) ? "(null)" : PQgetvalue(res, r, c));
        }
    }
    PQclear(res);
    return rows;
}
```
Transactions in C
```c
#include <libpq-fe.h>
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strcmp */

int transfer_funds(PGconn *conn, int from_id, int to_id, int amount_cents) {
    /* BEGIN */
    PGresult *r = PQexec(conn, "BEGIN");
    if (PQresultStatus(r) != PGRES_COMMAND_OK) { PQclear(r); return -1; }
    PQclear(r);

    /* Debit — the UPDATE itself takes the row lock; the balance >= $1
       predicate makes check-and-debit a single atomic step */
    char from_str[16], to_str[16], amt_str[16];
    snprintf(from_str, sizeof(from_str), "%d", from_id);
    snprintf(to_str, sizeof(to_str), "%d", to_id);
    snprintf(amt_str, sizeof(amt_str), "%d", amount_cents);

    const char *debit_params[] = { amt_str, from_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance - $1 "
        "WHERE id = $2 AND balance >= $1",
        2, NULL, debit_params, NULL, NULL, 0);
    if (PQresultStatus(r) != PGRES_COMMAND_OK ||
        strcmp(PQcmdTuples(r), "1") != 0) {
        /* 0 rows affected = insufficient balance */
        PQclear(r);
        PQclear(PQexec(conn, "ROLLBACK"));
        return -1;
    }
    PQclear(r);

    /* Credit */
    const char *credit_params[] = { amt_str, to_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
        2, NULL, credit_params, NULL, NULL, 0);
    if (PQresultStatus(r) != PGRES_COMMAND_OK) {
        PQclear(r);
        PQclear(PQexec(conn, "ROLLBACK"));
        return -1;
    }
    PQclear(r);

    /* COMMIT */
    r = PQexec(conn, "COMMIT");
    int ok = PQresultStatus(r) == PGRES_COMMAND_OK;
    PQclear(r);
    return ok ? 0 : -1;
}
```
Minimal Connection Pool in C
```c
/* Production uses PgBouncer. This illustrates the concept. */
#include <libpq-fe.h>
#include <pthread.h>

#define POOL_SIZE 8

typedef struct {
    PGconn *conn;
    int in_use;
} conn_slot_t;

static conn_slot_t pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

void pool_init(const char *connstr) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].conn = PQconnectdb(connstr);
        pool[i].in_use = 0;
    }
}

PGconn *pool_acquire(void) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use && PQstatus(pool[i].conn) == CONNECTION_OK) {
            pool[i].in_use = 1;
            pthread_mutex_unlock(&pool_lock);
            return pool[i].conn;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return NULL;   /* pool exhausted */
}

void pool_release(PGconn *conn) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].conn == conn) { pool[i].in_use = 0; break; }
    }
    pthread_mutex_unlock(&pool_lock);
}
```
In production: use PgBouncer (transaction-mode pooling) in front of PostgreSQL. It handles thousands of client connections multiplexed over a small number of server connections. Never create a new PGconn per request in a high-throughput server.
🔬 Lab 1 — Index Experiments on Real Data
TOOLS: PostgreSQL · psql · pgbench
Goal: Generate a large dataset and observe the before/after performance impact of indexes using EXPLAIN ANALYZE.
1. `createdb perflab`
2. `psql perflab -c "CREATE TABLE orders (id serial PRIMARY KEY, user_id int, status text, amount int, created_at timestamptz DEFAULT now());"`
3. `psql perflab -c "INSERT INTO orders (user_id, status, amount) SELECT (random()*10000)::int, (ARRAY['pending','complete','cancelled'])[ceil(random()*3)::int], (random()*10000)::int FROM generate_series(1,1000000);"`
4. `EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';` — note: execution time, "Seq Scan", rows removed by filter.
5. `CREATE INDEX idx_orders_user_status ON orders(user_id, status);` — re-run the EXPLAIN. Note: execution time, node type change (Index Scan or Bitmap Index Scan), rows fetched.
6. `CREATE INDEX idx_orders_status_user ON orders(status, user_id);` — run `WHERE status = 'pending' AND user_id = 42`. Which index does the planner choose? Why?
7. `CREATE INDEX idx_orders_pending ON orders(user_id) WHERE status = 'pending';` — compare the plan for the pending-only query vs a query across all statuses.
8. `SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE schemaname='public' ORDER BY idx_scan;` — drop any index with 0 scans.

🔬 Lab 2 — Isolation Levels in Action
TOOLS: PostgreSQL · two psql sessions
Goal: Observe the difference between Read Committed and Repeatable Read isolation levels by running concurrent transactions in two terminal sessions.
Setup: `CREATE TABLE balances (id int PRIMARY KEY, amount int); INSERT INTO balances VALUES (1, 1000), (2, 500);`

Read Committed (default):
1. Session A: `BEGIN; SELECT amount FROM balances WHERE id=1;` → 1000
2. Session B: `UPDATE balances SET amount=2000 WHERE id=1; COMMIT;`
3. Session A: `SELECT amount FROM balances WHERE id=1;` → 2000 (changed!)
4. Session A: `ROLLBACK;`

Repeatable Read:
5. Session A: `BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT amount FROM balances WHERE id=1;` → 1000
6. Session B: `UPDATE balances SET amount=3000 WHERE id=1; COMMIT;`
7. Session A: `SELECT amount FROM balances WHERE id=1;` → still 1000 (snapshot!)
8. Session A: `COMMIT;`

Deadlock:
9. Session A: `BEGIN; UPDATE balances SET amount=amount-100 WHERE id=1;`
10. Session B: `BEGIN; UPDATE balances SET amount=amount-100 WHERE id=2;`
11. Session A: `UPDATE balances SET amount=amount+100 WHERE id=2;` → waits
12. Session B: `UPDATE balances SET amount=amount+100 WHERE id=1;` → deadlock! One session gets: `ERROR: deadlock detected`

🔬 Lab 3 — libpq CRUD Application in C
TOOLS: gcc · libpq-dev · PostgreSQL
Goal: Build a command-line user management tool using libpq with prepared statements and transactions.
1. Create the schema: `CREATE TABLE users (id serial PRIMARY KEY, name text NOT NULL, email text UNIQUE NOT NULL, created_at timestamptz DEFAULT now());`
2. Write `users.c` using the libpq code from the Implementation tab. Implement: `user_create(conn, name, email)` → returns the new id, `user_find(conn, id)` → prints the user, `user_list(conn)` → prints all users.
3. Use `PQprepare` for all queries. Verify that SQL injection is not possible: try passing `'; DROP TABLE users; --` as the name parameter. The prepared statement should store it literally.
4. Write `user_transfer_email(conn, from_id, to_id)` which copies the email from one user to another in a single transaction, verifying both users exist. Use the BEGIN / COMMIT / ROLLBACK pattern.
5. Check `PQresultStatus` on every result. Handle `PGRES_FATAL_ERROR` (constraint violations like a duplicate email) gracefully — print the error message, don't crash.
6. Stress-test with `pthread_create`. Verify correctness under concurrent inserts.

🔬 Lab 4 — Normalization Design Exercise
TOOLS: pen & paper · psql
Goal: Take a denormalized flat table and normalize it to 3NF, then measure query performance before and after.
1. Start from the flat table: `CREATE TABLE flat_orders (order_id int, order_date date, customer_id int, customer_name text, customer_email text, customer_city text, product_id int, product_name text, product_category text, quantity int, unit_price numeric);`
2. Identify the functional dependencies: what depends on `order_id`? On `customer_id`? On `product_id`? Map the violations of 2NF and 3NF.
3. Decompose into `customers`, `products`, `orders`, `order_items`. Write the CREATE TABLE statements with appropriate primary keys and foreign keys.
4. Write a JOIN over the normalized tables that reproduces `SELECT * FROM flat_orders`. Compare execution plans and times.
5. Denormalize deliberately: add a `total_amount` column to `orders` (computed as the SUM of quantity × unit_price). Update it via a trigger. Measure query time for "get orders with total > 1000" with vs without the denormalized column.

Module Mastery Checklist
M06 COMPLETE
You have mastered this module when you can check off every item below without referring to notes.
SQL Internals
- Describe the 4 stages of SQL query execution in PostgreSQL (parse → rewrite → plan → execute)
- Explain what `pg_statistic` contains and why stale statistics cause bad query plans
- Explain MVCC: why readers don't block writers, and what VACUUM does
Indexes
- Describe the B-tree internal structure: root, inner pages, leaf pages, the leaf-page linked list, ctid pointers
- Explain why a B-tree lookup is O(log N) and state the typical depth for 1M rows
- Apply the left-prefix rule: given an index `(a, b, c)`, state which WHERE clauses can use it
- Explain why an equality column should come before a range column in a composite index
- Create a partial index and a functional (expression) index; state when each is appropriate
- State the write overhead of indexes and how to find unused indexes using `pg_stat_user_indexes`
- Distinguish B-tree, Hash, GIN, and BRIN — name the use case each is best suited for
EXPLAIN
- Interpret an EXPLAIN output: identify seq scan vs index scan vs bitmap scan, explain the cost fields, identify row-count mismatches
- State 4 red flags in a query plan (seq scan on a large table, estimate vs actual mismatch, nested loop with high loop count, sort spilling to disk)
- Explain what `EXPLAIN (ANALYZE, BUFFERS)` adds over plain `EXPLAIN`
Query Patterns
- Define the N+1 problem and write two solutions (JOIN and batched IN clause)
- State the difference between INNER, LEFT, FULL OUTER, and LATERAL JOIN
- Write a recursive CTE to traverse a parent-child hierarchy
- Use a window function to rank rows within a partition and compute a running total
- Write an UPSERT using `ON CONFLICT DO UPDATE`
Normalization
- State the rules for 1NF, 2NF, 3NF, and BCNF; identify which normal form a given table violates
- Normalize a denormalized flat table to 3NF, producing correct FK relationships
- Name 3 deliberate denormalization patterns and the maintenance cost of each
ACID & Isolation
- Define Atomicity, Consistency, Isolation, Durability — and state which mechanism implements each in PostgreSQL
- Define dirty read, non-repeatable read, phantom read, and serialisation anomaly
- State which read phenomena each isolation level prevents; name PostgreSQL's default isolation level
- Describe a deadlock scenario, how PostgreSQL detects it, and two prevention strategies
- Write a fund transfer in C using libpq with correct BEGIN/COMMIT/ROLLBACK error handling
- Explain advisory locks and give one production use case
C / libpq
- Write a libpq connection, prepared statement, and parameterised execution in C
- Handle `PQresultStatus` correctly for all result types (TUPLES_OK, COMMAND_OK, FATAL_ERROR)
- Explain why PgBouncer is needed in production rather than creating a new PGconn per request
Next in Phase 2: M07 covers Transactions & MVCC in depth (savepoints, advisory locks, SKIP LOCKED job queues) and M08 covers NoSQL & Redis (when to reach for a non-relational store and how to use it alongside PostgreSQL).