Backend Engineering · Phase 2 · Module 6
SQL & Indexing
The difference between a query that takes 2 ms and one that takes 2 minutes is almost always an index — or the lack of one.
B-tree Internals · EXPLAIN ANALYZE · Index Types · Normalization · ACID · Isolation Levels · libpq / C
🗄️

SQL is the Most Underestimated Backend Skill

POSTGRESQL FOCUS

Most backend developers know SQL syntax. Very few understand what happens inside the database when a query runs. That gap is where 90% of production performance problems hide.

This module covers how PostgreSQL executes queries, how indexes work at the data-structure level, how to read execution plans, and what the ACID properties and isolation levels actually guarantee — with enough depth to diagnose real production incidents.

SQL Query Execution Pipeline

Every query passes through four stages: parse (check syntax, build a query tree) → rewrite (expand views, apply rules) → plan (choose scan methods, join order, and join algorithms based on per-table statistics) → execute. Most tuning work targets the planner — and the planner is only as good as its statistics.

📚 Analogy: The planner is like a GPS. It doesn't know the actual traffic (data) — it estimates travel time from historical averages (statistics). Outdated statistics = bad route choice = slow query. ANALYZE updates the statistics; VACUUM ANALYZE also reclaims dead-tuple space.
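A minimal sketch of keeping those statistics fresh and checking when they were last gathered, assuming the orders table used throughout this module:

-- Refresh planner statistics for one table
ANALYZE orders;

-- Reclaim dead-tuple space and refresh statistics in one pass
VACUUM ANALYZE orders;

-- When were statistics last gathered, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';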

Key PostgreSQL Concepts: Quick Reference

Concept | What it is | Why it matters
MVCC | Multi-Version Concurrency Control — each transaction sees a snapshot; old row versions are kept until vacuumed | Readers never block writers; enables repeatable reads without locks
pg_statistic | Per-column statistics: most-common values, histogram, null fraction, distinct count | Planner cost estimates depend on this; stale stats → bad plans
VACUUM | Reclaims space from dead tuples (rows deleted/updated but still stored for MVCC visibility) | Without autovacuum, tables bloat; XID wraparound causes downtime
TOAST | The Oversized-Attribute Storage Technique — large values (text, bytea, jsonb) stored out-of-line | Transparent; but SELECT * on TOAST columns adds I/O you may not expect
WAL | Write-Ahead Log — changes are written to the WAL before data pages; enables crash recovery and replication | Understanding WAL is essential for replication and point-in-time recovery
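MVCC's dead row versions show up in the standard statistics views, so a quick bloat check needs nothing beyond the catalog:

-- Tables with the most dead tuples awaiting VACUUM
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;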
📇

What an Index Actually Is

DATA STRUCTURE

An index is a separate data structure that stores a subset of column values alongside physical pointers (tuple IDs / ctids) to the heap rows. The database maintains it automatically on insert/update/delete. Every index you add speeds up reads but slows writes and consumes disk + RAM.

PostgreSQL index types: B-tree (default), Hash, GiST, SP-GiST, GIN, BRIN. For backend engineering, B-tree covers 95% of use cases.

B-tree Internals

A B-tree is a shallow, balanced tree of fixed-size pages (8 kB in PostgreSQL). Branch pages hold only keys and child pointers; leaf pages hold keys plus heap pointers and are doubly linked for range scans. With hundreds of entries per page, an index over 100M rows is typically only 3–4 levels deep, so any lookup costs 3–4 page reads.

  Root page
  ┌─────────────────────────────────────────────────────┐
  │  [50]  [100]  [200]                                  │
  │   ↓      ↓      ↓      ↓                            │
  └─────────────────────────────────────────────────────┘
    │      │      │      │
  Inner  Inner  Inner  Inner   ← branch pages (pointers only)
  pages  pages  pages  pages
    │
  ┌─────────────────────────────────────────────────────┐
  │  Leaf page (doubly linked list ←→)                  │
  │  [42, ctid(0,5)] [43, ctid(2,1)] [44, ctid(1,8)]     │
  │  → heap page 0, tuple 5                              │
  └─────────────────────────────────────────────────────┘
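The pageinspect extension lets you confirm how shallow these trees are — a sketch assuming the idx_orders_user_status index built in Lab 1 exists:

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- The B-tree metapage reports the root block and the tree height ("level")
SELECT root, level FROM bt_metap('idx_orders_user_status');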

Index Types Reference

Type | Operators supported | Best for | Notes
B-tree | =, <, <=, >, >=, BETWEEN, LIKE 'foo%' | Most use cases; equality and range queries | Default. Supports ORDER BY without a sort node.
Hash | = only | Pure equality on large text keys | Faster than B-tree for equality only. No ranges. Not WAL-logged before PG 10 (avoid on pre-10).
GIN | @>, <@, &&, @@ | JSONB containment, full-text search, array overlap | Inverted index. Fast reads, slow updates. Tune gin_pending_list_limit.
GiST | Geometric, range types, full-text | PostGIS (geography), range overlaps, nearest-neighbour | Extensible. Slower to build than B-tree.
BRIN | Range queries on physically ordered data | Time-series, append-only tables (created_at) | Very small index (stores min/max per block range). Useless if data isn't correlated with physical order.
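Index type selection is just the USING clause at creation time; a sketch with assumed table and column names:

CREATE INDEX ON users (email);                   -- B-tree (default)
CREATE INDEX ON users USING hash (email);        -- Hash: equality only
CREATE INDEX ON products USING gin (metadata);   -- GIN on a jsonb column
CREATE INDEX ON events USING brin (created_at);  -- BRIN for append-only time-series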

Composite Indexes: Column Order Matters

📐

The Left-Prefix Rule

CRITICAL

A composite index (a, b, c) can satisfy queries that filter on: a, (a, b), or (a, b, c). It cannot be used efficiently for queries that filter only on b or c, because the B-tree is sorted by a first.

-- Index: (user_id, status, created_at)

-- ✅ Uses index (full prefix)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;

-- ✅ Uses index (leading column; created_at is filtered inside the index, since status is skipped)
SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2026-01-01';

-- ❌ Cannot use this index (no leading column)
SELECT * FROM orders
WHERE status = 'pending';  -- needs separate index on (status)

Rule of thumb: Put the most selective column first (highest cardinality — most distinct values), and equality columns before range columns. The range column should be last: once the scan hits a range condition, the columns after it in the index can no longer narrow the B-tree descent.

Partial Indexes

✂️

Index Only the Rows You Query

PERFORMANCE

A partial index includes only rows matching a WHERE clause. Smaller, faster to maintain, and can enforce constraints on subsets of data.

-- Index only unprocessed jobs (99% of queries target this subset)
CREATE INDEX idx_jobs_pending
  ON jobs(created_at)
  WHERE status = 'pending';

-- Partial unique constraint: only one active record per user
CREATE UNIQUE INDEX idx_subscriptions_active_user
  ON subscriptions(user_id)
  WHERE cancelled_at IS NULL;

-- Query MUST include the partial index predicate to use it
SELECT * FROM jobs
WHERE status = 'pending' AND created_at < now() - interval '5 minutes';

Expression Indexes

🔣

Indexing the Result of a Function

FUNCTIONAL INDEXES

When queries apply a function to a column in WHERE, a plain column index is useless — the function result isn't in the index. Expression indexes solve this.

-- ❌ Index on email NOT used — function applied to column
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- ✅ Create expression index to match the query
CREATE INDEX idx_users_email_lower
  ON users(lower(email));

-- Now the query above uses the index.
-- The index stores lower(email) values, not raw email values.

-- Other common examples:
CREATE INDEX ON events(date_trunc('day', created_at));
CREATE INDEX ON products((metadata->>'sku'));  -- JSONB field extraction

The Real Cost of Indexes

Operation | Impact
INSERT | Every index on the table gets an entry inserted. 5 indexes = 5 B-tree insertions + potential page splits.
UPDATE | If an indexed column changes: old entry deleted + new entry inserted in every affected index. A heap-only tuple (HOT) update avoids this if the page has space and no indexed column changes.
DELETE | Index entries are marked dead; not removed until VACUUM runs.
Disk space | A B-tree index on a 100M-row integer column ≈ 2–3 GB. GIN indexes on JSONB can reach 2–5× the table size.
Cache pollution | Indexes compete with table data for shared_buffers. Unused indexes evict useful data.

⚠️ Find and drop unused indexes: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND schemaname = 'public'; — Zero scans since last stats reset means the index is dead weight. Drop it.

🔍

Reading Query Plans

EXPLAIN ANALYZE

EXPLAIN shows the plan the planner chose with estimated costs. EXPLAIN ANALYZE executes the query and shows actual runtime. Always use EXPLAIN (ANALYZE, BUFFERS) in production diagnosis — BUFFERS reveals cache hits vs disk reads.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, count(o.id) AS total_orders
FROM users u
JOIN  orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name
ORDER BY total_orders DESC
LIMIT 10;

Anatomy of an EXPLAIN Output

Limit  (cost=1240.50..1240.53 rows=10 width=40)
         (actual time=18.432..18.435 rows=10 loops=1)
  ->  Sort  (cost=1240.50..1242.00 rows=600 width=40)
            (actual time=18.428..18.430 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
      ->  HashAggregate  (cost=1200.00..1215.00 rows=600 width=40)
                         (actual time=17.900..18.100 rows=621 loops=1)
            Group Key: u.name
          ->  Hash Join  (cost=85.00..1125.00 rows=15000 width=16)
                         (actual time=1.200..14.800 rows=14832 loops=1)
                Hash Cond: (o.user_id = u.id)
              ->  Seq Scan on orders o  (cost=0.00..890.00 rows=50000 width=8)
                                       (actual time=0.050..6.400 rows=50000 loops=1)
              ->  Hash  (cost=75.00..75.00 rows=800 width=8)
                         (actual time=0.900..0.900 rows=823 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 42kB
                  ->  Index Scan on users u  (cost=0.29..75.00 rows=800 width=8)
                                            (actual time=0.060..0.800 rows=823 loops=1)
                         Index Cond: (status = 'active')

Planning Time: 0.842 ms
Execution Time: 18.521 ms

How to Read It

Field | Meaning | What to watch for
cost=X..Y | X = startup cost (to first row), Y = total cost, in arbitrary planner units. Lower is better; used only to compare plans — not milliseconds. | A cost ranking wildly out of line with actual time can indicate stale statistics.
rows=N | Estimated (in the cost line) vs actual (in the actual-time line) row count. | A large estimate/actual mismatch → run ANALYZE. Bad estimates cascade into bad join and sort choices.
loops=N | How many times this node executed. Nested loops run the inner node once per outer row. | High loops × high rows = an N+1 problem at the SQL level.
Seq Scan | Full table scan — reads every page. | Expected on small tables or when fetching most rows. Bad on large tables with a selective WHERE clause.
Index Scan | B-tree traversal + a heap fetch for each matching row. | Good. If many rows are fetched, consider a Bitmap Index Scan, which batches heap fetches.
Bitmap Heap Scan | Builds a bitmap of matching heap pages, then fetches them in physical order (reduces random I/O). | Usually good for medium selectivity (5–20% of rows).
Hash Join | Builds a hash table from the smaller relation, probes with the larger. | Good for large joins. Needs work_mem; if Batches > 1 it spills to disk.
Nested Loop | For each outer row, scan the inner relation. O(N×M) worst case. | Good when the outer side is small and the inner side has an index. Bad on large tables without one.

Warning Signs in Plans

🚨

Red Flags

  • Seq Scan on a large table with low actual rows → missing index
  • Estimated rows ≫ actual rows → stale stats; run ANALYZE
  • Nested Loop with loops > 1000 → N+1 pattern in the SQL
  • Sort with Sort Method: external merge → work_mem too low for the sort
  • Hash with Batches > 1 → hash join spilling to disk
  • Rows Removed by Filter ≫ 0 after an Index Scan → over-fetching; index not selective enough

Good Signs

  • Index Only Scan → all needed columns are in the index, heap not touched (covering-index sketch below)
  • Bitmap Index Scan + Bitmap Heap Scan → efficient bulk access
  • Hash Join with Batches=1 → fits in memory
  • Estimated ≈ actual rows → planner has accurate stats
  • Planning time < 1ms → query is not overly complex
  • Execution time consistent → data fits in shared_buffers (Buffers: hit)
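A covering index is often the cheapest way to earn an Index Only Scan — a sketch on the assumed orders table (INCLUDE requires PostgreSQL ≥ 11, and the scan still depends on reasonably fresh visibility-map pages):

-- user_id drives the search; status and amount ride along in the leaf pages
CREATE INDEX idx_orders_user_covering
  ON orders(user_id) INCLUDE (status, amount);

-- Can now be answered from the index alone, without heap fetches
SELECT status, amount FROM orders WHERE user_id = 42;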

Use explain.dalibo.com to paste EXPLAIN (FORMAT JSON) output and get a visual, colour-coded plan tree. Far easier to navigate than text format for complex queries.
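Feeding that tool is just a matter of the options list:

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;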

The N+1 Query Problem

🐛

N+1: The Silent Performance Killer

ANTI-PATTERN

N+1 occurs when you fetch a list of N items then issue a separate query for each item. The result is N+1 round trips to the database instead of 1 or 2.

-- ❌ N+1: fetch 100 users, then 1 query per user for their posts
users = SELECT * FROM users LIMIT 100;        -- 1 query
FOR u IN users:
    posts = SELECT * FROM posts WHERE user_id = u.id;  -- 100 queries
-- Total: 101 queries. On 50ms RTT: ~5 seconds.

-- ✅ Solution 1: JOIN
SELECT u.*, p.title, p.created_at
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
LIMIT 100;  -- 1 query (careful: LIMIT now counts joined rows, not users)

-- ✅ Solution 2: Batch load (IN clause)
user_ids = [1, 2, 3, ..., 100];
SELECT * FROM posts
WHERE user_id = ANY($1);  -- 2 total queries

JOIN Types

JOIN Type | Returns | Use when
INNER JOIN | Rows with matching keys in both tables | Both sides must exist. Most common.
LEFT JOIN | All rows from the left + matching from the right (NULLs where no match) | Optional relationship: a user may or may not have a profile.
RIGHT JOIN | All rows from the right + matching from the left | Rare; rewrite as a LEFT JOIN with the tables swapped.
FULL OUTER JOIN | All rows from both, NULLs where no match | Reconciliation queries; finding unmatched rows on either side.
CROSS JOIN | Cartesian product (M × N rows) | Generating test data; pairing every item with every other. Dangerous on large tables.
LATERAL JOIN | Subquery that can reference the outer query's columns | "For each user, get their 3 most recent orders" — a correlated subquery without N+1 (sketch below).
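A sketch of that LATERAL pattern, using the users and orders tables assumed throughout this module:

-- For each user, fetch their 3 most recent orders in a single query
SELECT u.id, u.name, o.id AS order_id, o.created_at
FROM users u
LEFT JOIN LATERAL (
    SELECT id, created_at
    FROM orders
    WHERE orders.user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) o ON true;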

CTEs vs Subqueries

📝

Common Table Expressions

WITH CLAUSE
-- Subquery (inline, can be optimised away by planner)
SELECT u.name, o.total
FROM users u
JOIN (SELECT user_id, sum(amount) AS total
      FROM orders GROUP BY user_id) o
ON o.user_id = u.id;

-- CTE (WITH) — same query, more readable
WITH order_totals AS (
  SELECT user_id, sum(amount) AS total
  FROM orders
  GROUP BY user_id
)
SELECT u.name, ot.total
FROM users u
JOIN order_totals ot ON ot.user_id = u.id;

-- Recursive CTE: walk a tree/graph (org hierarchy, comments thread)
WITH RECURSIVE org_tree AS (
  SELECT id, name, manager_id, 0 AS depth
  FROM employees WHERE manager_id IS NULL  -- anchor
  UNION ALL
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.id    -- recursive
)
SELECT * FROM org_tree ORDER BY depth;

PostgreSQL ≥ 12: single-reference, side-effect-free CTEs are inlined by default (treated as subqueries the planner can optimise through). Use WITH ... AS MATERIALIZED to force the pre-12 behaviour — the CTE is executed once and its result cached, which can be faster when it is referenced multiple times.
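Forcing materialisation is a one-keyword change; the order_totals CTE from above, executed once and cached:

WITH order_totals AS MATERIALIZED (
  SELECT user_id, sum(amount) AS total
  FROM orders
  GROUP BY user_id
)
SELECT u.name, ot.total
FROM users u
JOIN order_totals ot ON ot.user_id = u.id;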


Window Functions

🪟

Aggregates Without Collapsing Rows

OVER CLAUSE
-- Rank users by order count within each country
SELECT
  name,
  country,
  order_count,
  rank() OVER (
    PARTITION BY country
    ORDER BY order_count DESC
  ) AS country_rank
FROM users;

-- Running total (cumulative sum)
SELECT
  created_at::date,
  revenue,
  sum(revenue) OVER (ORDER BY created_at::date) AS running_total
FROM daily_sales;

-- Row number for cursor pagination (stable ordering)
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM events
WHERE id > :last_cursor
LIMIT 50;

-- Lag/Lead: compare row with previous/next
SELECT
  date,
  price,
  lag(price, 1) OVER (ORDER BY date) AS prev_price,
  price - lag(price, 1) OVER (ORDER BY date) AS change
FROM stock_prices;

UPSERT and Conflict Handling

-- Insert or update on conflict (upsert)
INSERT INTO user_stats (user_id, login_count, last_seen)
VALUES ($1, 1, now())
ON CONFLICT (user_id) DO UPDATE
  SET login_count = user_stats.login_count + 1,
      last_seen   = EXCLUDED.last_seen;  -- EXCLUDED = the row proposed for insertion

-- Insert and ignore duplicates
INSERT INTO events (id, payload)
VALUES ($1, $2)
ON CONFLICT (id) DO NOTHING;

-- Returning the affected rows (useful for getting auto-generated IDs)
INSERT INTO users (name, email)
VALUES ($1, $2)
RETURNING id, created_at;
📐

Normalization: Eliminating Redundancy

DATA MODELLING

Normalization is the process of structuring tables to minimise data redundancy and prevent update anomalies. Each normal form adds a constraint. In practice, design to 3NF/BCNF and deliberately denormalize specific hot paths with explicit justification.

Normal Forms

1NF

First Normal Form — Atomic Values

Rule: Each column holds a single, indivisible value. No repeating groups (multiple phone numbers in one column). Each row is uniquely identifiable (primary key exists).

Violation: users(id, name, phones="555-1234,555-5678") — comma-delimited phones in one column.

Fix: Separate user_phones(user_id, phone_number) table with one row per phone.
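The 1NF fix as DDL — a sketch with assumed names:

CREATE TABLE user_phones (
  user_id      int  NOT NULL REFERENCES users(id),
  phone_number text NOT NULL,
  PRIMARY KEY (user_id, phone_number)
);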

2NF

Second Normal Form — No Partial Dependencies

Rule: Must be 1NF. Every non-key column must depend on the whole primary key, not just part of it. Only relevant for tables with composite primary keys.

Violation: order_items(order_id, product_id, quantity, product_name)product_name depends only on product_id, not the full (order_id, product_id) key.

Fix: Move product_name to the products table. order_items keeps only quantity.

3NF

Third Normal Form — No Transitive Dependencies

Rule: Must be 2NF. No non-key column depends on another non-key column (transitive dependency).

Violation: employees(id, dept_id, dept_name, salary)dept_name depends on dept_id, not directly on id.

Fix: Extract departments(dept_id, dept_name). employees keeps only dept_id as a foreign key.
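The 3NF decomposition as DDL — a sketch with assumed names:

CREATE TABLE departments (
  dept_id   int  PRIMARY KEY,
  dept_name text NOT NULL
);

CREATE TABLE employees (
  id      int     PRIMARY KEY,
  dept_id int     REFERENCES departments(dept_id),
  salary  numeric
);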

BCNF

Boyce-Codd Normal Form — Every Determinant Is a Candidate Key

Rule: Stricter than 3NF. For every functional dependency X → Y, X must be a superkey (uniquely identifies rows). Fixes edge cases in 3NF with overlapping composite candidate keys.

Violation: course_teachers(student, course, teacher) where a teacher teaches only one course but a course can have multiple teachers. The dependency teacher → course exists, but teacher is not a key.

Fix: Decompose into teacher_courses(teacher, course) and student_teachers(student, teacher). The join is lossless because teacher → course; note the classic BCNF trade-off: the dependency (student, course) → teacher is no longer enforceable by a single key.

When to Denormalize

Deliberate Denormalization

TRADE-OFF

Normalization optimises for write integrity. Denormalization trades integrity for read performance. Denormalize only when you have a measured performance problem and accept the consistency maintenance burden.

Pattern | What it does | Maintenance cost
Cached count | Store posts.comment_count alongside the post; increment/decrement via trigger or application logic (trigger sketch below) | Must update on every insert/delete to comments
Materialized view | Pre-computed JOIN result stored as a table (CREATE MATERIALIZED VIEW); refreshed on schedule or trigger | Must be refreshed explicitly; REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking reads
Duplicated columns | Copy user.name into orders.user_name to avoid a JOIN on every order list | Must sync on user name change (rare)
JSONB for variable attributes | Store flexible/sparse attributes in a JSONB column instead of an EAV table | No schema enforcement; use a GIN index for querying
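A sketch of the cached-count trigger, assuming posts and comments tables with the obvious columns (the function and trigger names are illustrative):

CREATE OR REPLACE FUNCTION bump_comment_count() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
  END IF;
  RETURN NULL;  -- AFTER row trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_comment_count
AFTER INSERT OR DELETE ON comments
FOR EACH ROW EXECUTE FUNCTION bump_comment_count();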
🔒

ACID: Four Guarantees

CORRECTNESS

ACID is the contract a database makes about what happens to your data, even in the face of crashes, errors, and concurrent access: Atomicity (a transaction applies completely or not at all), Consistency (constraints hold before and after every transaction), Isolation (concurrent transactions don't see each other's intermediate states — the degree is configurable, see below), and Durability (once committed, data survives a crash — this is what the WAL buys you). Understanding each property tells you what you can and cannot rely on.

Read Phenomena (What Can Go Wrong)

Phenomenon | Description | Example
Dirty Read | Read uncommitted data from another transaction that may roll back | T1 reads a balance modified by T2 before T2 commits. T2 rolls back — T1 read phantom data.
Non-repeatable Read | Re-reading the same row within a transaction returns different data | T1 reads price=10. T2 updates price=20 and commits. T1 re-reads: price=20. Same query, different result.
Phantom Read | Re-executing a range query returns different rows | T1 counts orders WHERE status='pending': 5 rows. T2 inserts a new pending order. T1 re-counts: 6 rows.
Serialisation Anomaly | Result is inconsistent with any serial order of the transactions | Write skew: two doctors both read "at least 1 on call", both go off-call — result: zero doctors on call.

Isolation Levels

Level | Dirty Read | Non-Repeatable Read | Phantom Read | Serialisation Anomaly | PostgreSQL default?
Read Uncommitted | Possible | Possible | Possible | Possible | No (PG treats it as Read Committed)
Read Committed | Not possible | Possible | Possible | Possible | ✅ Default
Repeatable Read | Not possible | Not possible | Not possible* | Possible | No
Serializable | Not possible | Not possible | Not possible | Not possible | No (highest cost)

*PostgreSQL's Repeatable Read uses snapshot isolation which prevents phantoms too — stronger than the SQL standard requires. PG's Serializable uses Serializable Snapshot Isolation (SSI), a lock-free algorithm that detects and aborts transactions that would create anomalies.
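Because SSI aborts conflicting transactions rather than blocking them, code running under Serializable must be prepared to retry:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... reads and writes ...
COMMIT;
-- On conflict, a statement (or the COMMIT itself) fails with SQLSTATE 40001
-- (serialization_failure); the application should retry the whole transaction.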


Deadlocks

💀

Deadlock Detection and Prevention

CONCURRENCY

A deadlock occurs when two (or more) transactions each hold a lock the other needs, creating a circular wait.

-- T1                              T2
BEGIN;                           BEGIN;
UPDATE accounts SET bal=bal-100   UPDATE accounts SET bal=bal+100
WHERE id=1;  -- locks row 1       WHERE id=2;  -- locks row 2
UPDATE accounts SET bal=bal+100   UPDATE accounts SET bal=bal-100
WHERE id=2;  -- waits for row 2   WHERE id=1;  -- waits for row 1 ← DEADLOCK

PostgreSQL detects deadlocks automatically (within deadlock_timeout, default 1s) and aborts one transaction with: ERROR: deadlock detected. The other continues normally.

Prevention Strategies

  • Consistent lock ordering: always lock resources in the same order (e.g., always lock lower ID first). Eliminates the circular wait condition.
  • SELECT FOR UPDATE SKIP LOCKED: skip rows locked by other transactions instead of waiting. Useful for job queues (sketch after this list).
  • NOWAIT: fail immediately if lock cannot be acquired: SELECT ... FOR UPDATE NOWAIT → returns error, never waits.
  • Short transactions: the shorter a transaction, the shorter the lock hold time, the lower the probability of deadlock.
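A sketch of the SKIP LOCKED worker pattern, reusing the jobs table from the partial-index example (with $1 standing for the id just claimed):

-- Each worker claims one pending job without blocking on other workers
BEGIN;
SELECT id, created_at
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1;

-- ... process the job, then mark it done:
UPDATE jobs SET status = 'complete' WHERE id = $1;
COMMIT;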

Advisory Locks

🔑

Application-Level Mutex via PostgreSQL

DISTRIBUTED LOCKING

Advisory locks are arbitrary integer locks managed by the application — PostgreSQL doesn't tie them to any table row. Useful for distributed mutual exclusion (e.g., ensuring only one instance runs a cron job).

-- Acquire exclusive advisory lock (blocks if held by another session)
SELECT pg_advisory_lock(12345);

-- Non-blocking try: returns true/false
SELECT pg_try_advisory_lock(12345);

-- Transaction-scoped (auto-released on commit/rollback)
SELECT pg_advisory_xact_lock(12345);

-- Release session-scoped lock explicitly
SELECT pg_advisory_unlock(12345);

Connecting to PostgreSQL with libpq

#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile: gcc db.c -lpq -o db
   Requires: libpq-dev package (apt install libpq-dev)  */

PGconn *db_connect(const char *connstr) {
    PGconn *conn = PQconnectdb(connstr);
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s\n", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }
    return conn;
}

int main(void) {
    /* Connection string: can also use env vars PGHOST, PGPORT, PGUSER etc */
    PGconn *conn = db_connect(
        "host=localhost port=5432 dbname=myapp user=myuser password=secret"
    );
    if (!conn) return 1;

    printf("Connected to PostgreSQL %s\n", PQparameterStatus(conn, "server_version"));
    PQfinish(conn);
    return 0;
}

Prepared Statements (Parameterised Queries)

/* Always use prepared statements — never string-concatenate user input */

int get_user_by_id(PGconn *conn, int user_id) {
    /* Prepare once, execute many times */
    PGresult *prep = PQprepare(conn,
        "get_user",                         /* statement name */
        "SELECT id, name, email FROM users WHERE id = $1",
        1,                                   /* number of params */
        NULL                                 /* param types (NULL = infer) */
    );
    if (PQresultStatus(prep) != PGRES_COMMAND_OK) {
        fprintf(stderr, "Prepare failed: %s\n", PQresultErrorMessage(prep));
        PQclear(prep);
        return -1;
    }
    PQclear(prep);

    /* Execute: pass parameter as string (libpq converts) */
    char id_str[16];
    snprintf(id_str, sizeof(id_str), "%d", user_id);
    const char *params[] = { id_str };

    PGresult *res = PQexecPrepared(conn,
        "get_user",   /* statement name */
        1,             /* nParams */
        params,        /* paramValues */
        NULL,          /* paramLengths (NULL = text mode, use strlen) */
        NULL,          /* paramFormats (NULL = all text) */
        0              /* resultFormat: 0=text, 1=binary */
    );

    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query failed: %s\n", PQresultErrorMessage(res));
        PQclear(res);
        return -1;
    }

    int rows = PQntuples(res);
    int cols = PQnfields(res);
    printf("Got %d rows, %d cols\n", rows, cols);

    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            printf("  %s = %s\n",
                   PQfname(res, c),
                   PQgetisnull(res, r, c) ? "(null)" : PQgetvalue(res, r, c));
        }
    }

    PQclear(res);
    return rows;
}

Transactions in C

/* Assumes the includes from db.c above, plus <string.h> for strcmp. */
int transfer_funds(PGconn *conn, int from_id, int to_id, int amount_cents) {
    /* BEGIN */
    PGresult *r = PQexec(conn, "BEGIN");
    if (PQresultStatus(r) != PGRES_COMMAND_OK) { PQclear(r); return -1; }
    PQclear(r);

    /* Debit — lock the row first with FOR UPDATE */
    char from_str[16], to_str[16], amt_str[16];
    snprintf(from_str, sizeof(from_str), "%d", from_id);
    snprintf(to_str,   sizeof(to_str),   "%d", to_id);
    snprintf(amt_str,  sizeof(amt_str),  "%d", amount_cents);

    const char *debit_params[] = { amt_str, from_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1",
        2, NULL, debit_params, NULL, NULL, 0);

    if (PQresultStatus(r) != PGRES_COMMAND_OK ||
        strcmp(PQcmdTuples(r), "1") != 0) {  /* 0 rows = insufficient balance */
        PQclear(r);
        PQexec(conn, "ROLLBACK");
        return -1;
    }
    PQclear(r);

    /* Credit */
    const char *credit_params[] = { amt_str, to_str };
    r = PQexecParams(conn,
        "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
        2, NULL, credit_params, NULL, NULL, 0);

    if (PQresultStatus(r) != PGRES_COMMAND_OK) {
        PQclear(r);
        PQexec(conn, "ROLLBACK");
        return -1;
    }
    PQclear(r);

    /* COMMIT */
    r = PQexec(conn, "COMMIT");
    int ok = PQresultStatus(r) == PGRES_COMMAND_OK;
    PQclear(r);
    return ok ? 0 : -1;
}

Minimal Connection Pool in C

/* Production uses PgBouncer. This illustrates the concept.
   Requires <pthread.h> in addition to <libpq-fe.h>. */

#define POOL_SIZE 8

typedef struct {
    PGconn  *conn;
    int      in_use;
} conn_slot_t;

static conn_slot_t pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;  /* guards all slots */

void pool_init(const char *connstr) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].conn   = PQconnectdb(connstr);
        pool[i].in_use = 0;
    }
}

PGconn *pool_acquire(void) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use && PQstatus(pool[i].conn) == CONNECTION_OK) {
            pool[i].in_use = 1;
            pthread_mutex_unlock(&pool_lock);
            return pool[i].conn;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return NULL;  /* pool exhausted */
}

void pool_release(PGconn *conn) {
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].conn == conn) {
            pool[i].in_use = 0;
            break;
        }
    }
    pthread_mutex_unlock(&pool_lock);
}

In production: use PgBouncer (transaction-mode pooling) in front of PostgreSQL. It handles thousands of client connections multiplexed over a small number of server connections. Never create a new PGconn per request in a high-throughput server.

🔬 Lab 1 — Index Experiments on Real Data

TOOLS: PostgreSQL · psql · pgbench

Goal: Generate a large dataset and observe the before/after performance impact of indexes using EXPLAIN ANALYZE.

1. Create a test database and generate 1M rows:
createdb perflab
psql perflab -c "CREATE TABLE orders (id serial PRIMARY KEY, user_id int, status text, amount int, created_at timestamptz DEFAULT now());"
psql perflab -c "INSERT INTO orders (user_id, status, amount) SELECT (random()*10000)::int, (ARRAY['pending','complete','cancelled'])[ceil(random()*3)::int], (random()*10000)::int FROM generate_series(1,1000000);"
2. Run a query without indexes and capture the plan:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
Note: execution time, "Seq Scan", rows removed by filter.
3. Add an index and compare:
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
Re-run the EXPLAIN. Note: execution time, node type change (Index Scan or Bitmap Index Scan), rows fetched.
4. Try the reverse column order: CREATE INDEX idx_orders_status_user ON orders(status, user_id);. Run WHERE status = 'pending' AND user_id = 42. Which index does the planner choose? Why?
5. Test a partial index: CREATE INDEX idx_orders_pending ON orders(user_id) WHERE status = 'pending';. Compare the plan for a pending-only query vs a query across all statuses.
6. Find unused indexes: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE schemaname='public' ORDER BY idx_scan;. Drop any with 0 scans.

🔬 Lab 2 — Isolation Levels in Action

TOOLS: PostgreSQL · two psql sessions

Goal: Observe the difference between Read Committed and Repeatable Read isolation levels by running concurrent transactions in two terminal sessions.

1. Set up: CREATE TABLE balances (id int PRIMARY KEY, amount int); INSERT INTO balances VALUES (1, 1000), (2, 500);
2. Non-repeatable read demo (Read Committed):
Session A: BEGIN; SELECT amount FROM balances WHERE id=1; → 1000
Session B: UPDATE balances SET amount=2000 WHERE id=1; COMMIT;
Session A: SELECT amount FROM balances WHERE id=1; → 2000 (changed!)
Session A: ROLLBACK;
3. Repeatable Read prevents this:
Session A: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT amount FROM balances WHERE id=1; → 1000
Session B: UPDATE balances SET amount=3000 WHERE id=1; COMMIT;
Session A: SELECT amount FROM balances WHERE id=1; → still 1000 (snapshot!)
Session A: COMMIT;
4. Deadlock simulation:
Session A: BEGIN; UPDATE balances SET amount=amount-100 WHERE id=1;
Session B: BEGIN; UPDATE balances SET amount=amount-100 WHERE id=2;
Session A: UPDATE balances SET amount=amount+100 WHERE id=2; → waits
Session B: UPDATE balances SET amount=amount+100 WHERE id=1; → deadlock! One session gets: ERROR: deadlock detected
5. Fix the deadlock: always update rows in a consistent order (lower id first). Rewrite both transactions to update id=1 before id=2. Verify no deadlock occurs.

🔬 Lab 3 — libpq CRUD Application in C

TOOLS: gcc · libpq-dev · PostgreSQL

Goal: Build a command-line user management tool using libpq with prepared statements and transactions.

1. Create the schema: CREATE TABLE users (id serial PRIMARY KEY, name text NOT NULL, email text UNIQUE NOT NULL, created_at timestamptz DEFAULT now());
2. Write users.c using the libpq code from the sections above. Implement: user_create(conn, name, email) → returns new id, user_find(conn, id) → prints user, user_list(conn) → prints all users.
3. Use PQprepare for all queries. Verify that SQL injection is not possible: try passing name = "'; DROP TABLE users; --" as the name parameter. The prepared statement should store it literally.
4. Implement user_transfer_email(conn, from_id, to_id) which copies the email from one user to another in a single transaction, verifying both users exist. Use the BEGIN / COMMIT / ROLLBACK pattern.
5. Add error handling: check PQresultStatus on every result. Handle PGRES_FATAL_ERROR (constraint violations like duplicate email) gracefully — print the error message, don't crash.
6. Stretch: implement the minimal connection pool from the section above. Run the tool from multiple threads simultaneously using pthread_create. Verify correctness under concurrent inserts.

🔬 Lab 4 — Normalization Design Exercise

TOOLS: pen & paper · psql

Goal: Take a denormalized flat table and normalize it to 3NF, then measure query performance before and after.

1. Start with this flat table: CREATE TABLE flat_orders (order_id int, order_date date, customer_id int, customer_name text, customer_email text, customer_city text, product_id int, product_name text, product_category text, quantity int, unit_price numeric);
2. Identify all functional dependencies. Which columns depend on order_id? On customer_id? On product_id? Map the violations of 2NF and 3NF.
3. Design the normalized schema: customers, products, orders, order_items. Write the CREATE TABLE statements with appropriate primary keys and foreign keys.
4. Generate 100k rows in the flat table and in the normalized schema. Write a JOIN query that produces the same output as SELECT * FROM flat_orders. Compare execution plans and times.
5. Deliberate denormalization: add a total_amount column to orders (computed as the SUM of quantity × unit_price). Update it via a trigger. Measure query time for "get orders with total > 1000" with vs without the denormalized column.

Module Mastery Checklist

M06 COMPLETE

You have mastered this module when you can work through every topic area below without referring to notes:

  • SQL Internals — execution pipeline, MVCC, statistics, VACUUM, TOAST, WAL
  • Indexes — B-tree internals, composite / partial / expression indexes, write costs
  • EXPLAIN — reading plans, estimate vs actual, red flags and good signs
  • Query Patterns — N+1, JOIN types, CTEs, window functions, UPSERT
  • Normalization — 1NF through BCNF, deliberate denormalization
  • ACID & Isolation — read phenomena, isolation levels, deadlocks, advisory locks
  • C / libpq — connections, prepared statements, transactions, pooling


Next in Phase 2: M07 covers Transactions & MVCC in depth (savepoints, advisory locks, SKIP LOCKED job queues) and M08 covers NoSQL & Redis (when to reach for a non-relational store and how to use it alongside PostgreSQL).
