Part 1 — Universal Foundation  ·  Module 04 of 04
SQL Basics & FastAPI
Query databases and build your first production API server
⏱ 2 Weeks 🟡 Beginner–Intermediate 🗄 SQLite · FastAPI · Pydantic 📋 Prerequisite: P1-M01, P1-M03
🎯

What This Module Covers

Final Foundation Module

This is the final module of Part 1 — and the payoff. SQL and FastAPI are the two tools you use to store data and expose it as an API. Every AI application you build — RAG pipelines, agent backends, model serving — will use both. By the end of this module you will have built and run your first real API server.

  • SQL — SELECT, WHERE, GROUP BY, JOIN, ORDER BY, aggregations with SQLite
  • Pandas + SQL — reading SQL query results directly into DataFrames
  • FastAPI basics — GET and POST endpoints, path parameters, query parameters, request bodies
  • Pydantic — data validation and schema definition with type hints
  • Running a server — uvicorn dev server, the /docs interface, testing endpoints
  • FastAPI + SQLite — connecting a database to your API for persistent storage
⚡ SKIP IF: You know SQL from prior database experience — jump directly to the FastAPI tab (Tab 3). If you know Spring Boot or Express.js, FastAPI will feel immediately familiar; spend your time on Pydantic validation and async endpoints, which are FastAPI-specific.
🔗

Where These Skills Lead

Forward Connections
  • P1-M04 → P4 (LLM APIs) — FastAPI is how you expose LLM endpoints. Pydantic models are how you define structured output schemas.
  • P1-M04 → P5 (RAG) — pgvector extends PostgreSQL with vector search. SQLite stores document metadata alongside embeddings.
  • P1-M04 → P7 (Production) — production FastAPI uses async DB sessions, connection pooling, and all the patterns you learn here.
  • SQL fluency — used for log analysis, querying ML experiment results in MLflow, and reading data from feature stores.
🗄

SQL Mental Model

Concept First

SQL (Structured Query Language) is a declarative language — you describe what data you want, not how to fetch it. The database engine figures out the how. Think of a SQL table like a Pandas DataFrame: rows are records, columns are fields, and SQL is the query language.

-- SQL is not case-sensitive for keywords, but convention is UPPERCASE
-- Single-line comment: --
-- Multi-line: /* ... */

-- Create a table
CREATE TABLE students (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT    NOT NULL,
    score   REAL    DEFAULT 0.0,
    grade   TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Insert rows
INSERT INTO students (name, score, grade) VALUES ('Alice', 92.5, 'A');
INSERT INTO students (name, score, grade) VALUES ('Bob',   78.0, 'C');
INSERT INTO students (name, score, grade) VALUES ('Charlie', 85.5, 'B');
🔍

SELECT — The Core Query

Most Used
-- Basic SELECT
SELECT * FROM students;                     -- all columns, all rows
SELECT name, score FROM students;           -- specific columns
SELECT DISTINCT grade FROM students;        -- unique values only

-- WHERE — filter rows
SELECT * FROM students WHERE score > 80;
SELECT * FROM students WHERE grade = 'A' AND score >= 90;
SELECT * FROM students WHERE grade IN ('A', 'B');
SELECT * FROM students WHERE name LIKE 'A%';   -- starts with A
SELECT * FROM students WHERE score BETWEEN 70 AND 90;
SELECT * FROM students WHERE grade IS NULL;    -- NULL check

-- ORDER BY — sort results
SELECT * FROM students ORDER BY score DESC;        -- highest first
SELECT * FROM students ORDER BY grade ASC, score DESC;  -- multi-column

-- LIMIT and OFFSET — pagination
SELECT * FROM students ORDER BY score DESC LIMIT 10;          -- top 10
SELECT * FROM students ORDER BY score DESC LIMIT 10 OFFSET 20; -- page 3

-- Computed columns and aliases
SELECT name,
       score,
       score * 0.1  AS bonus_points,
       UPPER(name)  AS name_upper
FROM students;
📊

Aggregations — COUNT, SUM, AVG, MIN, MAX

Analytics
-- Aggregate functions
SELECT COUNT(*)          AS total_students FROM students;
SELECT COUNT(grade)      AS graded_count   FROM students;  -- skips NULLs
SELECT AVG(score)        AS class_average  FROM students;
SELECT MAX(score)        AS highest        FROM students;
SELECT MIN(score)        AS lowest         FROM students;
SELECT SUM(score)        AS total_points   FROM students;

-- GROUP BY — aggregate per group
SELECT grade,
       COUNT(*)    AS student_count,
       AVG(score)  AS avg_score,
       MAX(score)  AS top_score
FROM students
GROUP BY grade
ORDER BY avg_score DESC;

-- HAVING — filter AFTER grouping (WHERE filters before)
SELECT grade, AVG(score) AS avg_score
FROM students
GROUP BY grade
HAVING AVG(score) > 80;    -- only grades with class avg > 80

-- Rule: WHERE filters rows BEFORE grouping
--       HAVING filters groups AFTER aggregation

💡 SQL execution order (not the same as write order): FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Understanding this prevents "column not found in WHERE" errors when using aliases.
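The WHERE-before-grouping vs HAVING-after-aggregation rule is easy to see in a quick sqlite3 sketch (the rows below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, score REAL, grade TEXT);
    INSERT INTO students VALUES
        ('Alice', 92.5, 'A'), ('Diana', 95.0, 'A'),
        ('Charlie', 85.5, 'B'),
        ('Bob', 78.0, 'C'), ('Evan', 55.0, 'C');
""")

# WHERE runs before grouping: low scores never reach the aggregate
where_first = conn.execute("""
    SELECT grade, COUNT(*) FROM students
    WHERE score > 80
    GROUP BY grade ORDER BY grade
""").fetchall()

# HAVING runs after aggregation: whole groups are kept or dropped
having_after = conn.execute("""
    SELECT grade, AVG(score) FROM students
    GROUP BY grade HAVING AVG(score) > 80
    ORDER BY grade
""").fetchall()

print(where_first)   # [('A', 2), ('B', 1)]
print(having_after)  # [('A', 93.75), ('B', 85.5)]
```

Grade C has rows, but its average (66.5) fails the HAVING condition, so the whole group disappears — something a WHERE clause could never express.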

🐍

SQL from Python — sqlite3 and Pandas

Integration
import sqlite3
import pandas as pd

# ── sqlite3 — standard library, no install needed ──
conn = sqlite3.connect("students.db")    # creates file if not exists
cursor = conn.cursor()

# Execute SQL
cursor.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        name  TEXT    NOT NULL,
        score REAL
    )
""")
conn.commit()

# Insert with parameterised query (NEVER use f-strings for SQL!)
cursor.execute("INSERT INTO students (name, score) VALUES (?, ?)",
               ("Alice", 92.5))
conn.commit()

# Bulk insert
students = [("Bob", 78), ("Charlie", 85), ("Diana", 91)]
cursor.executemany("INSERT INTO students (name, score) VALUES (?, ?)", students)
conn.commit()

# Query results → Python list of tuples
cursor.execute("SELECT * FROM students WHERE score > 80 ORDER BY score DESC")
rows = cursor.fetchall()
for row in rows:
    print(row)   # (1, "Alice", 92.5)

# Query results → Pandas DataFrame (most useful pattern)
df = pd.read_sql_query(
    "SELECT name, score FROM students ORDER BY score DESC",
    conn
)
print(df.head())

conn.close()   # always close when done

⚠️ Never use string formatting or f-strings to build SQL queries. f"SELECT * FROM users WHERE name = '{user_input}'" is a SQL injection vulnerability. Always use parameterised queries with ? placeholders. This is the most critical SQL security rule.
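A short demonstration of why this matters — the same malicious input matches every row through an f-string but zero rows through a placeholder (throwaway in-memory table, invented data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Alice')")

malicious = "x' OR '1'='1"

# UNSAFE: the input rewrites the query, so the WHERE clause matches every row
unsafe = conn.execute(
    f"SELECT * FROM users WHERE name = '{malicious}'"
).fetchall()

# SAFE: the ? placeholder treats the input as a plain string literal
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe), len(safe))  # 1 0
```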

🔗

JOINs — Combining Tables

Essential

JOINs combine rows from two tables based on a related column. The same four join types exist in Pandas merge — understanding them once applies to both.

-- Sample tables
-- students: id, name, score, dept_id
-- departments: id, name, building

-- INNER JOIN — only rows that match in BOTH tables
SELECT s.name, s.score, d.name AS department
FROM   students     s
JOIN   departments  d ON s.dept_id = d.id;

-- LEFT JOIN — all students, even those with no department
SELECT s.name, s.score, d.name AS department
FROM   students     s
LEFT JOIN departments d ON s.dept_id = d.id;
-- d.name will be NULL for students with no matching dept_id

-- RIGHT JOIN (only supported in SQLite 3.39+ — on older versions, use LEFT JOIN with the tables swapped)

-- Self-join — join a table to itself
-- Find all students who scored higher than Alice
SELECT b.name, b.score
FROM   students a
JOIN   students b ON b.score > a.score
WHERE  a.name = 'Alice';
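You can verify the INNER vs LEFT difference in a few lines of Python. Bob, who has no dept_id, vanishes from the inner join but survives the left join with a NULL department (tiny made-up tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Maths');
    INSERT INTO students VALUES (1, 'Alice', 1), (2, 'Bob', NULL);
""")

# INNER JOIN drops unmatched rows; LEFT JOIN keeps them with NULLs
inner = conn.execute("""
    SELECT s.name, d.name FROM students s
    JOIN departments d ON s.dept_id = d.id
    ORDER BY s.id
""").fetchall()
left = conn.execute("""
    SELECT s.name, d.name FROM students s
    LEFT JOIN departments d ON s.dept_id = d.id
    ORDER BY s.id
""").fetchall()

print(inner)  # [('Alice', 'Maths')]
print(left)   # [('Alice', 'Maths'), ('Bob', None)]
```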
📊

JOIN Type Reference

Quick Lookup
Join Type    | Returns                                               | Pandas Equivalent
INNER JOIN   | Only rows matching in both tables                     | how="inner"
LEFT JOIN    | All left rows + matched right rows (NULL if no match) | how="left"
RIGHT JOIN   | All right rows + matched left rows                    | how="right"
FULL OUTER   | All rows from both, NULL where no match               | how="outer"
📈

Subqueries and CTEs

Advanced Patterns
-- Subquery in WHERE — students above class average
SELECT name, score
FROM   students
WHERE  score > (SELECT AVG(score) FROM students);

-- Subquery in FROM — treat query result as a table
SELECT grade, avg_score
FROM (
    SELECT grade, AVG(score) AS avg_score
    FROM   students
    GROUP  BY grade
) AS grade_stats
WHERE avg_score > 75;

-- CTE (Common Table Expression) — readable named subquery
WITH above_avg AS (
    SELECT name, score
    FROM   students
    WHERE  score > (SELECT AVG(score) FROM students)
),
top_dept AS (
    SELECT dept_id, COUNT(*) AS count
    FROM   above_avg a
    JOIN   students s ON a.name = s.name
    GROUP  BY dept_id
    ORDER  BY count DESC
    LIMIT  1
)
SELECT d.name AS top_department
FROM   departments d
JOIN   top_dept t ON d.id = t.dept_id;

💡 Use CTEs over nested subqueries whenever possible. CTEs are named, reusable, and read top-to-bottom like a story. Deeply nested subqueries become impossible to maintain. The WITH name AS (...) pattern is the professional SQL standard.
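As a minimal runnable illustration of the WITH pattern (scores invented), counting the students above the class average:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, score REAL);
    INSERT INTO students VALUES ('Alice', 92.5), ('Bob', 78.0), ('Charlie', 85.5);
""")

# Name the intermediate result, then query it like an ordinary table
count = conn.execute("""
    WITH above_avg AS (
        SELECT name, score FROM students
        WHERE score > (SELECT AVG(score) FROM students)
    )
    SELECT COUNT(*) FROM above_avg
""").fetchone()[0]

print(count)  # 2 — Alice and Charlie beat the 85.33 class average
```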

🔧

UPDATE, DELETE and Indexes

Data Management
-- UPDATE — modify existing rows
UPDATE students SET grade = 'A' WHERE score >= 90;
UPDATE students SET score = score * 1.05 WHERE grade = 'B';  -- 5% bonus

-- DELETE — remove rows
DELETE FROM students WHERE score < 40;
DELETE FROM students WHERE name = 'Bob';

-- TRUNCATE equivalent in SQLite
DELETE FROM students;   -- removes all rows, table structure remains

-- Indexes — speed up queries on large tables
CREATE INDEX idx_students_score ON students(score);
CREATE INDEX idx_students_grade ON students(grade);
CREATE UNIQUE INDEX idx_students_email ON students(email);  -- assumes an email column exists

-- When to create an index:
-- Columns frequently used in WHERE, ORDER BY, or JOIN conditions
-- Foreign key columns
-- High-cardinality columns (many unique values)
-- NOT on columns with very few unique values (e.g. boolean flag)

-- Check query plan (does it use the index?)
EXPLAIN QUERY PLAN SELECT * FROM students WHERE score > 80;
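You can run the same check from Python — when the planner uses an index, the plan's human-readable detail string names it (illustrative in-memory table; the exact wording of the detail string varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
conn.execute("CREATE INDEX idx_students_score ON students(score)")

# Each row of the plan output ends with a 'detail' string;
# a range query on an indexed column should mention idx_students_score
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM students WHERE score > 80"
).fetchall()
print(plan)
```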

FastAPI — Why It Is the Standard for AI APIs

Context

FastAPI is the dominant Python framework for building AI APIs. It is fast (ASGI, async-first), automatically generates interactive docs, and uses Pydantic for validation — the same library used by LangChain, OpenAI SDK, and Anthropic SDK under the hood.

pip install fastapi "uvicorn[standard]"   # quotes keep zsh from expanding the brackets

# Minimal FastAPI app — save as main.py
from fastapi import FastAPI

app = FastAPI(title="My AI API", version="1.0.0")

@app.get("/")
def root():
    return {"message": "AI API is running"}

@app.get("/health")
def health():
    return {"status": "ok"}

# Run the server
# uvicorn main:app --reload
# Open http://127.0.0.1:8000/docs  ← interactive Swagger UI
# Open http://127.0.0.1:8000/redoc ← alternative docs

💡 The /docs endpoint is one of FastAPI's killer features. It auto-generates an interactive Swagger UI from your code — you can test every endpoint directly in the browser without writing a client or using curl. Use it constantly while developing.

🛤

Path Parameters, Query Parameters and Request Bodies

Core Patterns
from fastapi import FastAPI, HTTPException, Query
from pydantic import BaseModel
from typing import Optional

app = FastAPI()

# PATH PARAMETER — part of the URL path
# GET /students/42
@app.get("/students/{student_id}")
def get_student(student_id: int):   # FastAPI validates type automatically
    return {"student_id": student_id}

# QUERY PARAMETER — after the ? in URL
# GET /students?min_score=80&limit=10
@app.get("/students")
def list_students(
    min_score: float = 0.0,                   # optional with default
    limit: int = Query(default=20, le=100),   # with constraint: max 100
    grade: Optional[str] = None               # truly optional
):
    return {"min_score": min_score, "limit": limit, "grade": grade}

# REQUEST BODY — JSON in POST/PUT body
class CreateStudentRequest(BaseModel):
    name:  str
    score: float
    grade: Optional[str] = None

# POST /students
@app.post("/students", status_code=201)
def create_student(student: CreateStudentRequest):
    # FastAPI auto-parses JSON body into the Pydantic model
    # Validation happens automatically — wrong types return 422
    return {"created": student.model_dump()}
🚨

Error Handling and HTTP Exceptions

Production Pattern
from fastapi import FastAPI, HTTPException, status

app = FastAPI()

STUDENTS_DB = {1: {"name": "Alice", "score": 92}}

@app.get("/students/{student_id}")
def get_student(student_id: int):
    student = STUDENTS_DB.get(student_id)
    if not student:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Student {student_id} not found"
        )
    return student

# Custom exception handler
from fastapi import Request
from fastapi.responses import JSONResponse

@app.exception_handler(ValueError)
async def value_error_handler(request: Request, exc: ValueError):
    return JSONResponse(
        status_code=400,
        content={"error": "validation_error", "detail": str(exc)}
    )

# Health check endpoint — essential for production
@app.get("/health", tags=["monitoring"])
def health_check():
    return {
        "status": "healthy",
        "version": app.version,
    }
🔄

Async Endpoints and Background Tasks

AI-Specific Pattern
from fastapi import FastAPI, BackgroundTasks
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

# Async endpoint — non-blocking LLM call
@app.post("/chat")
async def chat(message: str):
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}]
    )
    return {"reply": response.content[0].text}

# Streaming endpoint — sends tokens as they arrive
from fastapi.responses import StreamingResponse

@app.post("/chat/stream")
async def chat_stream(message: str):
    async def generate():
        async with client.messages.stream(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}]
        ) as stream:
            async for text in stream.text_stream:
                yield text
    return StreamingResponse(generate(), media_type="text/plain")

# Background task — fire and forget
def log_request(message: str):
    with open("requests.log", "a") as f:
        f.write(f"{message}\n")

@app.post("/chat/logged")
async def chat_logged(message: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(log_request, message)  # runs after response
    return {"reply": "Processing..."}
📐

Pydantic — Python's Data Validation Library

Critical for AI

Pydantic is used everywhere in the AI ecosystem — FastAPI, LangChain, OpenAI SDK, Anthropic SDK, and the Instructor library for structured LLM outputs. Learning it here pays dividends in every future module.

from pydantic import BaseModel, Field, field_validator
from typing import Optional, List
from datetime import datetime

# Basic model — define schema with type annotations
class Student(BaseModel):
    name:  str
    score: float
    grade: Optional[str] = None

s = Student(name="Alice", score=92.5)
print(s.model_dump())          # {'name': 'Alice', 'score': 92.5, 'grade': None}
print(s.model_dump_json())     # JSON string

# Validation — Pydantic raises ValidationError on wrong types
try:
    bad = Student(name="Bob", score="not-a-number")
except Exception as e:
    print(e)   # score: Input should be a valid number

# Field — add constraints and documentation
class LLMRequest(BaseModel):
    model:       str   = Field(default="claude-3-5-sonnet-20241022")
    prompt:      str   = Field(min_length=1, max_length=100000)
    max_tokens:  int   = Field(default=1024, ge=1, le=8192)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)
    tags: List[str]    = Field(default_factory=list)

# Custom validator
class RegistrationForm(BaseModel):
    username: str
    email:    str
    age:      int

    @field_validator('email')
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if '@' not in v:
            raise ValueError('must be a valid email address')
        return v.lower()

    @field_validator('age')
    @classmethod
    def age_must_be_adult(cls, v: int) -> int:
        if v < 18:
            raise ValueError('must be 18 or older')
        return v
🤖

Pydantic for LLM Structured Outputs — Preview

Part 4 Preview

In Part 4 you will use Pydantic models to force LLMs to return structured JSON. Here is a preview of what that looks like with the Instructor library.

from pydantic import BaseModel
from typing import List
import instructor, anthropic

# Define the structure you want the LLM to return
class InvoiceLineItem(BaseModel):
    description: str
    quantity:    int
    unit_price:  float
    total:       float

class Invoice(BaseModel):
    invoice_number: str
    customer_name:  str
    line_items:     List[InvoiceLineItem]
    subtotal:       float
    tax:            float
    total:          float

# Instructor patches the client to enforce the schema
client = instructor.from_anthropic(anthropic.Anthropic())

# The LLM MUST return data matching the Invoice schema
invoice = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract invoice data: INV-001, Alice Corp, 3×widgets $45, 1×service $100, tax 10%"
    }],
    response_model=Invoice,   # ← Pydantic model as schema
)
print(invoice.total)   # e.g. 258.5 — a proper float, not a string

💡 This is why Pydantic matters for AI engineering. Without it, you get raw text back and must parse it manually — fragile and error-prone. With Pydantic + Instructor, you get a validated Python object with the exact structure you defined. This pattern is used in every serious AI application.

🔄

Connecting FastAPI to SQLite

Full Stack

Connecting a database to your API is the final step before you have a complete backend. This is the pattern used in production AI apps for storing conversation history, user data, and ML metadata.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, List
import sqlite3
from contextlib import contextmanager

app = FastAPI(title="Student API")

# Database connection context manager
@contextmanager
def get_db():
    conn = sqlite3.connect("students.db")
    conn.row_factory = sqlite3.Row   # access cols by name: row["name"]
    try:
        yield conn
    finally:
        conn.close()

# Create table on startup
@app.on_event("startup")   # note: deprecated in newer FastAPI in favour of lifespan handlers
def startup():
    with get_db() as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS students (
                id    INTEGER PRIMARY KEY AUTOINCREMENT,
                name  TEXT    NOT NULL,
                score REAL    NOT NULL,
                grade TEXT
            )
        """)
        conn.commit()

# Pydantic models
class StudentCreate(BaseModel):
    name:  str
    score: float
    grade: Optional[str] = None

class StudentResponse(BaseModel):
    id:    int
    name:  str
    score: float
    grade: Optional[str]

# POST /students — create
@app.post("/students", response_model=StudentResponse, status_code=201)
def create_student(student: StudentCreate):
    with get_db() as conn:
        cursor = conn.execute(
            "INSERT INTO students (name, score, grade) VALUES (?, ?, ?)",
            (student.name, student.score, student.grade)
        )
        conn.commit()
        row = conn.execute(
            "SELECT * FROM students WHERE id = ?", (cursor.lastrowid,)
        ).fetchone()
    return dict(row)

# GET /students — list all
@app.get("/students", response_model=List[StudentResponse])
def list_students(min_score: float = 0.0):
    with get_db() as conn:
        rows = conn.execute(
            "SELECT * FROM students WHERE score >= ? ORDER BY score DESC",
            (min_score,)
        ).fetchall()
    return [dict(row) for row in rows]

# GET /students/{id} — get one
@app.get("/students/{student_id}", response_model=StudentResponse)
def get_student(student_id: int):
    with get_db() as conn:
        row = conn.execute(
            "SELECT * FROM students WHERE id = ?", (student_id,)
        ).fetchone()
    if not row:
        raise HTTPException(status_code=404, detail="Student not found")
    return dict(row)

# DELETE /students/{id}
@app.delete("/students/{student_id}", status_code=204)
def delete_student(student_id: int):
    with get_db() as conn:
        result = conn.execute(
            "DELETE FROM students WHERE id = ?", (student_id,)
        )
        conn.commit()
    if result.rowcount == 0:
        raise HTTPException(status_code=404, detail="Student not found")
🧪

Testing Your API

Essential Skill
# Option 1 — FastAPI /docs (Swagger UI)
# Go to http://127.0.0.1:8000/docs in browser
# Click any endpoint → "Try it out" → fill fields → Execute

# Option 2 — curl from terminal
# Create a student
curl -X POST http://localhost:8000/students \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice", "score": 92.5, "grade": "A"}'

# Get all students with score > 80
curl "http://localhost:8000/students?min_score=80"

# Get specific student
curl http://localhost:8000/students/1

# Option 3 — Python test client (best for automated tests)
from fastapi.testclient import TestClient
from main import app   # import your FastAPI app

client = TestClient(app)

def test_create_student():
    response = client.post("/students",
        json={"name": "Alice", "score": 92.5})
    assert response.status_code == 201
    assert response.json()["name"] == "Alice"

def test_get_missing_student():
    response = client.get("/students/999")
    assert response.status_code == 404

2-WEEK STRUCTURED PLAN

Week 1 — SQL
Topics: Install SQLite (built into Python). CREATE TABLE, INSERT, SELECT with WHERE, ORDER BY, LIMIT. Aggregate functions: COUNT, AVG, MAX, MIN, SUM. GROUP BY and HAVING. INNER JOIN and LEFT JOIN. Subqueries and CTEs. UPDATE and DELETE. Parameterised queries — never SQL injection. pd.read_sql_query() to load results into DataFrames.
Daily tasks: Day 1–2: SQLBolt interactive exercises — complete all 18 lessons. Day 3: Build the students.db schema from scratch, insert 20 rows, write 5 query exercises. Day 4–5: Answer these from the DB: top 3 students per grade, class average per department, students above class average. Day 6–7: Load your COVID-19 CSV from M02 into SQLite and reproduce the top-10 query using SQL instead of Pandas groupby.

Week 2 — FastAPI
Topics: Install FastAPI and uvicorn. First endpoint, /docs interface. Path parameters, query parameters, request bodies with Pydantic. HTTP status codes and HTTPException. Async endpoints. BackgroundTasks. Response models. Connecting SQLite to FastAPI — full CRUD. Testing with TestClient.
Daily tasks: Day 1–2: Build and run the minimal FastAPI app. Add health, version, and echo endpoints. Test via /docs. Day 3–4: Add Pydantic request and response models. Test validation — try sending wrong types and observe the 422 error. Day 5–7: Full milestone project — Student CRUD API with SQLite backend (see Projects tab).

FREE LEARNING RESOURCES

  • Interactive — SQLBolt (sqlbolt.com), short SQL lessons with in-browser exercises. Best for: the fastest way to learn SQL from scratch — complete all lessons on Day 1.
  • Docs — FastAPI Official Tutorial (fastapi.tiangolo.com/tutorial/). Best for: one of the best framework docs ever written — work through it start to finish.
  • Course — Kaggle Intro to SQL, free (kaggle.com/learn/intro-to-sql). Best for: SQL with real BigQuery datasets — hands-on with immediate feedback.
  • Video — FastAPI Full Tutorial by Sebastián Ramírez (YouTube). Best for: FastAPI from the creator — a comprehensive walkthrough of all features.
  • Docs — Pydantic v2 Documentation (docs.pydantic.dev). Best for: the complete reference for validation, field constraints, and custom validators.

MILESTONE PROJECT

🛠 Student CRUD API — FastAPI + SQLite [Intermediate] 3–4 days · Week 2

Build a complete REST API for managing student data — your first real backend application. This is the template pattern for every AI API you will build in Parts 4–7.

Requirements

  • POST /students — create a student (name, score, grade, email). Validate: score 0–100, email format, name min 2 chars.
  • GET /students — list all students. Query params: min_score, max_score, grade filter, limit (max 100), offset (for pagination).
  • GET /students/{id} — get one student by ID. Return 404 if not found.
  • PUT /students/{id} — update a student. Partial update — only fields provided are updated.
  • DELETE /students/{id} — delete a student. Return 404 if not found.
  • GET /students/stats — return: total count, class average, grade distribution (A/B/C/D/F count), top student.
  • All endpoints have Pydantic request and response models
  • SQLite database persists between server restarts
  • Test all endpoints via /docs interface

Stretch Goals

  • Add a GET /students/export endpoint that returns a downloadable CSV of all students
  • Add a POST /students/bulk endpoint that accepts a list of students and inserts them in one transaction
  • Write pytest tests for all 6 endpoints using FastAPI TestClient
  • Add a created_at timestamp to each student record

Skills: FastAPI, Pydantic, SQLite, CRUD operations, HTTP status codes, query parameters, pagination, error handling

MINI-PROJECTS

🛠 SQL Analytics on COVID Data — 1–2 days · Week 1

Load your M02 COVID-19 CSV into SQLite using df.to_sql(). Then reproduce all 5 analyses from M02 (top 10 by deaths, rolling average, monthly groupby) using pure SQL queries instead of Pandas. Compare the SQL and Pandas approaches — when is each cleaner?
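A sketch of the round-trip with a stand-in DataFrame (the real M02 CSV and its column names are assumptions here — substitute your own file and columns):

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the M02 COVID-19 data
df = pd.DataFrame({"country": ["A", "B", "C"], "deaths": [300, 100, 200]})

conn = sqlite3.connect(":memory:")
df.to_sql("covid", conn, if_exists="replace", index=False)

# The same "top N by deaths" question, once in SQL and once in Pandas
top_sql = pd.read_sql_query(
    "SELECT country, deaths FROM covid ORDER BY deaths DESC LIMIT 2", conn
)
top_pandas = df.nlargest(2, "deaths").reset_index(drop=True)

print(top_sql.equals(top_pandas))  # True — both approaches agree
```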

LAB 1

SQL — Write 10 Queries Against a Real Dataset

Objective: Develop SQL fluency by writing non-trivial queries against a real dataset — not toy examples.

1. Download the Titanic CSV and load it into SQLite: import pandas as pd, sqlite3; df = pd.read_csv("titanic.csv"); conn = sqlite3.connect("titanic.db"); df.to_sql("passengers", conn, if_exists="replace", index=False)
2. Write and run these queries using Python sqlite3: (1) Total passengers and survival rate. (2) Survival rate by Sex. (3) Survival rate by Pclass. (4) Average fare by Pclass. (5) Top 5 passengers by fare paid.
3. Write these harder queries: (6) Passengers whose fare was above the average fare for their class (correlated subquery). (7) Count of survivors per Embarked port. (8) The youngest and oldest survivor in each Pclass. (9) Passengers who traveled alone (SibSp=0 AND Parch=0) — their survival rate. (10) Use a CTE to find the survival rate for each cabin letter prefix.
4. For each query, also write the equivalent Pandas code. Compare line count and readability. Note which problems feel more natural in SQL vs Pandas.
LAB 2

FastAPI — Build and Stress-Test an Endpoint

Objective: Build a real FastAPI endpoint from scratch, test every failure mode, and observe how FastAPI handles errors.

1. Create a new FastAPI app with a single POST /analyse endpoint. It accepts: {"text": "...", "max_words": 100} and returns: {"word_count": N, "unique_words": N, "most_common": [...]}
2. Add Pydantic validation: text must be a non-empty string, max_words must be between 10 and 1000. Run the server and test via /docs.
3. Now test every failure mode via /docs or curl: (a) Missing required field. (b) Wrong type for max_words (send a string). (c) max_words = 0 (violates constraint). (d) Empty text string. Note the exact error structure FastAPI returns for each.
4. Add a GET /analyse/history endpoint that returns the last 10 requests processed (store them in an in-memory list). This tests that you understand how FastAPI handles state between requests.
5. Write 4 TestClient tests: happy path, missing field, invalid type, and constraint violation. Run with python -m pytest test_api.py -v.
LAB 3

Full Stack — FastAPI + SQLite Todo API

Objective: Build the classic Todo API in 60 minutes — internalising the full CRUD + database pattern that every AI backend uses.

1. Create a Todo Pydantic model with: id (int), title (str, min 3 chars), completed (bool, default False), created_at (str). Create a corresponding SQLite table on startup.
2. Implement: POST /todos (create), GET /todos (list, filter by completed=true/false), GET /todos/{id} (get one), PATCH /todos/{id}/complete (mark done), DELETE /todos/{id} (delete).
3. Test the entire flow via /docs: create 5 todos, complete 2, list all, list only incomplete, delete one, get a non-existent todo (expect 404).
4. Add a GET /todos/stats endpoint returning: total, completed count, pending count, completion percentage. Use a single SQL query with conditional aggregation: SELECT COUNT(*), SUM(CASE WHEN completed=1 THEN 1 ELSE 0 END) FROM todos
5. Bonus: Push this to GitHub. In your README, add a curl one-liner for each endpoint so anyone can test your API without reading the code.
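The conditional aggregation in step 4 can be sketched in plain sqlite3 before wiring it into FastAPI (sample rows invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE todos (title TEXT, completed INTEGER DEFAULT 0);
    INSERT INTO todos (title, completed) VALUES
        ('a', 1), ('b', 0), ('c', 1), ('d', 0), ('e', 0);
""")

# One query returns both the row count and the completed count;
# the pending count and percentage are derived in Python
total, done = conn.execute(
    "SELECT COUNT(*), SUM(CASE WHEN completed = 1 THEN 1 ELSE 0 END) FROM todos"
).fetchone()
stats = {"total": total, "completed": done, "pending": total - done,
         "pct_complete": round(100 * done / total, 1)}

print(stats)  # {'total': 5, 'completed': 2, 'pending': 3, 'pct_complete': 40.0}
```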

P1-M04 MASTERY CHECKLIST

Part 1 Complete! You now have all the foundation skills needed to build AI systems. Move to Part 4 — LLM API Mastery if you are on the AI/GenAI Engineer path, or Part 2 — Statistics & EDA if you are following the Data Scientist/ML Engineer path.

🎉 Part 1 — Universal Foundation Complete!

You have completed all 4 modules of the foundation. Here is what you can now do:

  • Write Python programs that read/write files, call APIs, and handle errors
  • Version code with Git and push projects to GitHub
  • Navigate the terminal and manage environment variables
  • Make async HTTP requests and parse JSON responses
  • Manipulate data with NumPy vectorised operations
  • Load, clean, and aggregate data with Pandas
  • Query databases with SQL — joins, aggregations, CTEs
  • Build and run a FastAPI server with Pydantic validation