What This Module Covers
**Foundation**

This module covers the developer tooling every AI engineer uses every single day — version control, terminal navigation, calling web APIs, and asynchronous Python. These are not optional extras — being slow or uncomfortable with any of them is a real bottleneck when building AI systems.
- Git & GitHub — version control, branching, merging, pushing to remote, writing good READMEs
- CLI / Terminal — navigation, file operations, environment variables, running scripts, PATH
- HTTP & REST APIs — GET/POST requests, status codes, headers, API keys, JSON parsing
- Python requests library — calling any web API from Python with error handling
- JSON handling — loading, dumping, nested structures, serialisation edge cases
- Async/await — what coroutines are, why LLM APIs use them, how to write and run async code
Why These Skills Matter for AI Engineering
**Context**

- Git — every AI project lives in a repo. Your GitHub profile is your resume. Every module project from here goes on GitHub.
- CLI — you will run Python scripts, start servers, install packages, and manage containers entirely from the terminal. Being slow here is a daily tax on your productivity.
- HTTP/APIs — calling the OpenAI or Anthropic API is just an HTTP POST request. Understanding what happens under the hood makes you a better debugger when things go wrong.
- Async — LLM API calls are I/O-bound. The Anthropic and OpenAI Python SDKs are async-first. FastAPI (which you use in M04) runs async handlers. You cannot build production AI apps without understanding this.
Git Mental Model — What Problem It Solves
**Concept First**

Git is confusing when you try to memorise commands before understanding the model. Understand this first: Git tracks snapshots of your project at points in time (commits). Every commit is a full snapshot, not a diff. Branches are just lightweight pointers to commits.
The staging area is Git's unique feature — it lets you carefully choose exactly which changes to include in the next commit, even if you have made 10 unrelated changes across files.
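To see the staging area in action, here is a minimal scratch-repo demo (the file names are arbitrary; `git add -p` additionally lets you stage individual hunks within a file):

```shell
cd "$(mktemp -d)" && git init -q    # throwaway repo in a temp directory
echo "print('a')" > a.py
echo "print('b')" > b.py
git add a.py                        # stage ONLY a.py; b.py stays untracked
git status --short                  # shows "A  a.py" (staged) and "?? b.py" (untracked)
```

Committing now would snapshot `a.py` only, even though both files changed in the working directory.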
Core Git Commands
**Daily Use**

```bash
# ── SETUP (once per machine) ──────────────────────────
git config --global user.name "Ajay Kumar Gupt"
git config --global user.email "your@email.com"
git config --global core.editor "code --wait"   # VS Code as editor

# ── START A PROJECT ───────────────────────────────────
git init                      # initialise new repo in current dir
git clone <url>               # clone existing repo from GitHub

# ── DAILY WORKFLOW ────────────────────────────────────
git status                    # what changed? (run this constantly)
git add .                     # stage all changes
git add src/main.py           # stage specific file
git commit -m "feat: add streaming response handler"
git push origin main          # push to GitHub
git pull origin main          # get latest changes

# ── HISTORY ───────────────────────────────────────────
git log --oneline             # compact commit history
git log --oneline --graph     # visualise branch graph
git diff                      # unstaged changes
git diff --staged             # staged changes (what will be committed)

# ── UNDO ──────────────────────────────────────────────
git restore <file>            # discard unstaged changes to a file
git restore --staged <file>   # unstage a file
git revert <commit-hash>      # undo a commit safely (creates new commit)
```
Branching and Merging
**Collaboration**

```bash
# Create and switch to a new branch
git checkout -b feature/add-rag-pipeline   # create + switch
git switch -c feature/add-rag-pipeline     # modern equivalent

# List branches
git branch      # local branches
git branch -a   # local + remote branches

# Switch between branches
git switch main
git switch feature/add-rag-pipeline

# Merge feature branch into main
git switch main
git merge feature/add-rag-pipeline

# Delete merged branch
git branch -d feature/add-rag-pipeline              # local
git push origin --delete feature/add-rag-pipeline   # remote

# Push new branch to GitHub for the first time
git push -u origin feature/add-rag-pipeline
```
💡 Branch naming convention for AI projects: feat/rag-pipeline, fix/token-overflow, docs/module-p4. Keep branch names short, lowercase, hyphenated. Delete branches after merging — a clean branch list is a healthy repo.
.gitignore and Repository Hygiene
**Essential Habit**

A well-maintained .gitignore prevents secrets, large files, and generated artifacts from entering your repo.
```gitignore
# .gitignore for a Python AI project

# Virtual environment
.venv/
venv/
env/

# Secrets — NEVER commit these
.env
.env.local
*.key
*_secret*

# Python artifacts
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/

# Jupyter
.ipynb_checkpoints/
*.ipynb       # optional — commit notebooks if they are documentation

# Data and models — too large for Git
data/raw/
*.csv         # if large; keep small sample CSVs
*.pkl
*.pt          # PyTorch model weights
*.bin         # HuggingFace model files
chroma_db/
*.faiss

# OS files
.DS_Store
Thumbs.db
```
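To verify a pattern actually matches before you rely on it, `git check-ignore` reports which rule (if any) ignores a given path. A small scratch-repo demo (file names are illustrative):

```shell
cd "$(mktemp -d)" && git init -q
printf '.env\n__pycache__/\n' > .gitignore
touch .env
git check-ignore -v .env    # prints the matching rule and the path it ignores
```

An exit status of 0 means the path is ignored; non-zero means Git would track it.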
⚠️ If you accidentally commit a secret (API key), treat it as compromised immediately. Rotate the key with the provider. Remove it from history using git filter-repo (now recommended over the deprecated git filter-branch) or BFG Repo Cleaner. If the repository is public, assume the key has already been scraped — deleting it from history does not undo the exposure.
Writing a Good README
**Portfolio**

Your README is the first thing a recruiter, collaborator, or future-you sees. Every project from this roadmap needs one.
# README.md template for AI/ML projects
# Project Title
One compelling sentence describing what it does and why it matters.
## Problem Statement
What real problem does this solve? One paragraph.
## Demo

Live demo: https://your-deployed-app.com
## Tech Stack
Python · FastAPI · LangChain · ChromaDB · Docker
## Quick Start
```bash
git clone https://github.com/you/project.git
cd project
cp .env.example .env # add your API keys
pip install -r requirements.txt
python main.py
```
## Approach
- Brief description of your methodology (3–5 bullet points)
## Results
Key metrics achieved (e.g. RAG retrieval accuracy: 87%, latency: 340ms)
## Project Structure
```
project/
├── main.py # entry point
├── src/ # core logic
├── data/ # sample data only
└── tests/ # test suite
```

Essential Terminal Commands
**Daily Use**

```bash
# ── NAVIGATION ────────────────────────────────────────
pwd                    # print working directory — where am I?
ls                     # list files in current directory
ls -la                 # list all files including hidden, with details
cd /path/to/dir        # change directory (absolute path)
cd ..                  # go up one level
cd ~                   # go to home directory
cd -                   # go back to previous directory

# ── FILES AND DIRECTORIES ─────────────────────────────
mkdir my-project       # create directory
mkdir -p a/b/c         # create nested dirs in one command
touch main.py          # create empty file
cp source.py dest.py   # copy file
mv old.py new.py       # rename or move file
rm file.py             # delete file (no recycle bin!)
rm -rf directory/      # delete directory recursively (irreversible)

# ── READING FILES ─────────────────────────────────────
cat config.py          # print file contents
less large_file.log    # page through large file (q to quit)
head -20 data.csv      # first 20 lines
tail -50 app.log       # last 50 lines (great for log monitoring)
tail -f app.log        # follow — stream new lines in real time
grep "ERROR" app.log   # search for pattern in file
grep -r "api_key" .    # search recursively in all files

# ── RUNNING PYTHON ────────────────────────────────────
python main.py                            # run script
python -m uvicorn main:app --reload       # run FastAPI dev server
python -c "import sys; print(sys.path)"   # one-liner
python -m pytest tests/                   # run tests
```
Environment Variables and PATH
**Critical for AI**

Every API key you use — OpenAI, Anthropic, HuggingFace — should live in an environment variable, never in your source code. Understanding how environment variables work is non-negotiable.
```bash
# Set an environment variable in the shell (temporary)
export OPENAI_API_KEY="sk-proj-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Read it back
echo $OPENAI_API_KEY

# Permanent — add to ~/.bashrc or ~/.zshrc
echo 'export OPENAI_API_KEY="sk-proj-..."' >> ~/.bashrc
source ~/.bashrc   # reload without restarting terminal
```

```python
# In Python — the secure pattern for all AI projects
import os
from dotenv import load_dotenv

load_dotenv()   # reads .env file from project root
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set. Check your .env file.")
```

```bash
# .env file (in project root, never committed to Git)
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://...
```
```bash
# PATH — tells your shell where to find executables
echo $PATH     # colon-separated list of directories

# If 'python' command not found, your Python install dir is missing from PATH
which python   # where is Python installed?
which pip      # where is pip?

# Add a directory to PATH (in ~/.bashrc)
export PATH=$PATH:/home/user/.local/bin
```
Process Management and Pipes
**Productivity**

```bash
# Run process in background
python server.py &   # & sends to background
jobs                 # list background jobs
fg                   # bring last job to foreground
# Ctrl+C — kill foreground process
# Ctrl+Z — suspend foreground process

# Find and kill a process using a port (e.g. port 8000)
lsof -ti:8000              # find PID using port 8000
kill -9 $(lsof -ti:8000)   # kill it

# Pipes — chain commands together
cat data.csv | grep "2024" | head -20   # filter and preview
ps aux | grep python                    # find Python processes
cat requirements.txt | wc -l            # count dependencies

# Redirect output to file
python train.py > train.log 2>&1       # stdout + stderr to file
python train.py 2>&1 | tee train.log   # write to file AND print to terminal
```
How HTTP Works — The Mental Model
**Concept First**

Every LLM API call is an HTTP request. Understanding the request/response cycle makes you a far better debugger when calls fail, return unexpected results, or hit rate limits.
```text
An HTTP request has:
  METHOD  — what action: GET (read), POST (create/send), PUT (update), DELETE
  URL     — where to send it: https://api.anthropic.com/v1/messages
  HEADERS — metadata: Content-Type, Authorization, x-api-key
  BODY    — data to send (POST/PUT only): usually JSON

Example: what happens when you call the Anthropic API

  POST https://api.anthropic.com/v1/messages
  Headers:
    x-api-key: sk-ant-...
    anthropic-version: 2023-06-01
    content-type: application/json
  Body:
    {"model":"claude-3-5-sonnet","max_tokens":1024,"messages":[...]}

An HTTP response has:
  STATUS CODE — 200 OK, 400 Bad Request, 401 Unauthorised, 429 Rate Limited, 500 Server Error
  HEADERS     — Content-Type, rate limit remaining, request ID
  BODY        — the actual response, usually JSON
```
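The status-code categories above have direct debugging consequences: 4xx means fix your request, 5xx means the server is at fault and a retry may succeed. A tiny illustrative helper (the function name and messages are our own) that encodes this:

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to its broad category and the right reaction."""
    if 200 <= code < 300:
        return "success"
    if code == 401:
        return "auth error: check your API key"
    if code == 429:
        return "rate limited: back off and retry"
    if 400 <= code < 500:
        return "client error: fix the request before retrying"
    if 500 <= code < 600:
        return "server error: safe to retry with backoff"
    return "unexpected"

print(classify_status(200))   # success
print(classify_status(429))   # rate limited: back off and retry
```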
Python requests Library
**Core Skill**

```python
import requests

# GET request — read data
response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 19.07,
        "longitude": 72.87,
        "daily": "temperature_2m_max",
        "timezone": "Asia/Kolkata",
    },
)
print(response.status_code)   # 200
data = response.json()        # parse JSON body

# POST request — send data (how LLM APIs work)
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={   # json= param auto-sets Content-Type and serialises
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# Always check status before using the response
response.raise_for_status()   # raises HTTPError for 4xx/5xx
result = response.json()
print(result["content"][0]["text"])
```
```python
import requests

# Robust request with timeout and error handling
def call_api(url: str, payload: dict, headers: dict) -> dict:
    try:
        response = requests.post(
            url,
            json=payload,
            headers=headers,
            timeout=30,   # always set a timeout — never wait forever
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        print("Request timed out after 30s")
        return {}
    except requests.exceptions.HTTPError as e:
        print(f"HTTP {e.response.status_code}: {e.response.text}")
        return {}
    except requests.exceptions.ConnectionError:
        print("Cannot connect — check network / URL")
        return {}
```
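Transient failures (timeouts, 429s, dropped connections) often succeed on a second attempt, so error handling pairs naturally with a retry loop. A minimal generic sketch; the helper name and the `max_retries`/`backoff` parameters are our own, not from any library:

```python
import time

def with_retries(fn, max_retries=3, backoff=1.0):
    """Call fn(); on exception, retry up to max_retries attempts with growing waits."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise   # out of attempts: surface the error to the caller
            time.sleep(backoff * attempt)   # wait 1s, 2s, ... between attempts
```

Usage with the request pattern above might look like `with_retries(lambda: call_api(url, payload, headers))`. For production, prefer only retrying on specific exception types rather than bare `Exception`.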
JSON Deep Dive
**Critical for LLM Responses**

```python
import json

# Serialise Python → JSON string
data = {"name": "Ajay", "scores": [85, 92], "active": True, "meta": None}
json_str = json.dumps(data)             # compact
json_str = json.dumps(data, indent=2)   # pretty-printed

# Deserialise JSON string → Python
parsed = json.loads(json_str)

# File I/O
with open("data.json", "w") as f:
    json.dump(data, f, indent=2)
with open("data.json") as f:
    loaded = json.load(f)

# Python ↔ JSON type mapping
#   Python dict      → JSON object {}
#   Python list      → JSON array []
#   Python str       → JSON string ""
#   Python int/float → JSON number
#   Python True      → JSON true
#   Python None      → JSON null

# Navigating nested LLM API responses
response = {
    "id": "msg_01",
    "content": [{"type": "text", "text": "Hello! How can I help?"}],
    "usage": {"input_tokens": 10, "output_tokens": 8},
}
text = response["content"][0]["text"]                        # direct access
tokens = response.get("usage", {}).get("output_tokens", 0)   # safe get
```
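One serialisation edge case worth knowing: `json.dumps` raises `TypeError` on types it does not understand, such as `datetime`. The `default=` hook supplies a fallback converter; a minimal sketch (the `record` fields are illustrative):

```python
import json
from datetime import datetime

record = {"city": "Mumbai", "fetched_at": datetime(2024, 5, 1, 12, 30)}

# json.dumps(record) would raise TypeError: datetime is not JSON serialisable.
# default=str tells json to fall back to str() for unknown types.
safe = json.dumps(record, default=str)
print(safe)   # {"city": "Mumbai", "fetched_at": "2024-05-01 12:30:00"}

# The timestamp round-trips as a plain string, so parse it back if needed:
restored = datetime.fromisoformat(json.loads(safe)["fetched_at"])
```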
Why Async — The Problem It Solves
**Concept First**

LLM API calls take 1–10 seconds each. If you make 10 calls sequentially, you wait 10–100 seconds. Async Python lets you start all 10 calls, then handle them as they complete — total wait ≈ the slowest single call.
```python
import time

# Synchronous — sequential, blocks on each call
def slow_api_call(n):
    time.sleep(2)   # simulates 2s LLM API call
    return f"result_{n}"

start = time.time()
results = [slow_api_call(i) for i in range(5)]
print(f"Sync: {time.time()-start:.1f}s")   # ~10.0s

# Asynchronous — concurrent, all run simultaneously
import asyncio

async def slow_api_call_async(n):
    await asyncio.sleep(2)   # yields control while waiting
    return f"result_{n}"

async def main():
    start = time.time()
    results = await asyncio.gather(
        *[slow_api_call_async(i) for i in range(5)]
    )
    print(f"Async: {time.time()-start:.1f}s")   # ~2.0s
    return results

asyncio.run(main())
```
💡 Async does NOT make code faster for CPU-bound work — it only helps for I/O-bound work (network calls, file reads, database queries). LLM API calls are I/O-bound. Matrix multiplications are CPU-bound. Know the difference.
Async Syntax and Patterns
**Core Syntax**

```python
import asyncio
import httpx

# async def — defines a coroutine (NOT a regular function)
async def fetch_weather(city: str) -> dict:
    async with httpx.AsyncClient() as client:   # async HTTP client
        response = await client.get(
            "https://api.open-meteo.com/v1/forecast",
            params={"latitude": 19.07, "longitude": 72.87},
        )
        return response.json()

# await — pauses the current coroutine until the awaitable completes.
# You can only use await INSIDE an async def function.

# asyncio.gather — run multiple coroutines concurrently
async def fetch_all_cities():
    results = await asyncio.gather(
        fetch_weather("Mumbai"),
        fetch_weather("Delhi"),
        fetch_weather("Bangalore"),
    )
    return results

# asyncio.run — entry point for top-level async code
if __name__ == "__main__":
    results = asyncio.run(fetch_all_cities())
```
```python
# Async context managers — async with
async with httpx.AsyncClient() as client:
    # client is available here, closed automatically after the block
    response = await client.get(url)

# Async iteration — async for
async def stream_response():
    async with anthropic_client.messages.stream(...) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)

# asyncio.create_task — start the task running now, await it later
async def main():
    task1 = asyncio.create_task(fetch_weather("Mumbai"))
    task2 = asyncio.create_task(fetch_weather("Delhi"))
    # ... do other work here while both tasks run ...
    result1 = await task1   # now wait for results
    result2 = await task2
```
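One more pattern that matters in practice: LLM providers rate-limit you, so firing hundreds of coroutines at once with `gather` can backfire. `asyncio.Semaphore` caps how many run concurrently. A minimal sketch; `limited_call` and the sleep stand-in for a real API call are illustrative:

```python
import asyncio

async def limited_call(sem: asyncio.Semaphore, n: int) -> str:
    async with sem:                  # at most N coroutines inside this block at once
        await asyncio.sleep(0.01)    # stand-in for a real API call
        return f"result_{n}"

async def main() -> list:
    sem = asyncio.Semaphore(3)       # cap at 3 in-flight requests
    return await asyncio.gather(*[limited_call(sem, i) for i in range(10)])

results = asyncio.run(main())
print(len(results))   # 10
```

All 10 coroutines are created up front, but only 3 hold the semaphore at any moment; `gather` still returns results in submission order.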
Common Async Mistakes
**Pitfalls**

```python
import time
import asyncio

# MISTAKE 1 — forgetting await (most common)
async def bad():
    result = fetch_weather("Mumbai")   # returns a coroutine object, not the result!
    print(result)                      # prints <coroutine object ...>

async def good():
    result = await fetch_weather("Mumbai")   # correct

# MISTAKE 2 — calling an async function without await at top level
fetch_weather("Mumbai")                # creates a coroutine but never runs it
asyncio.run(fetch_weather("Mumbai"))   # correct

# MISTAKE 3 — using time.sleep instead of asyncio.sleep in async code
async def bad_sleep():
    time.sleep(2)            # BLOCKS the entire event loop — kills concurrency

async def good_sleep():
    await asyncio.sleep(2)   # yields control to the event loop

# MISTAKE 4 — using requests (sync) in async code
# Use httpx.AsyncClient() or aiohttp instead of requests in async functions
```
💡 Rule of thumb: If you are inside an async def function, any blocking I/O call (requests, time.sleep, file reads with slow storage) must be replaced with its async equivalent. Mixing sync blocking calls into async code defeats the entire purpose.
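When no async equivalent exists for a blocking call, `asyncio.to_thread` (Python 3.9+) runs it in a worker thread so the event loop stays free. A minimal sketch; `blocking_io` is a stand-in for any sync call you cannot replace:

```python
import asyncio
import time

def blocking_io() -> str:
    time.sleep(0.05)   # stands in for requests.get() or a slow disk read
    return "done"

async def main() -> list:
    # Each blocking call runs in its own thread; the event loop keeps running,
    # so the two 0.05s sleeps overlap instead of running back to back.
    return await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.to_thread(blocking_io),
    )

print(asyncio.run(main()))   # ['done', 'done']
```

This is an escape hatch, not the default: prefer a native async library (httpx over requests) when one exists.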
Async in FastAPI and LLM SDKs — Preview
**Month 2 Preview**

```python
# FastAPI — route handlers can be async
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    return {"status": "ok"}

@app.post("/chat")
async def chat(message: str):
    # await the LLM call — non-blocking
    # (llm_client: an async client instance created at app startup)
    response = await llm_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    )
    return {"reply": response.content[0].text}

# Anthropic SDK — async client
import anthropic

async def ask_claude(prompt: str) -> str:
    client = anthropic.AsyncAnthropic()   # async client
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```
2-WEEK STRUCTURED PLAN
| Week | Topics | Daily Task / Mini-Project |
|---|---|---|
| Week 1: Git + CLI | Install Git. Configure user.name and user.email. git init, add, commit, push, pull. Branching and merging. .gitignore for Python/AI projects. Terminal navigation: pwd, ls, cd, mkdir, rm, cp, mv. cat, less, grep, head, tail. Environment variables and .env files. Running Python scripts from the terminal. | Day 1: Push all previous module projects to GitHub with proper READMEs. Day 2–3: Create a feature branch, make changes, merge back — practise the full branch → PR → merge workflow. Day 4–5: Write a shell one-liner that finds all Python files modified in the last 24 hours. Day 6–7: Set up a .env file and load API keys using python-dotenv in a test script. |
| Week 2: APIs + Async | HTTP fundamentals: GET vs POST, status codes, headers, request/response structure. Python requests library: GET, POST, params, json=, headers, timeout, error handling. JSON: json.loads/dumps, nested navigation, file I/O. Async/await: asyncio.run, asyncio.gather, asyncio.sleep. httpx.AsyncClient for async HTTP. Common async mistakes. | Day 1–2: Write a weather CLI tool using the Open-Meteo API (no key needed) — print a nicely formatted 7-day forecast. Day 3–4: Rewrite the weather tool using async httpx to fetch 5 cities simultaneously. Day 5–7: Milestone project — Public API Script (see Projects tab). |
FREE LEARNING RESOURCES
| Type | Resource | Best For |
|---|---|---|
| Interactive | GitHub Skills — skills.github.com | Official interactive Git courses built inside GitHub. Start here for Git. |
| Interactive | Learn Git Branching — learngitbranching.js.org | Best visual tool for understanding branches and merges. Do all levels. |
| Book | Pro Git Book (Free online) — git-scm.com | Comprehensive reference. Read Ch 1–3 then use as lookup. |
| Docs | MDN Web Docs: HTTP Overview | Best explanation of how HTTP requests and responses work. |
| Docs | Python requests library docs — requests.readthedocs.io | Comprehensive reference for calling web APIs in Python. |
| Course | Real Python: Async IO in Python — realpython.com | Best async/await tutorial. Read after Week 2 Day 3. |
| Course | MIT Missing Semester — missing.csail.mit.edu | Shell scripting, terminal tools, and CLI fluency. Best for experienced engineers. |
MILESTONE PROJECT
Build a Python script that calls real public APIs, handles errors robustly, uses async for concurrent requests, and is pushed to GitHub as a proper project.
Requirements
- Calls the Open-Meteo API to fetch a 7-day weather forecast (no API key needed)
- Accepts a list of 5 cities as input — fetches all 5 concurrently using asyncio.gather
- Parses the JSON response and formats output as a clean table (city, date, max temp, min temp)
- Handles errors: invalid city, timeout (30s), HTTP errors — never crashes
- Saves raw JSON responses to a `data/` folder with a timestamp in the filename
- Proper .gitignore, .env.example, requirements.txt, and README
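For the "clean table" requirement, f-string alignment specifiers are usually all you need. A minimal sketch with hypothetical forecast rows (the data below is made up):

```python
# Hypothetical forecast rows: (city, date, max temp, min temp)
rows = [
    ("Mumbai", "2024-05-01", 34.2, 27.1),
    ("Delhi",  "2024-05-01", 39.8, 25.4),
]

# <12 left-aligns in a 12-char column; >8.1f right-aligns a float to 1 decimal
header = f"{'City':<12}{'Date':<12}{'Max C':>8}{'Min C':>8}"
print(header)
print("-" * len(header))
for city, day, tmax, tmin in rows:
    print(f"{city:<12}{day:<12}{tmax:>8.1f}{tmin:>8.1f}")
```

The same specifiers work for any column layout; widen the column widths if city names overflow.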
Stretch Goals
- Add a `--cache` flag that reads from saved JSON if the file is less than 1 hour old
- Accept cities as CLI arguments using argparse
- Add a simple retry mechanism: if a request fails, retry up to 3 times with 1s backoff
```python
# Starter structure
import asyncio
import json
import httpx
from pathlib import Path
from datetime import datetime

BASE_URL = "https://api.open-meteo.com/v1/forecast"

CITIES = {
    "Mumbai": (19.07, 72.87),
    "Delhi": (28.67, 77.22),
    "Bangalore": (12.97, 77.59),
    "Chennai": (13.08, 80.27),
    "Kolkata": (22.57, 88.36),
}

async def fetch_city(client: httpx.AsyncClient, city: str, lat: float, lon: float) -> dict:
    # Your implementation here
    ...
```
Push all three projects from P1-M01 and P1-M02 to GitHub. Each must have: a proper README (problem, tech stack, how to run, example output), requirements.txt, .gitignore, and at least 3 commits with meaningful commit messages (not just "update" or "fix"). This is your portfolio foundation — start it right.
Lab 1: Git — The Full Branch, Conflict, and Merge Workflow
Objective: Experience a real merge conflict and resolve it — this is something every developer encounters and many find intimidating the first time.
1. `mkdir git-lab && cd git-lab && git init`. Create main.py with one function. Add, then commit with message "feat: initial main function".
2. `git checkout -b feature-a`. Edit line 3 of main.py to say "Version A". Commit. Switch back to main: `git switch main`.
3. `git checkout -b feature-b`. Edit the SAME line 3 to say "Version B". Commit. Switch back to main: `git switch main`.
4. `git merge feature-a`. Then try to merge feature-b: `git merge feature-b`. Git reports a conflict.
5. Open main.py and find the conflict markers: `<<<<<<< HEAD`, `=======`, `>>>>>>> feature-b`. Edit the file to keep the version you want (or combine both). Remove all conflict markers.
6. `git add main.py` then `git commit -m "merge: resolve conflict between feature-a and feature-b"`. Run `git log --oneline --graph` to see the merge commit in the branch graph.

Lab 2: HTTP Debugging — Inspect Every Layer of an API Call
Objective: See exactly what bytes travel over the network when you call an API — building intuition for debugging production failures.
1. Install HTTPie: `pip install httpie`. Run: `http GET "https://api.open-meteo.com/v1/forecast?latitude=19.07&longitude=72.87&daily=temperature_2m_max&timezone=Asia/Kolkata"`. Observe: status line, response headers, JSON body.
2. Request error responses on purpose, e.g. `httpbin.org/status/429` to see a rate-limit response (try a few other status codes too). Document the full response for each.
3. In Python, print `response.status_code`, `response.headers["Content-Type"]`, `len(response.content)` (bytes), and `response.elapsed.total_seconds()` (latency).
4. Configure `requests.adapters.HTTPAdapter` with `max_retries=3`. Test that it retries on connection errors by pointing it at a non-existent host.
5. Use `curl -v` from the terminal to make the same API call. Identify: the TLS handshake, the HTTP request headers sent, and the response headers received. Compare with what requests sends.

Lab 3: Async Concurrency — Measure Real Speedup
Objective: Empirically measure the async speedup on real network requests — so the performance benefit is concrete, not theoretical.
1. Install httpx: `pip install httpx`. Create a list of 10 different city coordinates for the Open-Meteo API.
2. Fetch all 10 forecasts sequentially, then concurrently with `asyncio.gather`, timing both runs with `time.perf_counter()`. Compare the totals.
3. Add `await asyncio.sleep(0)` inside the async function to simulate yielding. Does performance change? Why or why not?
4. Make one request fail deliberately and collect the remaining results with `asyncio.gather(..., return_exceptions=True)`.

P1-M03 MASTERY CHECKLIST
- Can explain the Git working directory → staging area → local repo → remote flow in your own words
- Know the difference between git merge and git rebase — and when to use each
- Can resolve a merge conflict without using a GUI tool
- Have a .gitignore that covers Python artifacts, virtual environments, .env files, and model weights
- Can navigate the terminal without hesitation: cd, ls, mkdir, rm, grep, find, cat, tail -f
- Know what PATH is and can diagnose a "command not found" error
- Can load API keys from environment variables — never hardcode secrets in source code
- Know all HTTP methods (GET, POST, PUT, DELETE) and when each is used
- Can identify what went wrong from HTTP status codes: 400, 401, 403, 404, 429, 500, 503
- Can make a GET and POST request in Python using requests with proper timeout and error handling
- Can parse a deeply nested JSON response and safely access values with .get()
- Can explain what async def and await do in plain English
- Know the difference between asyncio.gather and sequential awaits — and when each is right
- Know never to use time.sleep or requests inside an async function — and what to use instead
- Can run asyncio.run() at the top level and write a coroutine that calls an async HTTP client
- Completed Lab 1: full branch, conflict, and merge workflow
- Completed Lab 2: HTTP debugging — inspected all layers of a real API call
- Completed Lab 3: measured real async speedup on concurrent HTTP requests
- Milestone project pushed to GitHub with README, .gitignore, requirements.txt
✅ When complete: Move to P1-M04 — SQL Basics & FastAPI. Everything you built here — async functions, HTTP knowledge, JSON handling — feeds directly into building your first API server and database queries.