What This Module Covers
[Foundation] Python is the language of AI engineering. Full stop. Almost every library, API, and tutorial you will encounter over the next six months is in Python. This module takes you from zero to a functional Python developer who can write clean programs, handle files and APIs, and manage a codebase.
- Core syntax — variables, data types, operators, strings, f-strings
- Data structures — lists, tuples, dictionaries, sets and their use-cases
- Control flow — if/elif/else, for loops, while loops, break/continue
- Functions — parameters, return values, *args/**kwargs, lambda, list comprehensions
- File I/O — reading and writing text and CSV files
- Error handling — try/except/finally for robust production code
- OOP basics — classes, objects, __init__, methods, encapsulation
- Environment management — venv, pip, requirements.txt
Why Python for AI Engineering
[Context] Python dominates AI/ML for concrete reasons, not just popularity:
- Library ecosystem — NumPy, Pandas, Scikit-learn, PyTorch, LangChain, FastAPI are all Python-first
- API SDKs — OpenAI, Anthropic, HuggingFace all ship Python SDKs as their primary interface
- Rapid prototyping — interactive Jupyter notebooks let you experiment and iterate faster than compiled languages
- Glue language — Python is the orchestration layer that connects your LLM, vector DB, REST API, and deployment pipeline
- Job market — 90%+ of AI/ML job postings require Python as the primary language
The goal this month is not to become a Python expert — it is to stop Googling basic syntax and be able to build simple programs confidently.
Module Connections
[Dependencies] This module feeds directly into:
- P1-M02 (NumPy & Pandas) — requires list comprehensions, classes, and file I/O
- P1-M03 (Dev Essentials) — requires understanding of pip, venv, and JSON handling
- P1-M04 (FastAPI) — requires OOP, type hints, and async/await understanding
- P4 (LLM APIs) — every API call is Python. Structured outputs use Pydantic (Python classes)
C/C++/Java background? Here is what maps directly:
- Java classes → Python classes (simpler syntax, no access modifiers)
- C arrays → Python lists (dynamic, mixed types)
- C++ STL map → Python dict
- Java try/catch → Python try/except
Variables, Types and Operators
[Week 1] Python is dynamically typed: you do not declare variable types. Each value (object) carries its own type, and types are checked at runtime rather than at compile time.
```python
# Basic types
name = "Ajay"        # str
age = 28             # int
salary = 85000.50    # float
active = True        # bool
nothing = None       # NoneType

# Type checking and conversion
print(type(name))    # <class 'str'>
print(int("42"))     # 42, explicit cast from str to int
print(str(100))      # 100 (now a str, not an int)

# f-strings: the professional way to format
print(f"Hello {name}, age {age}")  # Hello Ajay, age 28
print(f"{salary:.2f}")             # 85000.50
```
💡 Unlike C/C++, Python variables are references, not memory slots. When you write x = 5, Python creates an integer object with value 5 and binds the name x to it. This matters for understanding mutability later.
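A quick way to see names-as-references and dynamic typing together is to compare identities with is, then rebind a name to a value of a different type. This is a minimal sketch; the exact TypeError message can differ between Python versions, so the example prints it instead of quoting it.

```python
x = 5
y = x
print(x is y)            # True: both names are bound to the same int object

y = "five"               # rebinding y to a str is allowed (dynamic typing)
print(type(x), type(y))  # <class 'int'> <class 'str'>
print(x)                 # 5: rebinding y never touched x

# Types are still enforced at runtime: str and int do not mix implicitly
try:
    result = "1" + 1
except TypeError as exc:
    print("TypeError:", exc)
```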
Core Data Structures
[Week 1–2] List: ordered, mutable, allows duplicates
```python
items = ["apple", "banana", "cherry"]
items.append("date")        # add to end
items.insert(1, "avocado")  # insert at index
items.pop()                 # remove last ("date")
print(items[0])             # "apple": 0-indexed
print(items[-1])            # last element ("cherry")
print(items[1:3])           # slice [1, 3) = ["avocado", "banana"]

# List comprehension: Pythonic and fast
squares = [x**2 for x in range(10) if x % 2 == 0]  # [0, 4, 16, 36, 64]
```
Dictionary: key-value store, average O(1) lookup

```python
user = {"name": "Ajay", "age": 28, "city": "Mumbai"}
user["email"] = "ajay@example.com"  # add key
phone = user.get("phone", "N/A")    # safe get with default: "N/A"

# Dict comprehension
word_len = {w: len(w) for w in ["python", "java", "c++"]}
# {"python": 6, "java": 4, "c++": 3}

# Iterating
for key, val in user.items():
    print(f"{key}: {val}")
```

Tuple: ordered, immutable
```python
coords = (19.07, 72.87)  # lat, lon of Mumbai
lat, lon = coords        # tuple unpacking

# Use tuples for fixed data that should not change,
# e.g. HTTP status codes, RGB colours, database records
HTTP_OK = (200, "OK")
```
Set: unordered, unique elements

```python
tags = {"python", "ml", "llm", "python"}  # duplicates removed
print(tags)          # {'python', 'ml', 'llm'} (order may vary)
print("ml" in tags)  # True: membership testing is average O(1)

# Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a & b)  # {3, 4}, intersection
print(a | b)  # {1, 2, 3, 4, 5, 6}, union
print(a - b)  # {1, 2}, difference
```

Functions and Error Handling
[Week 2–3]

```python
# Basic function with type hints (good practice)
def greet(name: str, greeting: str = "Hello") -> str:
    return f"{greeting}, {name}!"

# *args: variable positional arguments
def total(*numbers):
    return sum(numbers)

print(total(1, 2, 3, 4))  # 10

# **kwargs: variable keyword arguments
def create_profile(**fields):
    return {k: v for k, v in fields.items()}

profile = create_profile(name="Ajay", role="engineer")

# Lambda: one-line anonymous function
square = lambda x: x ** 2
print(sorted([3, 1, 4], key=lambda x: -x))  # [4, 3, 1]
```
```python
import json  # import at module level so json.JSONDecodeError is always defined

# Error handling: always handle specific exceptions
def read_config(path: str) -> dict:
    try:
        with open(path, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Config not found: {path}")
        return {}
    except json.JSONDecodeError as e:
        print(f"Invalid JSON: {e}")
        return {}
    finally:
        print("Config read attempted")  # always runs, even after a return
```
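The Week 3 plan also lists custom exceptions. A common pattern is to define a small exception class for your own domain errors so callers can catch them specifically instead of catching everything. The names ConfigError and require_key below are hypothetical, chosen only for illustration.

```python
class ConfigError(Exception):
    """Raised when a configuration value is missing or invalid (illustrative)."""


def require_key(config: dict, key: str):
    # Raise a specific, descriptive error instead of a bare KeyError
    if key not in config:
        raise ConfigError(f"Missing required config key: {key!r}")
    return config[key]


config = {"model": "claude-3", "temperature": 0.7}
print(require_key(config, "model"))  # claude-3

try:
    require_key(config, "max_tokens")
except ConfigError as e:
    print(f"Config problem: {e}")  # Config problem: Missing required config key: 'max_tokens'
```

Because ConfigError subclasses Exception, callers that only care about configuration problems can write except ConfigError and let unrelated errors propagate.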
Object-Oriented Programming
[Week 3]

```python
# Class definition: blueprint for objects
class BankAccount:
    # Class variable (shared by all instances)
    bank_name = "PyBank"

    def __init__(self, owner: str, balance: float = 0.0):
        # Instance variables (unique per object)
        self.owner = owner
        self._balance = balance  # _prefix = convention for private

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("Amount must be positive")
        self._balance += amount

    def withdraw(self, amount: float) -> float:
        if amount <= 0:  # a negative withdrawal would otherwise increase the balance
            raise ValueError("Amount must be positive")
        if amount > self._balance:
            raise ValueError("Insufficient funds")
        self._balance -= amount
        return amount

    @property
    def balance(self) -> float:  # getter: access like an attribute
        return self._balance

    def __repr__(self) -> str:
        return f"BankAccount({self.owner!r}, {self._balance})"

# Usage
acc = BankAccount("Ajay", 1000)
acc.deposit(500)
print(acc.balance)  # 1500
```
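💡 Python OOP is simpler than Java/C++: no access modifiers, no header files. By convention, a single underscore (_name) means "internal, please don't touch this", but nothing enforces it. A double underscore (__name) triggers name mangling: inside the class, the attribute is renamed to _ClassName__name. This mainly prevents accidental clashes with same-named attributes in subclasses; it is not true privacy, because the mangled name is still accessible from outside.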
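Name mangling is easier to believe once you see it. The Wallet class below is a hypothetical example: the double-underscore attribute is stored under its mangled name, the plain name fails outside the class, and the mangled name still works.

```python
class Wallet:
    def __init__(self, owner: str):
        self._owner = owner   # single underscore: a convention only
        self.__pin = "1234"   # double underscore: stored as _Wallet__pin


w = Wallet("Ajay")
print(w._owner)               # Ajay: nothing stops this access

try:
    print(w.__pin)            # outside the class, no mangling happens, so this name does not exist
except AttributeError as e:
    print("AttributeError:", e)

print(w._Wallet__pin)         # 1234: the mangled name is still reachable
print(vars(w))                # {'_owner': 'Ajay', '_Wallet__pin': '1234'}
```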
File I/O and JSON
[Week 3]

```python
import csv
import json
from pathlib import Path

# Writing and reading JSON (critical for LLM API work)
data = {"model": "claude-3", "temperature": 0.7, "tokens": [100, 200]}
Path("config.json").write_text(json.dumps(data, indent=2))
loaded = json.loads(Path("config.json").read_text())

# CSV reading: used constantly in data work
with open("students.csv", "r") as f:
    reader = csv.DictReader(f)
    students = list(reader)  # list of dicts, one per row

# CSV writing
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows([{"name": "Ajay", "score": 95}])
```
Virtual Environments and Package Management
[Essential] Every project must have its own virtual environment. This is non-negotiable: it prevents dependency conflicts between projects.
```bash
# Create and activate a virtual environment
python -m venv .venv          # create
source .venv/bin/activate     # Linux/Mac
.venv\Scripts\activate        # Windows

# Install packages
pip install requests pandas numpy   # install
pip install openai anthropic        # AI SDKs

# Freeze and restore dependencies
pip freeze > requirements.txt       # save exact versions
pip install -r requirements.txt     # restore on new machine

# Deactivate
deactivate
```
⚠️ Never install packages globally — always activate your venv first. Global installs create conflicts that are painful to debug. Add .venv/ to your .gitignore — never commit the venv folder.
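If you are unsure whether a venv is active, Python itself can tell you. With an environment created by the standard venv module, sys.prefix points at the environment directory while sys.base_prefix still points at the base installation, so comparing the two is a simple check. A minimal sketch; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    # venv sets sys.prefix to the environment directory;
    # sys.base_prefix keeps pointing at the base Python installation
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
print("interpreter:", sys.executable)  # the path shows which Python (and which venv) is running
```

Running this before pip install is a cheap way to confirm you are not about to install packages globally.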
Mutable vs Immutable — The Most Common Bug Source
[Critical] Understanding mutability prevents an entire class of bugs that trip up engineers coming from C/C++/Java.
```python
import copy

# Immutable: int, str, tuple, float, bool
x = 5
y = x
y = 10
print(x)  # Still 5: y = 10 rebinds y to a new object; the int 5 is untouched

# Mutable: list, dict, set
a = [1, 2, 3]
b = a          # b points to the SAME list as a
b.append(4)
print(a)       # [1, 2, 3, 4] ← a changed!

# Fix: explicit copy
b = a.copy()          # shallow copy
b = a[:]              # slice copy, same result
b = copy.deepcopy(a)  # deep copy for nested structures

# Dangerous default argument anti-pattern
def add_item(item, lst=[]):  # BAD: the default list is created once and shared across calls!
    lst.append(item)
    return lst

# Correct pattern
def add_item(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst
```
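The difference between copy() and copy.deepcopy() only shows up with nested structures. A shallow copy builds a new outer list but reuses the inner objects, so mutating an inner list is visible through both names. A small sketch:

```python
import copy

matrix = [[1, 2], [3, 4]]
shallow = matrix.copy()       # new outer list, same inner lists
deep = copy.deepcopy(matrix)  # new outer list and new inner lists

matrix[0].append(99)          # mutate an inner list in place

print(shallow[0])  # [1, 2, 99]: the shared inner list changed
print(deep[0])     # [1, 2]: fully independent copy
print(shallow is matrix, shallow[0] is matrix[0])  # False True
```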
Comprehensions and Functional Patterns
[Pythonic Code]

```python
# List comprehension: replaces most for loops
even_squares = [x**2 for x in range(20) if x % 2 == 0]

# Dict comprehension: used constantly with API responses
response_data = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
id_map = {item["id"]: item["name"] for item in response_data}
# {1: "alice", 2: "bob"}

# Generator: lazy evaluation, yields one chunk at a time
def token_chunks(text: str, size: int):
    words = text.split()  # splits on whitespace: these are words, not model tokens
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

# Use when splitting long documents for LLM context windows
long_document = "word " * 2000  # placeholder; normally loaded from a file
for chunk in token_chunks(long_document, 500):
    print(len(chunk.split()))  # stand-in for real processing; chunks are built one at a time, not all up front

# zip and enumerate: essential for pairing data
names = ["alice", "bob", "charlie"]
scores = [85, 92, 78]
for i, (name, score) in enumerate(zip(names, scores)):
    print(f"{i}: {name} = {score}")
```
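To see why generators are memory efficient, compare a fully built list with a generator expression over the same range. sys.getsizeof reports only the container's own size (not the elements it references), but that is enough to show the trend: the list grows with n, the generator does not. Exact byte counts vary by Python version and platform, so none are shown.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]  # computes and stores all n values now
squares_gen = (x * x for x in range(n))   # computes nothing until iterated

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small and roughly constant, regardless of n

# Both produce the same values; the generator computes them one at a time
print(sum(squares_gen) == sum(squares_list))  # True
```

Note that a generator can be consumed only once: after the sum() call, squares_gen is exhausted.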
Modules, Imports and Project Structure
[Production Habit]

```python
# Standard imports
import os, sys, json, csv
from pathlib import Path
from typing import Optional, List, Dict, Any

# Third-party imports (installed via pip)
import requests
from openai import OpenAI

# Relative imports in your own package
# (only valid when the file is imported as part of a package, not run directly as a script)
from .utils import format_response
from ..config import API_KEY

# Typical project structure
# my-ai-app/
# ├── main.py          ← entry point
# ├── config.py        ← constants, env vars
# ├── models/          ← Pydantic schemas
# │   └── __init__.py
# ├── services/        ← business logic
# │   ├── __init__.py
# │   └── llm.py
# ├── requirements.txt
# └── .env             ← secrets (never commit!)

# Reading environment variables (secrets pattern)
from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # loads variables from the .env file into os.environ
api_key = os.environ.get("OPENAI_API_KEY")  # never hardcode keys
```
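os.environ.get() returns None when a variable is missing, which can surface much later as a confusing authentication error. A common alternative is to fail fast at startup with a clear message. A minimal sketch; the helper get_required_env and the DEMO_API_KEY variable are hypothetical.

```python
import os


def get_required_env(name: str) -> str:
    # os.environ[name] raises KeyError if the variable is missing;
    # re-raise it as an error with an actionable message
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(
            f"Environment variable {name} is not set. Add it to your .env file or shell."
        ) from None


os.environ["DEMO_API_KEY"] = "sk-demo"   # set here only so the demo runs
print(get_required_env("DEMO_API_KEY"))  # sk-demo
```

Call it once at startup (for example in config.py) so a missing key stops the program immediately instead of failing on the first API call.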
Async/Await — Critical for LLM APIs
[Month 2 Preview] LLM API calls are I/O-bound: they spend most of their time waiting for network responses. Async Python lets your program do other work while waiting, instead of blocking.
```python
import asyncio
import time

# Sync version: each call blocks for 1 second, so 3 calls in a row take ~3 seconds
def fetch_sync():
    time.sleep(1)  # simulates an API call
    return "result"

# Async version: 3 calls run concurrently, total ~1 second
async def fetch_async():
    await asyncio.sleep(1)  # yields control while waiting
    return "result"

async def main():
    # Run 3 API calls concurrently
    results = await asyncio.gather(
        fetch_async(), fetch_async(), fetch_async()
    )
    return results

asyncio.run(main())  # entry point for async code

# Anthropic async client pattern (Month 2)
# async with anthropic.AsyncAnthropic() as client:
#     response = await client.messages.create(...)
```
💡 You do not need to master async now. The key insight is: async def defines a coroutine (a function that can pause), and await is where it pauses to let other work run. You will use this constantly when calling LLM APIs and building FastAPI endpoints.
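You can check the concurrency claim yourself by timing both versions. This sketch uses 0.2-second sleeps as stand-ins for API calls (an assumption to keep the demo quick); the sequential loop takes about 3 × 0.2 s, while asyncio.gather overlaps the waits. Timings are approximate and vary by machine.

```python
import asyncio
import time

DELAY = 0.2  # stand-in for network latency


def fetch_sync(i: int) -> str:
    time.sleep(DELAY)           # blocks the whole program
    return f"result {i}"


async def fetch_async(i: int) -> str:
    await asyncio.sleep(DELAY)  # yields control to the event loop while waiting
    return f"result {i}"


async def run_concurrently() -> list:
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(fetch_async(i) for i in range(3)))


start = time.perf_counter()
sync_results = [fetch_sync(i) for i in range(3)]
sync_time = time.perf_counter() - start

start = time.perf_counter()
async_results = asyncio.run(run_concurrently())
async_time = time.perf_counter() - start

print(f"sync: {sync_time:.2f}s, async: {async_time:.2f}s")
print(sync_results == async_results)  # True
```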
3-WEEK STRUCTURED PLAN
| Week | Topics | Daily Task / Mini-Project |
|---|---|---|
| Week 1 | Install Python 3.10+ and VS Code. Variables, data types, type casting, string methods and f-strings. Lists — indexing, slicing, list methods. Tuples vs lists. Control flow: if/elif/else, for loops, while loops, break/continue/pass. | Day 1–2: Unit converter (km↔miles, °C↔°F). Day 3: String palindrome checker. Day 4–5: Shopping list CLI using lists (add, remove, display). Day 6–7: Number guessing game with while loop + score tracker. |
| Week 2 | Dictionaries — CRUD operations, nested dicts, dict comprehensions. Sets — union, intersection, difference. Functions — defining, default args, *args/**kwargs, lambda, list comprehensions. Modules and import system. | Day 1–2: Phone book CLI using dictionaries (add, search, delete, update). Day 3–4: Grade classifier using if/elif (A/B/C/D/F with GPA). Day 5–7: Word frequency counter — takes a text file, returns top-10 words using dicts + sorted + lambda. |
| Week 3 | File I/O — open(), read(), write() with text and CSV. JSON — json.loads(), json.dumps(), working with nested structures. Error handling — try/except/finally, custom exceptions. OOP — classes, __init__, methods, @property. venv + pip + requirements.txt. | Day 1–2: CSV reader/writer for student grade data. Day 3–4: Bank Account class with deposit, withdraw, balance property. Day 5–7: Full milestone project — CLI Student Grade Management System (see Projects tab). |
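For the Week 1 palindrome checker, one possible approach uses only string methods, a generator expression, and slicing. This is a sketch; ignoring case and non-alphanumeric characters is a design choice you may want to change.

```python
def is_palindrome(text: str) -> bool:
    # Keep only letters and digits, lowercased, so punctuation and spaces are ignored
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]  # [::-1] reverses a string


print(is_palindrome("racecar"))                         # True
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("python"))                          # False
```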
Environment Setup — Do This First
[Day 1]

```bash
# 1. Install Python 3.10+ from python.org
# 2. Install VS Code + the Python extension (Microsoft)
# 3. Or use Google Colab: zero setup, free GPU
#    https://colab.research.google.com/

# Verify installation
python --version   # Python 3.10.x or higher
pip --version      # pip 23.x (or newer)

# Install core packages you will use throughout Part 1
pip install jupyter numpy pandas matplotlib requests python-dotenv
```
The Most Important Learning Habit
[Meta-Skill] The most common beginner mistake is consuming content passively: reading along, nodding, and never opening a code editor. Every concept in this module must be typed out and run. Not copy-pasted. Typed. Your fingers need to know the syntax before your brain does.
- Open a Python REPL (run python in a terminal) and experiment immediately after each concept
- Every error message is a learning opportunity: read it fully before searching
- Push every mini-project to GitHub, even if it is 20 lines
- If something works but you don't know why — break it intentionally and observe
FREE LEARNING RESOURCES
| Type | Resource | Best For |
|---|---|---|
| Course | CS50P — Introduction to Programming with Python (Harvard, Free) | Best free Python course. Structured problem sets. Certificate on completion. |
| Video | Python for Beginners — freeCodeCamp (YouTube, 4.5 hrs) | Single video covering all fundamentals. Watch at 1.5x after Week 1. |
| Course | Python for Everybody — Coursera (Free to audit) | Best for absolute beginners. Dr. Chuck is exceptionally clear. |
| Docs | Official Python Tutorial — python.org | Authoritative reference. Dry but precise. Use as lookup, not primary resource. |
| Course | Kaggle Python Course (Free, Interactive) | Hands-on exercises with instant feedback. Great for Week 1–2. |
| Book | Automate the Boring Stuff with Python (Free online) | Project-oriented. Best book for building real scripts in Week 3. |
| Video | Corey Schafer — Python OOP Tutorials (YouTube Playlist) | Best OOP explanation for engineers coming from Java/C++. |
| Tool | Google Colab — Free Cloud Jupyter Notebooks | Zero setup. Free GPU. Use if local setup is painful. |
PRACTICE DATASET
| Type | Resource | Used In |
|---|---|---|
| Dataset | UCI Student Performance Dataset | Milestone project — CLI Grade Management System |
MILESTONE PROJECT
Build a command-line application that manages student grade data using Python fundamentals. This project tests every concept from the module in a cohesive real-world context.
Requirements
- Reads student data from a CSV file (name, subject scores)
- Calculates grade (A/B/C/D/F), GPA, and class rank for each student
- Supports filtering by subject or grade range
- Sorts students by any column (name, GPA, specific subject)
- Handles invalid input gracefully with try/except (missing file, bad data)
- Writes a cleaned summary CSV as output
- CLI menu: view all / search by name / filter / sort / export / quit
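One way to start the core of the milestone is sketched below: parse rows with csv.DictReader, skip rows with bad data instead of crashing, compute an average and letter grade per student, and rank by average with sorted() and a lambda. The column names (name, math, science), the 90/80/70/60 cutoffs, and the use of a plain average in place of a GPA scale are all assumptions; adapt them to your dataset. io.StringIO stands in for a real file so the sketch runs as-is.

```python
import csv
import io

# Hypothetical sample data; in the project, open your real CSV file instead
SAMPLE_CSV = """name,math,science
Asha,92,88
Ravi,75,81
Meera,abc,90
"""


def letter_grade(avg: float) -> str:
    # Assumed cutoffs: A >= 90, B >= 80, C >= 70, D >= 60, else F
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if avg >= cutoff:
            return grade
    return "F"


def load_students(f) -> list:
    students = []
    for row in csv.DictReader(f):
        try:
            scores = {k: float(v) for k, v in row.items() if k != "name"}
        except ValueError:
            print(f"Skipping row with bad data: {row}")
            continue
        avg = sum(scores.values()) / len(scores)
        students.append({"name": row["name"], "average": avg, "grade": letter_grade(avg)})
    return students


students = load_students(io.StringIO(SAMPLE_CSV))
# Rank by average, highest first
ranked = sorted(students, key=lambda st: st["average"], reverse=True)
for rank, s in enumerate(ranked, start=1):
    print(f"{rank}. {s['name']}: {s['average']:.1f} ({s['grade']})")
# 1. Asha: 90.0 (A)
# 2. Ravi: 78.0 (C)
```

With a real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline="") inside a with block, and wrap it in try/except FileNotFoundError to cover the missing-file requirement.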
Stretch Goals
- Add a Student class with methods (calculate_gpa, get_grade, __repr__)
- Store data in JSON format as an alternative to CSV
- Add a simple stats report: class average, highest/lowest scorer, grade distribution
Skills demonstrated: File I/O, CSV handling, dictionaries, lists, functions, error handling, OOP basics, sorting with lambda, string formatting
Dataset: UCI Student Performance Dataset or create your own CSV
Push to GitHub with a README describing what the tool does and how to run it.
MINI-PROJECTS (WEEKLY)
Build a CLI tool that converts between: km↔miles, °C↔°F, kg↔lbs. Menu-driven loop. Handles invalid input. Demonstrates: variables, type casting, f-strings, conditionals, while loop.
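A possible core for the unit converter: keep each formula in a small function, look the functions up in a dict keyed by menu option, and let try/except catch non-numeric input so the menu loop never crashes. The menu keys and function names are illustrative, and only one direction of each pair is shown; the reverse conversions follow the same pattern.

```python
def km_to_miles(km: float) -> float:
    return km * 0.621371


def c_to_f(c: float) -> float:
    return c * 9 / 5 + 32


def kg_to_lbs(kg: float) -> float:
    return kg * 2.20462


CONVERSIONS = {
    "1": ("km -> miles", km_to_miles),
    "2": ("°C -> °F", c_to_f),
    "3": ("kg -> lbs", kg_to_lbs),
}


def convert(choice: str, raw_value: str) -> str:
    if choice not in CONVERSIONS:
        return "Invalid menu option"
    label, func = CONVERSIONS[choice]
    try:
        value = float(raw_value)  # raises ValueError for input like "ten"
    except ValueError:
        return f"Not a number: {raw_value!r}"
    return f"{label}: {value} -> {func(value):.2f}"


# In the real CLI, choice and raw_value would come from input() inside a while loop
print(convert("2", "100"))  # °C -> °F: 100.0 -> 212.00
print(convert("1", "ten"))  # Not a number: 'ten'
```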
Takes a .txt file as input. Returns the top-10 most frequent words (excluding common stop words). Uses: file I/O, dicts, sorted() with lambda, set for stop words. Try it on a book chapter from Project Gutenberg.
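The word frequency counter can be built with exactly the tools listed above: a dict for counts, a set for stop words, and sorted() with a lambda. The stop-word set below is a tiny illustrative sample, and the text is inlined instead of read from a .txt file so the sketch runs on its own.

```python
import string

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}  # tiny sample


def top_words(text: str, n: int = 10) -> list:
    counts = {}
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)  # drop surrounding punctuation
        if word and word not in STOP_WORDS:
            counts[word] = counts.get(word, 0) + 1
    # Sort by count (descending), then alphabetically to break ties
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


text = "The cat sat. The cat ran, and the dog sat in the sun."
print(top_words(text, 3))  # [('cat', 2), ('sat', 2), ('dog', 1)]
```

For the real project, read the file with open(path, encoding="utf-8") and pass its contents to top_words().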
Call the Open-Meteo weather API (no API key needed) using the requests library. Format and print a 7-day forecast. Save the raw JSON response to a file. Push to GitHub with README. This is a preview of Month 2's API work.
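```python
import json

import requests

url = (
    "https://api.open-meteo.com/v1/forecast"
    "?latitude=19.07&longitude=72.87"
    "&daily=temperature_2m_max&timezone=Asia/Kolkata"
)
r = requests.get(url, timeout=10)  # avoid hanging forever on a bad connection
r.raise_for_status()               # raise an error for 4xx/5xx responses
data = r.json()
print(json.dumps(data, indent=2))
```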
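To turn the raw response into a readable forecast, you can pair dates with temperatures using zip. This sketch assumes the response contains a "daily" object with parallel "time" and "temperature_2m_max" lists, with temperatures in °C (check your printed JSON to confirm the shape). A hypothetical sample dict stands in for a real response so the formatting logic runs offline.

```python
import json


def format_forecast(data: dict) -> str:
    daily = data["daily"]  # assumed shape: parallel lists keyed by variable name
    lines = [
        f"{day}: max {temp:.1f}°C"
        for day, temp in zip(daily["time"], daily["temperature_2m_max"])
    ]
    return "\n".join(lines)


# Hypothetical sample in the assumed shape (real responses contain more fields and 7 days)
sample = {"daily": {"time": ["2024-06-01", "2024-06-02"], "temperature_2m_max": [33.1, 32.4]}}
print(format_forecast(sample))

# Saving the raw response to a file, as the mini-project asks
with open("forecast.json", "w") as f:
    json.dump(sample, f, indent=2)
```

In the real script, pass the data returned by r.json() to format_forecast() and json.dump().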
Python REPL Exploration — Types and Mutability
Objective: Build intuition for Python's type system and mutability through hands-on exploration in the REPL.
- Open a terminal and run python3 (or python on Windows). You are now in the interactive REPL (Read-Eval-Print Loop). Type expressions and see results immediately.
- Enter x = [1, 2, 3], then y = x, then y.append(4), then print(x). Did x change? Why?
- Enter a = "hello", then b = a, then b = b + " world", then print(a). Did a change? Why not? What is the key difference between strings and lists?
- Run import sys, then compare sys.getsizeof([]) with sys.getsizeof([1,2,3,4,5]). See memory usage grow. Try the same with strings of different lengths.
- Define def add(x, lst=[]): lst.append(x); return lst. Call it three times: add(1), add(2), add(3). What happens? Fix the function.
- Use id() to see object identity: a = [1,2,3]; b = a; print(id(a) == id(b)). Then run b = a.copy(); print(id(a) == id(b)). What changes?

Build a JSON Config Reader with Error Handling
Objective: Write production-quality Python that reads configuration files robustly — a pattern you will use in every AI project.
- Create config.json with this content: {"model": "gpt-4", "temperature": 0.7, "max_tokens": 1000, "api_key": "sk-test"}
- Write a function load_config(path: str) -> dict that reads this file. Use try/except to handle: FileNotFoundError (return empty dict), json.JSONDecodeError (print error, return empty dict), PermissionError (print error, return empty dict).
- Write a validate_config(config: dict) -> bool function that checks: "model" key exists, "temperature" is between 0 and 2, "max_tokens" is a positive integer. Return True only if all checks pass.
- Write a save_config(config: dict, path: str) -> None function that writes the config back to JSON with 2-space indentation. Add a timestamp field: config["last_updated"] = datetime.now().isoformat()
- Use os.environ.get() to override the api_key from an environment variable instead of reading it from the file. This is the secure pattern used in all production AI projects.

OOP: Build a Student Registry Class
Objective: Apply OOP concepts to build a reusable data class — a preview of the Pydantic models you will use in Part 4.
- Create a Student class with: __init__(self, name, scores: dict) where scores is a dict of subject→score pairs. Store both as instance variables.
- Add a gpa property that calculates the average of all scores. Add a grade property that returns "A" if gpa >= 90, "B" if >= 80, etc.
- Implement __repr__ and __str__ methods. __repr__ should be unambiguous (useful for debugging). __str__ should be human-readable.
- Create a StudentRegistry class that holds a list of Student objects. Add methods: add(student), find(name), top_n(n) (returns the top n by GPA), class_average().
- Add a to_csv(path) instance method and a from_csv(path) class method (@classmethod) to the registry for persistence. Test the full round-trip: create → save → load → query.

P1-M01 MASTERY CHECKLIST
- Can explain the difference between mutable and immutable types and give one real bug this causes
- Can write a list comprehension that filters and transforms a list in one line
- Can define a function with default arguments, *args, and **kwargs and explain when to use each
- Know the difference between a list, tuple, set, and dict — and when to use each
- Can read and write JSON and CSV files using the standard library
- Can handle FileNotFoundError, json.JSONDecodeError, and ValueError cleanly with try/except
- Can create a class with __init__, instance variables, properties, and __repr__
- Know what a virtual environment is, can create one, activate it, and install packages
- Can read an API key from environment variables (not hardcoded in source)
- Know what async def and await mean conceptually and why LLM APIs use them
- Completed Lab 1: REPL exploration of types and mutability
- Completed Lab 2: JSON config reader with full error handling
- Completed Lab 3: Student class with OOP patterns
- Milestone project pushed to GitHub with README
✅ When complete: Move to P1-M02 — NumPy & Pandas Data Toolkit. The list/dict/CSV skills you built here directly underpin everything in NumPy array indexing and Pandas DataFrame operations.