πŸ—ΊοΈ All Roadmaps β€Ί AI & Machine Learning

🤖 AI & Machine Learning
Complete Learning Roadmap

From Zero to Production-Ready AI Professional — Revised & Expanded Edition

9 Parts • 28+ Modules • 15+ Projects • 100+ Free Resources • 36 Weeks Full Stack
Data Analyst • Data Scientist • ML Engineer • AI/GenAI Engineer
πŸ—ΊοΈ Learning Path Overview
AI / GenAI Engineer path
ML / Data Science branch (optional)
Interchange β€” both paths stop here
🧭

Choose Your Learning Path

Data Analyst
Duration: ~14 weeks
Outcome: SQL, Pandas, EDA, Visualization
Path: P0 How to Use → P1 Foundation → P2 Stats & EDA → P3 Classical ML (optional)

Data Scientist
Duration: ~24 weeks
Outcome: Full ML competency, model building & evaluation
Path: P0 → P1 Foundation → P2 Stats & EDA → P3 Classical ML → P7 Production (optional)

ML Engineer
Duration: ~28 weeks
Outcome: Build and deploy production ML models
Path: P0 → P1 Foundation → P2 Stats & EDA → P3 Classical ML → P7 Production + MLOps

AI/GenAI Engineer
Duration: ~26 weeks
Outcome: LLMs, RAG, Agents, Production AI systems
Path: P0 → P1 Foundation → P2 Stats (optional) → P3 Classical ML (optional) → P4 LLM APIs → P5 RAG → P6 Agents & Evals → P7 Production → P8 Specialization

Full Stack AI/ML
Duration: ~36 weeks
Outcome: The most versatile, future-ready AI/ML profile
Path: All 9 Parts

🤖 AI / GenAI Engineer Path

P0 → P1 Foundation → P4 LLM APIs → P5 RAG → P6 Agents → P7 Production → P8 Specialization

📊 Data Scientist / ML Engineer Path

P0 → P1 Foundation → P2 Stats & EDA → P3 Classical ML → P7 Production (MLOps) → P8 Specialization
⚡ Parallel Branch: Parts 2 & 3 (Stats/EDA and Classical ML) are a separate parallel branch for AI Engineers — labelled OPTIONAL. Not required to build LLM systems, but strongly recommended for data intuition and broader employability.

Reading Guide & Time Commitment

Icons • SKIP IF System • 1–2 hrs/day
  • Complete Beginner: Follow every Part sequentially. Spend extra time on Python basics before rushing to ML.
  • Beginner with C/C++/Java background: Use the ⚡ SKIP IF boxes. Skim modules where your existing knowledge applies directly.
  • Time commitment: 1–2 hours/day = ~10 hours/week. Each week is calibrated for this pace.
  • Projects: Never skip projects. They are your portfolio. Every completed project goes on GitHub.
  • Resources: All resources listed are free.

Role Comparison & Salary Guide

4 Roles • India & US Salaries • Best Parts per Role
Role | Key Skills | India | US | Best Parts
Data Analyst | SQL, Python, Pandas, Visualization, Stats | 8–18 LPA | $65–90K | P0, P1, P2
Data Scientist | ML algorithms, Stats, EDA, Model building | 12–25 LPA | $90–130K | P0–P3
ML Engineer | ML + Software Eng., MLOps, Deployment, APIs | 15–30 LPA | $110–160K | P0–P3, P7
AI/GenAI Eng. | LLMs, RAG, Prompt Eng., Agents, Vector DBs | 18–40 LPA | $120–180K | P0, P1, P4–P8

Math & Statistics Prerequisites

Linear Algebra • Calculus • Probability & Stats
📌 You do NOT need to master all math before starting. Study alongside Parts 1–2. If you did B.Tech/engineering, you already know most of this — focus only on the statistics section.
  • Linear Algebra: vectors, matrices, dot product, matrix multiplication, eigenvalues
  • Calculus: derivatives, partial derivatives, chain rule, maxima/minima for gradient descent
  • Probability & Statistics: distributions, Bayes theorem, hypothesis testing, correlation
Free Math Resources
Type | Resource | Category
Course | Khan Academy — Linear Algebra | Linear Algebra
Course | Khan Academy — Statistics & Probability | Statistics
Video | 3Blue1Brown — Essence of Linear Algebra (YouTube) | Visual Linear Algebra
Video | 3Blue1Brown — Essence of Calculus (YouTube) | Calculus Visually
Book | Mathematics for Machine Learning (Free PDF — Cambridge) | Comprehensive Math for ML
Video | StatQuest with Josh Starmer (YouTube) | Stats & ML Concepts

Part 1 — Universal Foundation

~10 Weeks • Python • NumPy/Pandas • Git/CLI • APIs • SQL/FastAPI
All Roles Required

Module 1: Python Programming Fundamentals

3 Weeks • Beginner • Python 3, VS Code / Jupyter
What You Will Learn
  • Core Python syntax: variables, data types, type casting, string operations, f-strings
  • Data structures: lists, tuples, dictionaries, sets
  • Flow control: if/elif/else, for/while loops, break/continue
  • Functions: parameters, return values, *args/**kwargs, lambda, list comprehensions
  • File I/O: reading and writing text and CSV files
  • Error handling: try/except/finally for robust code
  • Intro to OOP: classes, objects, __init__, methods
  • Virtual environments (venv), pip, requirements.txt
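The OOP and error-handling bullets above fit together in a few lines. Here is a minimal sketch (the `BankAccount` class and its method names are illustrative, anticipating the Week 3 mini-project, not assigned starter code):

```python
class BankAccount:
    """Minimal class illustrating __init__, methods, and raising errors."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


acct = BankAccount("Asha", 100.0)
acct.deposit(50)
try:
    acct.withdraw(500)          # too much: raises ValueError
except ValueError as err:
    print(f"Rejected: {err}")   # error handled, program keeps running
print(acct.balance)             # 150.0
```

The try/except around `withdraw` is the same pattern you will use for invalid user input in the grade-management project below.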
Week-by-Week Plan
Week | Topics Covered | Daily Task / Mini-Project
W1 | Install Python 3 & VS Code. Variables, data types, type casting, string methods, lists, tuples | Unit converter (km↔miles). String palindrome checker. Shopping list app.
W2 | Dictionaries, sets, if/elif/else, for/while loops, break/continue/pass | Phone book using dicts. Number guessing game. Grade classifier.
W3 | Functions, *args/**kwargs, file I/O with CSV, try/except/finally, intro to OOP | Student Grade Calculator (file-based). CSV reader/writer. Bank Account OOP class.
Free Resources
Type | Resource | Category
Course | CS50P: Introduction to Programming with Python (Harvard, free) | Best free Python course
Video | Python for Beginners (freeCodeCamp, YouTube) | 4.5hr complete intro
Course | Python for Everybody (Coursera, free to audit) | Best for absolute beginners
Docs | Official Python Tutorial | Official reference
Course | Kaggle Python Course (Free, Interactive) | Hands-on exercises
Book | Automate the Boring Stuff with Python (Free) | Project-oriented Python
Video | Corey Schafer Python OOP Tutorials (YouTube Playlist) | OOP deep dive
Tool | Google Colab — free cloud Jupyter notebooks | Zero setup environment
🛠 PROJECT: CLI Student Grade Management System [Beginner] • 3–4 days
Build a CLI app that: reads/writes student data from CSV, calculates grades/GPA/class rank, filters and sorts by subject scores, handles invalid input with try/except.
Skills: File I/O, dictionaries, functions, error handling, CSV parsing
Deep Dive Module →

Module 2: NumPy & Pandas Data Toolkit

3 Weeks • Beginner–Intermediate • Jupyter/Colab, NumPy, Pandas
What You Will Learn
  • NumPy: ndarray, indexing, slicing, broadcasting, vectorized operations, math functions
  • Pandas Series and DataFrame — core data structures for all ML work
  • Reading data: pd.read_csv(), pd.read_excel(), pd.read_json()
  • Data inspection: .head(), .info(), .describe(), .shape, .dtypes
  • Data cleaning: NaN handling, duplicates, type conversion
  • Data manipulation: filtering, sorting, groupby, pivot tables, merge/join
  • String operations and datetime handling
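The cleaning and groupby bullets above compress into a few lines of Pandas. A minimal sketch, assuming pandas is installed; the dataset is invented for illustration:

```python
import pandas as pd

# Tiny made-up dataset standing in for a real CSV load (pd.read_csv).
df = pd.DataFrame({
    "country": ["IN", "IN", "US", "US", "US"],
    "cases":   [100, None, 50, 70, 80],
})

df["cases"] = df["cases"].fillna(0)            # NaN handling
totals = df.groupby("country")["cases"].sum()  # split-apply-combine
print(totals.sort_values(ascending=False))
```

`groupby` + an aggregation is the workhorse behind tasks like "top 5 countries by COVID cases" in the Week 6 plan.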
Week-by-Week Plan
Week | Topics Covered | Daily Task / Mini-Project
W4 | NumPy arrays, indexing, slicing, boolean masking, broadcasting, np.mean/std/sum/dot/reshape | Compute stats without loops. Matrix multiplication with np.dot. Reshape 1D sensor data.
W5 | Pandas Series vs DataFrame, read_csv, .head/.info, .loc/.iloc, NaN handling, drop_duplicates | Load dataset, write data health report. Handle missing values in COVID dataset.
W6 | groupby(), .agg()/.transform(), merge/join/concat, pivot_table, string ops, datetime | Top 5 countries by COVID cases. Merge two datasets. Extract month/year from date column.
Free Resources
Type | Resource | Category
Course | Kaggle Pandas Course (Free, Interactive) | Best hands-on Pandas
Video | NumPy for Beginners (freeCodeCamp, YouTube) | NumPy complete
Docs | Pandas Official Documentation User Guide | Official reference
Video | Corey Schafer Pandas Tutorials (YouTube Playlist) | Deep Pandas tutorials
Cheatsheet | Pandas Cheat Sheet — DataCamp (Free PDF) | Quick reference
🛠 PROJECT: COVID-19 Global Data Analysis [Beginner] • 4–5 days
Load and clean COVID-19 data, compute rolling 7-day average, find top 10 countries by deaths-per-million, identify months with highest surges, export cleaned summary CSV.
Skills: NumPy, Pandas cleaning, groupby, datetime, merge, export
Deep Dive Module →

Module 3: Developer Essentials — Git, CLI, APIs & Async

2 Weeks • Beginner • Terminal, Git, Python requests/httpx
  • Git: init, add, commit, push, pull, branching, merging, .gitignore, GitHub repos
  • CLI/Terminal: cd, ls, pwd, mkdir, rm, cat, grep, environment variables, PATH
  • HTTP fundamentals: GET/POST requests, status codes (200/400/401/404/500), headers
  • JSON: reading and writing, parsing API responses in Python
  • Python requests library: calling any web API from Python
  • Async/await: what async def and await do — critical for streaming LLMs later
🛠 PROJECT: Public API Script [Beginner] • 1–2 days
Write a Python script that calls Open-Meteo weather API (no API key needed), formats the result as clean JSON, and saves it to a file. Push to GitHub with a proper README.
Skills: requests library, JSON, HTTP, Git, GitHub
Deep Dive Module →

Module 4: SQL Basics & FastAPI

2 Weeks • Beginner • SQLite, FastAPI, Uvicorn
  • SQL: SELECT, WHERE, GROUP BY, JOIN, ORDER BY, basic aggregations
  • FastAPI: GET/POST endpoints, path and query parameters, request bodies with Pydantic
  • Running uvicorn dev server, using FastAPI /docs interface to test without writing a client
  • Pydantic: data validation and schema definition
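The SQL bullets above can be practiced entirely with Python's built-in sqlite3 module before touching FastAPI. A self-contained sketch (table names and data invented for illustration):

```python
import sqlite3

# In-memory database so the example needs no setup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (student_id INTEGER, subject TEXT, score INTEGER);
    INSERT INTO students VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO scores VALUES (1, 'math', 90), (1, 'sql', 80), (2, 'math', 70);
""")

# SELECT + JOIN + GROUP BY + aggregation + ORDER BY in one query.
rows = conn.execute("""
    SELECT s.name, AVG(sc.score) AS avg_score
    FROM students s JOIN scores sc ON sc.student_id = s.id
    GROUP BY s.name ORDER BY avg_score DESC
""").fetchall()
print(rows)   # [('Asha', 85.0), ('Ravi', 70.0)]
```

The same queries work unchanged against the SQLite file behind a FastAPI app.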
Deep Dive Module →
🏁 Part 1 Milestone — You Are Ready When...
  • Write Python programs that read/write files, call APIs, and handle errors cleanly
  • Version your code with Git and push projects to GitHub confidently
  • Navigate the terminal without hesitation
  • Understand what an HTTP request is and make one in Python with requests
  • Query a SQLite database with basic SQL
  • Build and run a simple FastAPI app locally and test it via the /docs interface

Part 2 — Statistics, EDA & ML Workflow

~4 Weeks • Stats • Visualization • Feature Engineering • Sklearn Pipeline
Optional for AI Engineers
⚡ AI ENGINEER SHORTCUT: Required for: Data Analyst, Data Scientist, ML Engineer. OPTIONAL but recommended for AI/GenAI Engineer — gives you the data intuition that makes AI engineers more versatile. On a fast track? Skip to Part 4 now and return here later.

Module 5: Statistical Thinking & Data Visualization

2 Weeks • Intermediate • Matplotlib, Seaborn, SciPy
  • Descriptive statistics: mean, median, mode, variance, std dev, skewness, kurtosis
  • Normal distribution, outlier detection via IQR method and Z-score
  • Correlation matrix: df.corr() and heatmap, Pearson vs Spearman
  • Matplotlib: figures, axes, subplots, styling, labels, titles, legends
  • Seaborn: histplot, boxplot, violin, pairplot, heatmap, regplot, categorical plots
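The IQR outlier rule from the bullets above, in stdlib Python (the salary list is invented; 250 is the planted outlier):

```python
from statistics import quantiles

salaries = [32, 35, 36, 38, 40, 41, 43, 45, 47, 250]   # made-up data, in LPA

q1, _, q3 = quantiles(salaries, n=4)        # quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the classic 1.5*IQR fences
outliers = [x for x in salaries if x < low or x > high]
print(outliers)   # [250]
```

With Pandas you would get the quartiles from `df["salary"].quantile([0.25, 0.75])`, but the fence logic is identical.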
Week | Topics Covered | Daily Task / Mini-Project
W7 | Descriptive stats, normal distribution, skewness/kurtosis, correlation matrix, outlier detection, Matplotlib | Plot house price distribution. Compute Titanic correlation matrix. Find outliers in salary dataset.
W8 | Seaborn histplot/boxplot/violin/pairplot, bar charts, scatter plots with regression line, heatmaps | Full EDA report with 6+ visualizations. Pairplot for Titanic survival. Monthly sales trend chart.
Free Resources
Type | Resource | Category
Video | StatQuest — Statistics for Machine Learning (YouTube) | Best stats intuition
Course | Kaggle Data Visualization Course (Free) | Seaborn & Matplotlib
Course | Kaggle Feature Engineering Course (Free) | Feature engineering
Docs | Scikit-learn Preprocessing Guide | Scaling & encoding
Book | Hands-On ML with Scikit-Learn (Free Chapter 2) | End-to-end ML project
Deep Dive Module →

Module 6: ML Workflow & Feature Engineering

2 Weeks • Intermediate • Scikit-learn, Pandas
  • Label encoding vs One-hot encoding (pd.get_dummies)
  • Feature binning (pd.cut/qcut), log transformation for skewed features, polynomial features
  • Scaling: StandardScaler, MinMaxScaler, RobustScaler — when to use which
  • Train-test split (stratified), K-Fold cross-validation with cross_val_score()
  • Scikit-learn Pipeline: combine preprocessing + model in one object
  • CRITICAL: Data leakage — ALWAYS split before any scaling or encoding
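The leakage rule above is easiest to see in plain NumPy: the scaling statistics must come from the training rows only. A minimal sketch (sklearn's StandardScaler inside a Pipeline automates exactly this discipline); the data is randomly generated:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(50, 10, size=100)        # invented feature values

# Split FIRST, then compute scaling statistics on the training part only.
X_train, X_test = X[:80], X[80:]
mean, std = X_train.mean(), X_train.std()   # learned from train ONLY

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std       # test reuses TRAIN statistics

# Leakage would be: (X - X.mean()) / X.std() BEFORE splitting --
# the test rows would have influenced the statistics the model sees.
```

The same rule applies to encoders, imputers, and feature selection: fit on train, transform both.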
Week | Topics Covered | Daily Task / Mini-Project
W9 | Label/one-hot encoding, feature binning, log transform, polynomial features, feature selection | Transform Titanic dataset. Apply log transform to skewed price column. One-hot encode city column.
W10 | StandardScaler/MinMaxScaler, train_test_split (stratified), K-Fold CV, sklearn Pipeline, leakage demo | Build full preprocessing Pipeline. Demonstrate data leakage. Compare scores with/without CV.
🛠 PROJECT: End-to-End EDA & Feature Engineering Report [Intermediate] • 5–6 days
Professional EDA report: statistical summary, 8+ visualizations, outlier analysis, feature engineering (encode, scale, create 2 new features), correlation insights, train-test split, save processed dataset.
Skills: Statistics, Matplotlib, Seaborn, Pandas, Scikit-learn preprocessing, Pipeline
Deep Dive Module →
🏁 Part 2 Milestone
  • Perform complete EDA on any dataset: distributions, correlations, outliers, missing values
  • Build 8+ publication-quality visualizations using Matplotlib and Seaborn
  • Apply feature engineering: encoding, scaling, log transforms, new feature creation
  • Build a full sklearn Pipeline combining preprocessing + model
  • Understand and prevent data leakage (always scale AFTER splitting, never before)

Part 3 — Classical ML [Parallel Branch]

~10 Weeks • Regression • Classification • Ensembles • Unsupervised
🔀 Parallel Branch for AI Eng.
🔀 PARALLEL PATH: This Part runs in parallel with Parts 4–6 for AI Engineers. Not required to build LLM systems, but strongly recommended to deepen data intuition and broaden employability. Data Scientists and ML Engineers follow this sequentially after Part 2.

Module 7: Supervised Learning — Regression

2 Weeks • Intermediate • Scikit-learn
  • Linear, Multiple, Polynomial Regression
  • Metrics: MAE, MSE, RMSE, R² — interpretation and selection
  • Ridge (L2) and Lasso (L1) Regression — regularization to control overfitting
  • When to use Ridge vs Lasso, residual plotting to check model assumptions
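Ridge has a closed form that makes the regularization visible: w = (XᵀX + αI)⁻¹Xᵀy, which is what sklearn's Ridge solves. A NumPy sketch on tiny invented data (in practice you would just use `sklearn.linear_model.Ridge`):

```python
import numpy as np

# Tiny made-up design matrix and targets.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

def ridge_weights(X, y, alpha):
    """Closed-form ridge: solve (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_ols = ridge_weights(X, y, alpha=0.0)      # alpha=0 -> ordinary least squares
w_ridge = ridge_weights(X, y, alpha=10.0)   # larger alpha shrinks coefficients

print(w_ols)     # [1. 2.]
print(w_ridge)   # smaller-norm weights: the L2 penalty at work
```

Lasso has no closed form (the L1 penalty is not differentiable at zero), which is why it can drive coefficients exactly to zero and act as feature selection.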
Week | Topics Covered | Daily Task / Mini-Project
W11 | Linear Regression, sklearn fit/predict API, MAE/MSE/RMSE/R², Multiple & Polynomial Regression | Predict house prices with linear regression. Compare MAE vs RMSE. Plot residuals.
W12 | Ridge (L2) and Lasso (L1), Logistic Regression for binary classification, decision boundary | Apply Ridge vs Lasso — compare feature coefficients. Build spam classifier.
Free Resources
Type | Resource | Category
Course | Kaggle Intro to Machine Learning (Free) | Best beginner ML
Course | Andrew Ng — ML Specialization (Coursera, Audit Free) | Gold standard ML course
Video | StatQuest — Logistic Regression, Decision Trees, Random Forest (YouTube) | Intuition-first
Docs | Scikit-learn User Guide: Supervised Learning | Complete reference
Deep Dive Module →

Module 8–9: Classification & Ensemble Methods

5 Weeks • Intermediate • Scikit-learn, XGBoost, LightGBM, Optuna, SHAP
  • Decision Trees, KNN, Naive Bayes, Logistic Regression
  • Metrics: Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix
  • Bias-Variance tradeoff, learning curves, SMOTE for class imbalance
  • Random Forest (bagging), Gradient Boosting, XGBoost, LightGBM (boosting)
  • Stacking and Voting classifiers, Optuna for automatic hyperparameter tuning
  • Model explainability: SHAP values for feature importance
  • GridSearchCV, RandomizedSearchCV, saving models with joblib
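The classification metrics above all fall out of the confusion matrix by hand, which is worth doing once before trusting sklearn's `classification_report`. The counts below are invented for a hypothetical churn classifier:

```python
# Confusion-matrix cells (made-up test results for a churn classifier).
tp, fp, fn, tn = 40, 10, 20, 130

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted churners, how many really churned
recall    = tp / (tp + fn)   # of real churners, how many we caught
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)   # 0.85  0.8  0.667  0.727
```

Note how 85% accuracy hides a recall of only 67%: under class imbalance, accuracy flatters the model, which is why Precision/Recall/F1 (and techniques like SMOTE) matter.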
Free Resources
Type | Resource | Category
Video | StatQuest — Random Forest, Gradient Boosting (YouTube) | Best visual explanation
Docs | XGBoost Documentation (Official) | XGBoost reference
Course | Kaggle ML Explainability Course (Free, SHAP) | Model interpretation
Course | Kaggle Intermediate ML: Pipelines, XGBoost (Free) | XGBoost practice
🛠 PROJECT: Customer Segmentation + Churn Prediction [Intermediate] • 6–7 days
Part A: K-Means + PCA to segment customers, visualize with t-SNE. Part B: XGBoost churn classifier, SMOTE for class imbalance, Optuna tuning, SHAP analysis.
Skills: K-Means, PCA, t-SNE, XGBoost, SMOTE, Optuna, SHAP, joblib
Deep Dive Module →

Module 10: Unsupervised Learning

2 Weeks • Intermediate–Advanced • Scikit-learn
  • K-Means: algorithm, elbow method for choosing K, inertia
  • DBSCAN: density-based clustering, handles arbitrary shapes
  • Hierarchical clustering: dendrograms, Ward linkage
  • PCA: dimensionality reduction, explained variance ratio
  • t-SNE: visualizing high-dimensional data in 2D
  • Anomaly detection: Isolation Forest, One-Class SVM
Deep Dive Module →
🏁 Part 3 Milestone
  • Build end-to-end regression and classification pipelines with proper evaluation
  • Apply ensemble methods (Random Forest, XGBoost, LightGBM, Stacking) and know when each helps
  • Tune hyperparameters automatically using Optuna
  • Cluster data with K-Means and DBSCAN, reduce dimensions with PCA and t-SNE
  • Explain model predictions to stakeholders using SHAP values

Part 4 — LLM API Mastery

~4 Weeks • Prompting • Structured Outputs • Tool Calling • Streaming • Security
Core of AI Engineering

Module 11: Prompting Fundamentals

1 Week • Beginner–Intermediate • OpenAI API, Anthropic API
  • System vs user messages — what each controls and why the distinction matters
  • Zero-shot, one-shot, few-shot prompting with examples
  • Chain-of-thought (CoT) prompting — “think step by step”
  • Role prompting, persona assignment, XML tag structuring for reliable output control
  • How small wording changes dramatically shift output quality
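Zero-shot vs few-shot is clearest when written as the actual messages array you would send. A sketch with invented example labels (the sentiment task and phrasing are illustrative):

```python
# Zero-shot: only the instruction and the input.
zero_shot = [
    {"role": "system", "content": "Classify the sentiment as positive or negative."},
    {"role": "user", "content": "The battery died in two hours."},
]

# Few-shot: two worked examples pin down the exact output format first.
few_shot = [
    {"role": "system", "content": "Classify the sentiment as positive or negative."},
    {"role": "user", "content": "I love this phone!"},
    {"role": "assistant", "content": "positive"},      # worked example 1
    {"role": "user", "content": "Screen cracked on day one."},
    {"role": "assistant", "content": "negative"},      # worked example 2
    {"role": "user", "content": "The battery died in two hours."},
]
```

The few-shot version typically yields more consistent labels, because the examples constrain both the decision and the output format.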
Resources
Type | Resource | Category
Tutorial | Anthropic Interactive Prompt Engineering Tutorial | Best hands-on prompting — 9 chapters
Docs | Anthropic Prompt Engineering Docs | Official reference
Docs | OpenAI Prompt Engineering Guide | OpenAI official guide
Guide | promptingguide.ai — basic to advanced strategies | Comprehensive guide
Deep Dive Module →

Module 12: Structured Outputs & Tool Calling

1 Week • Intermediate • Pydantic, Instructor, OpenAI/Anthropic SDK
  • Structured outputs: forcing the model to match a JSON schema you define
  • Defining Pydantic models, passing schemas to the API, Instructor library for 15+ providers
  • Tool calling: the model does NOT execute your functions — it returns a structured call
  • The 5-step flow: define tools → call model → parse tool call → execute function → feed result back
  • Parallel tool calls, tool_choice="auto", handling no-tool-call responses
  • Writing tool descriptions that get selected correctly and reliably
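Step 4 of the flow above (execute the function) is where beginners get confused: the model never runs code, it only returns a name plus JSON arguments, and your code dispatches. A sketch with the model's response hand-written to stand in for a real API reply (the `get_weather` tool is a stub):

```python
import json

def get_weather(city: str) -> str:
    return f"22°C and clear in {city}"   # stub; a real tool would call an API

TOOLS = {"get_weather": get_weather}     # name -> callable registry

tool_call = {                            # shape mirrors what chat APIs return
    "name": "get_weather",
    "arguments": '{"city": "Pune"}',     # arguments arrive as a JSON STRING
}

fn = TOOLS[tool_call["name"]]
kwargs = json.loads(tool_call["arguments"])
result = fn(**kwargs)
print(result)   # this string goes back to the model as a tool-result message
```

Steps 2 and 5 (call the model, feed the result back) are just ordinary API calls around this dispatch.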
🛠 PROJECT: Invoice Parser + 3-Tool Assistant [Intermediate] • 3–4 days
Part A: Invoice parser returning structured Python object (invoice_number, amount, items, due_date). Part B: Assistant with 3 tools: get_weather(city), calculate(expression), search_notes(query).
Skills: Pydantic, Instructor, OpenAI/Anthropic SDK, tool calling loop, JSON schemas
Deep Dive Module →

Module 13: Streaming & Conversation State

1 Week • Intermediate • FastAPI StreamingResponse, httpx
  • Streaming: showing model output word-by-word — makes apps feel dramatically faster
  • Setting stream=True, iterating over delta chunks, assembling full response from parts
  • Wiring streaming into a FastAPI endpoint using StreamingResponse
  • Server-Sent Events (SSE) — what is happening at the HTTP level
  • LLMs are stateless — you manage conversation history. The model has no memory between calls.
  • The messages array structure — why you append both user and assistant messages
  • Context window limits, what happens when exceeded, basic truncation strategies
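Because the model is stateless, "memory" is just the messages list you resend every turn, and truncation is you deleting from it. A minimal sketch of turn management with naive oldest-first truncation (real apps often summarize old turns instead):

```python
# The "conversation" is just this list, resent in full on every API call.
messages = [{"role": "system", "content": "You are a helpful tutor."}]

def add_turn(user_text: str, assistant_text: str, max_turns: int = 3):
    """Append one exchange, then drop the oldest turns past the limit."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    # keep the system prompt plus the last max_turns exchanges
    while len(messages) > 1 + 2 * max_turns:
        del messages[1:3]   # remove the oldest user/assistant pair

for i in range(5):
    add_turn(f"question {i}", f"answer {i}")

print(len(messages))           # 7: system + 3 retained exchanges
print(messages[1]["content"])  # "question 2" -- the two oldest pairs were dropped
```

Note that both the user turn and the assistant reply get appended; forgetting the assistant message is a classic bug that makes the model "forget" its own answers.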
🛠 PROJECT: Multi-Turn Streaming Chatbot [Intermediate] • 2–3 days
Multi-turn chatbot in the terminal with streaming output. Each turn appends to messages list. /reset command clears history. Print token count after each exchange. Wire same logic into a FastAPI StreamingResponse endpoint.
Skills: Streaming, conversation state, messages array, token counting, FastAPI
Deep Dive Module →

Module 14: Reliability, Cost & Security

1 Week • Intermediate • Tenacity, tiktoken, OWASP LLM Top 10
  • Token basics: ~4 chars per token, input vs output tokens priced differently
  • tiktoken: count tokens before sending. Don’t use GPT-4/Opus for everything.
  • LLM APIs fail: 429 rate limits, timeouts, malformed JSON. Handle gracefully.
  • Tenacity library: retry logic with exponential backoff in one decorator
  • Fallback strategies: retry with different model, return cached response
  • Prompt injection: the #1 security risk in LLM apps
  • Direct injection (jailbreaking) vs indirect injection via external content (docs, websites)
  • OWASP Top 10 for LLM Apps — LLM01: Prompt Injection. Principle of least privilege.
⚠️ CRITICAL: Prompt injection is when untrusted user input overrides your system instructions. Never trust unvalidated LLM output to make consequential decisions automatically.
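The retry-with-exponential-backoff pattern above, hand-rolled so the mechanics are visible (the Tenacity library packages the same behaviour into a one-line decorator; the `flaky` function below is an invented stand-in for an LLM call that rate-limits twice):

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn() on failure, sleeping base_delay * 2^attempt (+ jitter)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                       # out of retries: surface the error
            delay = base_delay * 2 ** attempt + random.random() * 0.1
            time.sleep(delay)               # 1s, 2s, 4s, ... between attempts

# Flaky stand-in for an API call that 429s twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 rate limited")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
print(result)   # "ok" after two retries
```

The jitter matters in production: without it, many clients retrying in lockstep hammer the API at the same instants.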
Additional Free Tools — LLM APIs
Type | Resource | Category
Course | DeepLearning.AI: Prompt Engineering for Developers | Prompt engineering course
Tool | Groq API — Free Fast LLM Inference (Llama, Mistral) | Free OpenAI alternative
Tool | Google Gemini API — Free Tier Available | Free LLM API
Deep Dive Module →
🏁 Part 4 Milestone
  • Write prompts that produce consistent, reliable outputs for a given task
  • Get structured JSON data out of any model using Pydantic + Instructor
  • Wire up tool calling so a model can call your Python functions
  • Stream responses in real time through a FastAPI endpoint
  • Manage multi-turn conversation history with proper truncation strategies
  • Estimate token cost before sending requests using tiktoken
  • Handle API errors, timeouts, and bad outputs without crashing using Tenacity
  • Explain prompt injection and apply basic defences

Part 5 — RAG Systems

~4 Weeks • Embeddings • Chunking • Vector DBs • Reranking • Grounding
Most In-Demand AI Skill

RAG is the most in-demand practical skill in AI engineering right now. Almost every real enterprise AI use case — customer support bots, internal knowledge bases, document Q&A — is built on it. Understanding it deeply is what separates good engineers from great ones.

Module 15: Embeddings & Vector Databases

1 Week • Intermediate • OpenAI embeddings, sentence-transformers, ChromaDB
  • A text embedding is text projected into high-dimensional vector space — similar text ends up close together
  • Cosine similarity: how we measure distance between vectors in embedding space
  • Embedding models: OpenAI text-embedding-3-small/large vs HuggingFace sentence-transformers
  • Vector DB options: Chroma (local dev) → Pinecone (managed) → Qdrant (open-source prod) → pgvector (if on Postgres) → FAISS
  • Creating collections, inserting embeddings with metadata, querying by similarity with top_k
  • Filtering by metadata at query time: source, date, category, document type
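Cosine similarity, the distance measure named above, is one formula: cos θ = (a·b) / (|a||b|). A stdlib sketch on toy 3-dimensional "embeddings" (real embedding vectors have hundreds to thousands of dimensions, and the values below are invented):

```python
from math import sqrt

def cosine_similarity(a, b):
    """cos(theta) = (a.b) / (|a||b|); near 1 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy vectors standing in for embeddings of three texts.
cat    = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
stock  = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, stock))    # low: unrelated meaning
```

A vector DB's `top_k` query is essentially "compute this against every stored vector and return the k largest", with clever indexing so it does not literally scan everything.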
Type | Resource | Category
Article | Stack Overflow Blog: Intuitive Introduction to Text Embeddings | Best beginner explanation
Guide | HuggingFace: Getting Started With Embeddings | Hands-on embeddings
Docs | OpenAI Embeddings Guide | OpenAI embedding models
Docs | Chroma Official Docs — in-memory, no infrastructure | Best for prototyping
Docs | Qdrant Documentation — best open-source vector DB for production | Production vector DB
Repo | pgvector — vector search inside PostgreSQL | If already on Postgres
Deep Dive Module →

Module 16: Chunking & Document Ingestion

1 Week • Intermediate • LangChain Text Splitters, LlamaIndex
  • Documents too large to embed whole — chunking breaks them into embeddable pieces
  • Baseline: RecursiveCharacterTextSplitter with chunk_size=500, chunk_overlap=50
  • Fixed-size, recursive, and semantic chunking — when to use each
  • Core tradeoff: chunks too large = poor retrieval precision; too small = lost context
  • Good starting point: ~250 tokens with 10–20% overlap to avoid losing context at boundaries
  • Tag every chunk with metadata at ingestion: filename, page, section, date, category
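Fixed-size chunking with overlap, the naive baseline above, is a few lines of Python. This sketch shows only the mechanics that RecursiveCharacterTextSplitter improves on by preferring paragraph and sentence boundaries:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Fixed-size chunks; each chunk's last `overlap` chars reappear at the
    start of the next, so sentences at boundaries are never fully lost."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1200                       # stand-in for a real document
chunks = chunk_text(doc, chunk_size=500, overlap=50)
print([len(c) for c in chunks])        # [500, 500, 300]
```

In a real ingestion pipeline each chunk would also carry its metadata dict (filename, page, section) into the vector store.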
Type | Resource | Category
Guide | Weaviate: Chunking Strategies for RAG | Most practical guide
Docs | LangChain Text Splitters Docs | Code reference
Article | Unstructured: Chunking for RAG Best Practices | Technical deep-dive
Deep Dive Module →

Module 17: Retrieval Quality — Filtering, Reranking & Failure Modes

1 Week • Intermediate–Advanced • Cohere Reranker, LangChain
  • Metadata filtering: constrain retrieval by date, source, user, category before similarity search
  • Two-stage pattern: embed and search (fast, approximate) → rerank top-k (slower, more accurate)
  • Cohere reranker: re-scores results based on true contextual relevance, not just vector proximity
  • Bi-encoder (first-stage embedding search) vs cross-encoder (reranking)
  • Most RAG failures are retrieval failures, not model failures
  • Failure modes: semantic drift → query rewriting; chunk boundary → increase overlap; missing metadata → filter; top-k too small → increase then rerank
  • HyDE (Hypothetical Document Embeddings): generate a hypothetical answer, embed it, search — improves retrieval
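The two-stage pattern above has a simple shape once both stages are stubbed out. In this sketch the "vector search" is a crude word-overlap score and the "reranker" is a hand-written relevance check; both stand in for a real embedding index and a Cohere-style cross-encoder:

```python
# Tiny invented corpus.
corpus = {
    "a": "reset your password from the account settings page",
    "b": "passwords must contain at least 12 characters",
    "c": "our office dog is named Pixel",
}

def vector_search(query, k=3):
    """Stage 1 stand-in: fast, approximate top-k (here, word overlap)."""
    score = lambda doc_id: len(set(query.split()) & set(corpus[doc_id].split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def rerank(query, doc_ids, top_n=1):
    """Stage 2 stand-in: slower, more accurate re-scoring of the candidates."""
    score = lambda doc_id: ("reset" in corpus[doc_id]) + ("password" in corpus[doc_id])
    return sorted(doc_ids, key=score, reverse=True)[:top_n]

candidates = vector_search("how do I reset my password", k=3)
best = rerank("how do I reset my password", candidates, top_n=1)
print(best)   # ['a']
```

The design point survives the mocking: stage 1 must only be good enough to get the right answer into the candidate set; stage 2 is what puts it on top.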
Type | Resource | Category
Docs | Cohere Reranking Docs — single line of code | Best place to start reranking
Docs | LangChain: Query Transformations | Query rewriting, HyDE, step-back prompting
Guide | Pinecone: Improving Retrieval Quality | Common failure modes with fixes
Deep Dive Module →

Module 18: RAG Pipelines, Grounding & Hallucination Reduction

1 Week • Advanced • LlamaIndex, LangChain, OpenAI/Anthropic
  • LlamaIndex for RAG-first: 5 stages — load → index → store → query → evaluate
  • LangChain for orchestration when you need multi-agent workflows or conditional chains
  • Rule: use LlamaIndex for Part 5 RAG work. Move to LangChain for Part 6 agents.
  • Grounding: prompt model to answer only from provided context, say “I don’t know” otherwise
  • Citations: pass chunk metadata (source, page, URL) into prompt and surface in response
  • Hallucination reduction: RAG + chain-of-thought + guardrails together beat any single approach
  • Confidence thresholds: validating retrieval quality before surfacing responses
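Grounding and citations are ultimately prompt construction: the retrieved chunks and their metadata get formatted into the context the model is told to rely on. A minimal sketch (the prompt wording and chunk fields are illustrative, not a prescribed template):

```python
def build_grounded_prompt(question, chunks):
    """Format retrieved chunks + metadata into a grounded, citable prompt."""
    context = "\n\n".join(
        f"[{c['source']} p.{c['page']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer using ONLY the context below. Cite sources like [file p.N]. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

chunks = [
    {"source": "handbook.pdf", "page": 4,
     "text": "Refunds are issued within 14 days."},
]
prompt = build_grounded_prompt("How fast are refunds issued?", chunks)
print(prompt)
```

Because each chunk carries its source and page, the model can emit citations you can surface verbatim in the UI.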
Type | Resource | Category
Docs | LlamaIndex: Introduction to RAG | Official RAG guide
Docs | LangChain: Build a RAG Agent | From minimal to full pipeline with reranking
Docs | Anthropic: Giving Claude Sources | Citation prompting
Course | DeepLearning.AI: LangChain for LLM App Dev (Free) | LangChain fundamentals
Course | DeepLearning.AI: Building Advanced RAG (Free) | Advanced RAG course
🛠 PROJECT: 'Chat With Your Docs' RAG App [Advanced] • 5–7 days
Ingest 10–20 PDF/text files. FastAPI endpoint accepts a question, retrieves top 5 chunks with Cohere reranking, returns a cited answer from Claude or OpenAI. Include: chunking with metadata, Chroma or Qdrant, source attribution. Deploy so others can try it.
Skills: LlamaIndex or LangChain, ChromaDB/Qdrant, Cohere reranking, FastAPI, OpenAI/Anthropic, citations, deployment
Deep Dive Module →
🏁 Part 5 Milestone
  • Explain what an embedding is and why similar text produces similar vectors
  • Chunk any document intelligently using the right strategy for the content type
  • Store and query embeddings in a vector database with metadata filtering
  • Add a reranking step using the two-stage retrieve-then-rerank pattern
  • Debug common retrieval failures systematically (semantic drift, chunk boundaries, metadata issues)
  • Build a complete end-to-end RAG pipeline that returns grounded, cited answers

Part 6 — Agents, Workflows & Evaluation

~4 Weeks • Agent Loops • LangGraph • Tool Design • Workflow Patterns • Eval Harnesses
Senior AI Engineer Skills

Module 19: Agent Loops & LangGraph

1.5 Weeks • Advanced • LangGraph, OpenAI/Anthropic API
  • An agent is a while loop with an LLM making the branching decisions — not magic
  • The perceive → plan → act → observe cycle and how it terminates
  • Build from scratch first (no framework) — gives you the foundation to understand what frameworks abstract
  • LangGraph: state as a shared memory object flowing through the graph
  • State schemas with TypedDict, reducers for merging parallel updates
  • In-memory state vs persisted checkpointing
  • Human-in-the-loop: inspect and modify state mid-execution
  • Conditional edges: how the graph branches based on state values
Practice first: Build an agent from scratch — just the OpenAI or Anthropic API directly. Give it 3 tools, a goal, and a while loop. This is the most valuable thing you can do to understand what frameworks are abstracting.
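"A while loop with an LLM making the branching decisions" fits in one screen once the LLM call is mocked. This sketch keeps only the control flow: perceive, plan, act, observe, terminate (the `mock_llm` decision logic and the `add` tool are invented stand-ins for a real chat-completion call and real tools):

```python
def mock_llm(history):
    """Stand-in for a real chat API call: decides the next action."""
    if not any(msg["role"] == "tool" for msg in history):
        return {"action": "add", "args": {"a": 2, "b": 3}}   # plan: use a tool
    return {"action": "finish", "answer": "2 + 3 = 5"}       # observed result -> done

TOOLS = {"add": lambda a, b: a + b}
MAX_ITERATIONS = 5   # always bound the loop; a confused model can spin forever

history = [{"role": "user", "content": "what is 2 + 3?"}]
answer = None
for _ in range(MAX_ITERATIONS):
    decision = mock_llm(history)                               # plan
    if decision["action"] == "finish":
        answer = decision["answer"]                            # terminate
        break
    result = TOOLS[decision["action"]](**decision["args"])     # act
    history.append({"role": "tool", "content": str(result)})   # observe

print(answer)   # "2 + 3 = 5"
```

Swap `mock_llm` for a real API call with tool definitions and this is the from-scratch agent the note above asks you to build; LangGraph then replaces `history` with a typed state object and the loop with a graph.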
Type | Resource | Category
Article | Anthropic: Building Effective Agents | Read this FIRST before writing any agent code
PDF | OpenAI: A Practical Guide to Building Agents | Agent patterns and guardrails
Course | LangChain Academy: Introduction to LangGraph (Free) | Official free LangGraph course
Docs | LangGraph State Management | State schemas reference
Deep Dive Module →

Module 20: Tool Design, Workflow Patterns & When NOT to Use Agents

1 Week • Advanced
  • Tool names: self-explanatory verbs. Descriptions: explain WHEN to call, not just what it does.
  • Test every tool: “If I had only this JSON schema, would I know exactly when and how to call this?”
  • Decision framework: single LLM call → workflow → agent. Reserve agents for genuinely open-ended tasks.
  • A chain of 3 fixed LLM calls is always faster, cheaper, and more debuggable than an agent that could make 3 calls
  • Workflow patterns: prompt chaining, routing, parallelization, orchestrator-subagent
  • Anthropic: “find the simplest solution possible and only increase complexity when needed”
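The parallelization pattern above needs no agent framework, just concurrent calls and a fixed picking step. A sketch with the LLM call mocked (the `llm` stub and the length-based "scoring" stand in for real API calls):

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt):
    """Stand-in for a real chat API call."""
    return f"draft for: {prompt}"

prompts = ["tweet", "linkedin post", "summary"]

# Three independent drafts generated concurrently (the parallelization step).
with ThreadPoolExecutor() as pool:
    drafts = list(pool.map(llm, prompts))   # results come back in input order

# Fixed final step picks a winner (a real pipeline would use a scoring LLM call).
best = max(drafts, key=len)
print(drafts)
```

Threads work here because LLM calls are I/O-bound: the program spends its time waiting on the network, exactly the case `ThreadPoolExecutor` handles well.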
🛠 PROJECT: 3-Step Content Pipeline (No Agent Required) [Intermediate] • 2–3 days
Step 1: LLM extracts key facts from an article. Step 2: Another LLM call generates tweet, LinkedIn post, and summary in PARALLEL. Step 3: Final LLM call scores all three and picks the best. Demonstrates that most 'agent' use cases are better served by explicit workflows.
Skills: Parallelization pattern, orchestrator pattern, OpenAI/Anthropic API
Deep Dive Module →

Module 21: Failure Handling in Agents

0.5 Weeks • Advanced
  • Agents fail differently: bad tool calls can corrupt state or cause infinite loops
  • Always set maximum iteration limits — without them, a buggy tool causes an infinite loop
  • Per-tool retry with exponential backoff, catch exceptions at tool execution layer
  • When to surface failure to user vs retry silently
  • Guardrails as layered defense: LLM-based checks + rules-based filters + moderation APIs
Deep Dive Module →

Module 22: Evaluation Harnesses & Task Success Metrics

1 Week • Advanced • DeepEval, Ragas, LangSmith, Promptfoo
⚠️ CRITICAL: Evals are not optional polish. Every prompt change, model swap, or retrieval tweak without running evals is a gamble. Engineers who ship reliable AI products run evals constantly.
  • Build a golden test set of 20–50 representative inputs with expected outputs or rubrics
  • Deterministic eval functions: string match, JSON schema validation
  • LLM-as-judge: use a model to score outputs that resist exact matching
  • Process metrics (did the agent call the right tool?) vs outcome metrics (did the task succeed?)
  • DeepEval: pytest-inspired framework, 50+ built-in metrics including hallucination and answer relevancy
  • Ragas: RAG-specific — faithfulness, answer relevancy, context precision, context recall
  • Promptfoo: CLI for testing prompts across models, CI/CD integration, red teaming
  • LangSmith: tracing, debugging, and evaluation for LangChain/LangGraph
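A deterministic eval harness over a golden set is small enough to write before reaching for DeepEval or Ragas. In this sketch the pipeline under test is a stub you would replace with your real RAG or agent call; LLM-as-judge scoring would be layered on for answers that resist exact matching:

```python
import json

# Golden set: representative inputs with expected outputs (invented here).
GOLDEN_SET = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2+2?", "expected": "4"},
]

def pipeline(question):
    """Stand-in for the system under test."""
    return {"capital of France?": "Paris", "2+2?": "4"}.get(question, "")

def exact_match(output, expected):
    """Deterministic eval function: normalized string equality."""
    return output.strip().lower() == expected.strip().lower()

results = [
    {"input": case["input"],
     "passed": exact_match(pipeline(case["input"]), case["expected"])}
    for case in GOLDEN_SET
]
score = sum(r["passed"] for r in results) / len(results)
print(json.dumps(results, indent=2))
print(f"pass rate: {score:.0%}")
```

Rerun this after every prompt change, model swap, or retrieval tweak; a drop in pass rate is a regression caught before users see it.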
🛠 PROJECT: Eval Harness for Your RAG Pipeline [Advanced] • 3–4 days
Create 30 question-answer pairs from your docs. Run through your pipeline. Score for relevance, faithfulness, and completeness using DeepEval and Ragas. Change one thing (chunk size, model, top-k) and re-run to measure the improvement.
Skills: DeepEval, Ragas, LangSmith tracing, golden test set, LLM-as-judge
Deep Dive Module →
🏁 Part 6 Milestone
  • Explain what an agent loop is and implement one from scratch without a framework
  • Write tool descriptions that get selected correctly across a wide range of inputs
  • Manage agent state properly using LangGraph (schemas, reducers, checkpointing)
  • Handle failures inside agent loops without crashing (iteration limits, per-tool retry, guardrails)
  • Decide confidently whether a task needs an agent, a workflow, or a single prompt
  • Build multi-step workflows that chain, route, and parallelize LLM calls
  • Write automated evals that catch regressions when you change prompts or models
  • Define and measure task success metrics for any AI system you build

Part 7 — Production & MLOps

This is where most AI engineers stall. They can build a great demo but cannot ship a product that survives contact with the real world. The skills here are what companies actually pay for: reliability, security, cost control, and the ability to keep things running.

Module 23: FastAPI Production Patterns

0.5 Weeks • Intermediate • FastAPI, Gunicorn, Uvicorn
+
  • A single uvicorn process with --reload is fine for dev; in production it immediately becomes the bottleneck.
  • Gunicorn with Uvicorn workers for multi-worker ASGI configuration
  • Proper error handling middleware, health check endpoints (/health), CORS policies
  • Async database pooling, Redis caching, background tasks
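A Gunicorn config file is itself a Python module, so the multi-worker ASGI setup fits in a few lines. This is a sketch with illustrative values, not a tuned configuration; the worker-count formula is a common starting point, not a rule.

```python
# gunicorn.conf.py — minimal production config sketch (values are illustrative)
import multiprocessing

# Gunicorn manages the worker processes; each worker runs Uvicorn's ASGI loop.
worker_class = "uvicorn.workers.UvicornWorker"

# Common starting point; tune against real load.
workers = multiprocessing.cpu_count() * 2 + 1

bind = "0.0.0.0:8000"
timeout = 120      # LLM-backed endpoints are slow; the 30s default is often too short
keepalive = 5
accesslog = "-"    # log to stdout so Docker can collect it
```

Launched with something like `gunicorn main:app -c gunicorn.conf.py` (assuming your FastAPI instance is `app` in `main.py`).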
Type • Resource • Category
Docs • FastAPI Deployment Docs • Start here
Deep Dive Module →

Module 24: Docker & Background Jobs

1 Week • Intermediate • Docker, Docker Compose, Celery, Redis
+
  • Docker: stop saying ‘it works on my machine’. Solve dependency conflicts, consistent environments.
  • Dockerfile for Python/FastAPI, multi-stage builds to keep images small, .dockerignore
  • Docker Compose for multi-service setups: app + vector DB + Redis
  • LLM calls are slow — background jobs let you accept immediately and process async
  • FastAPI BackgroundTasks for simple fire-and-forget; Celery + Redis for heavier tasks
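The accept-now, process-later pattern behind those tools can be sketched with the standard library alone. Celery and Redis replace this queue-plus-worker pair in production (adding persistence, retries, and multiple machines), but the shape is the same: enqueue the work, return a job id immediately, let a worker grind through it.

```python
import queue
import threading
import uuid

# Stdlib stand-ins for Celery (worker) and Redis (queue + result store).
jobs = queue.Queue()
results = {}

def worker():
    while True:
        job_id, payload = jobs.get()
        results[job_id] = f"processed: {payload}"  # imagine a slow LLM call here
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload):
    """Accept immediately: enqueue the work and hand back a job id."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id

job_id = submit("summarize this 40-page PDF")
jobs.join()  # a real API client would poll a /jobs/{id} endpoint instead
print(results[job_id])  # -> processed: summarize this 40-page PDF
```

The client gets the job id in milliseconds even though the work takes minutes; that decoupling is the whole point.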
🛠PROJECT: Dockerized RAG App[Intermediate] 3–4 days
Containerize your RAG app with docker-compose.yml running: FastAPI app + vector database (Chroma or Qdrant) + Redis for caching. Add /health endpoint. Deploy so ‘docker compose up’ starts everything.
Skills: Docker, Docker Compose, FastAPI, Qdrant or Chroma, Redis, health checks
Deep Dive Module →

Module 25: Auth, Logging & Observability

1 Week • Intermediate–Advanced • Langfuse, LangSmith, Structlog
+
  • Without authentication, anyone can burn through your LLM credits — you will wake up to a large bill
  • JWT tokens for user auth, API key management for service-to-service communication
  • Rate limiting per user/key. Never store secrets in code — environment variables only.
  • LLM-specific challenge: a 200 status code can still be a useless or hallucinated answer
  • Langfuse: traces every request — prompt, response, tokens, latency, tool calls, cost per request
  • Structlog: structured JSON logging that is searchable and parseable in production
  • Dashboard metrics: request volume, error rates, cost per day, latency percentiles
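Per-key rate limiting is simple enough to sketch in pure Python. This is a fixed-window limiter with an in-memory counter; a real deployment would back the counters with Redis so the limits hold across multiple Gunicorn workers. The window size and request cap are illustrative.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 30

# api_key -> [request count, window start time]; swap for Redis in production.
_counters = defaultdict(lambda: [0, 0.0])

def allow_request(api_key, now=None):
    """Return True if this key may make a request in the current window."""
    now = time.time() if now is None else now
    count, window_start = _counters[api_key]
    if now - window_start >= WINDOW_SECONDS:
        _counters[api_key] = [1, now]   # start a fresh window
        return True
    if count < MAX_REQUESTS:
        _counters[api_key][0] += 1
        return True
    return False                        # over the limit: respond with HTTP 429
```

In FastAPI this would live in a dependency that rejects the request with a 429 before the LLM call is ever made, which is exactly where it protects your credits.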
Type • Resource • Category
Docs • FastAPI Security Docs — OAuth2, JWT tokens, API keys • Auth reference
Guide • OWASP API Security Top 10 • Broken auth, injection, mass assignment
Tool • Langfuse — open-source LLM observability. Free tier. • Best LLM observability
Tool • LangSmith — tracing for LangChain/LangGraph • LangChain tracing
Library • Structlog — structured JSON logging for Python • Production logging
Deep Dive Module →

Module 26: Prompt Versioning, Cost Monitoring & Caching

1 Week • Intermediate • Langfuse, Helicone, LiteLLM, Redis, GPTCache
+
  • In production, your prompts are code — version control, testing, and rollback are required
  • Store prompts outside application code. Version every change. A/B test variants in production.
  • A traffic spike or prompt bug can burn hundreds of dollars in minutes without spending limits
  • Helicone: proxy-based cost tracking, one line of code — just change your base URL
  • LiteLLM: unified interface for 100+ providers with budget management and rate limiting
  • If 20% of users ask similar questions, you are paying for the same LLM call 20 times — cache it
  • GPTCache: semantic caching using embedding similarity for similar (not just identical) queries
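An exact-match cache is a few lines once you pick the right key: everything that changes the output (model, prompt, sampling params) must be part of it. A plain dict stands in for Redis in this sketch; swap in redis-py with a TTL for a real deployment, and note that exact-match caching pays off mainly when you run deterministic settings (temperature 0).

```python
import hashlib
import json

_cache = {}  # stand-in for Redis

def cache_key(model, prompt, params):
    """Key on everything that changes the output: model, prompt, sampling params."""
    raw = json.dumps({"model": model, "prompt": prompt, "params": params},
                     sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_completion(call_llm, model, prompt, params=None):
    params = params or {}
    key = cache_key(model, prompt, params)
    if key in _cache:
        return _cache[key]          # cache hit: zero tokens billed
    response = call_llm(model, prompt, **params)
    _cache[key] = response
    return response
```

GPTCache generalizes this from exact matching to embedding similarity, so "what's your refund policy?" and "how do refunds work?" can share one cached answer.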
Type • Resource • Category
Tool • Langfuse Prompt Management — versioning with rollback • Best prompt versioning
Tool • Helicone — proxy-based cost tracking, one line of code • Easiest cost monitoring
Library • LiteLLM — unified interface + budget management • Multi-provider management
Library • GPTCache — semantic caching • Semantic caching
Deep Dive Module →

Module 27: MLOps Foundations

1.5 Weeks (Required for ML Engineer) • MLflow, Evidently AI, GitHub Actions
+
  • Model serialization: joblib, pickle, ONNX for cross-platform serving
  • MLflow: experiment tracking, model registry — log_param(), log_metric(), log_model()
  • Evidently AI: data drift and model drift detection as open-source reports
  • GitHub Actions: CI/CD workflow that tests code on every push
  • Model monitoring: prediction distributions over time, alerting on drift
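Model serialization is the first of those steps, and the round trip is worth seeing once. This sketch uses a toy model so it stays self-contained; a real pipeline would `pickle.dumps` (or `joblib.dump`) a fitted scikit-learn estimator the same way. Keep in mind pickle is Python-version sensitive and unsafe to load from untrusted sources, which is why ONNX exists for cross-platform serving.

```python
import pickle

# Toy stand-in for a fitted estimator.
class ThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        return [1 if v >= self.threshold else 0 for v in values]

model = ThresholdModel(threshold=0.5)

# Serialize to bytes (a real workflow writes this to disk or a model registry).
blob = pickle.dumps(model)

# Deserialize in the serving process and verify predictions match.
restored = pickle.loads(blob)
assert restored.predict([0.2, 0.9]) == model.predict([0.2, 0.9])
```

MLflow's model registry wraps exactly this flow: `log_model()` stores the serialized artifact alongside the run's params and metrics so you can trace any served model back to the experiment that produced it.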
Type • Resource • Category
Course • Made With ML — best free MLOps curriculum • Best free MLOps course
Docs • MLflow Documentation (Official) • Experiment tracking
Tool • Evidently AI — open-source model monitoring and drift • Data drift detection
Tool • Render.com — Free Docker App Hosting • Free deployment platform
Tool • HuggingFace Spaces — Free Hosting for ML Demos • Free ML demo hosting
Deep Dive Module →
🏁 Part 7 Milestone
  • Deploy a FastAPI + LLM app in Docker with proper production configuration (Gunicorn + Uvicorn workers)
  • Handle long-running LLM tasks with background jobs and queues (Celery + Redis)
  • Secure your API with auth, rate limits, and API key management (never secrets in code)
  • Trace and debug every LLM call using Langfuse or LangSmith
  • Manage prompts with version control, A/B testing, and rollback capability
  • Monitor costs in real time and set hard spending limits (Helicone or LiteLLM)
  • Cache LLM responses (exact-match Redis, semantic GPTCache) to reduce latency and cost

Part 8 — Specialization & Launch

~4 Weeks • Choose ONE Track • Ship Real Products
Choose Your Direction

This is the final phase. Choose the track that matches your career direction. You only need to complete ONE track — but you can explore all four. Each ends with a capstone project.

🚀 Track A — AI Product Engineer

Best for startup jobs, shipping AI products that real users interact with

Vercel AI SDK • Streamlit • Gradio • Product UX for AI • Ship 2–3 complete products

🔬 Track B — Applied ML / LLM Engineer

Best for deeper technical roles, understanding what’s under the hood

Fine-tuning • LoRA/QLoRA • Open-source models • vLLM • Inference optimization

⚙️ Track C — AI Automation Engineer

Best for building for businesses immediately, automating real workflows

n8n • Temporal • Business process automation • CRM/email/support automation

📊 Track D — Data Scientist / Analyst

Best for data-focused roles, combining statistical rigour with AI

Capstone projects • Kaggle competition strategy • Dashboard & reporting tools

Track A — AI Product Engineer

Vercel AI SDK • Streamlit • Gradio • Google PAIR • Nielsen Norman AI UX
+
  • Vercel AI SDK: fastest way to build AI UIs with streaming. React, Next.js, Vue integrations.
  • Streamlit: AI demos in pure Python. Ideal for internal tools and MVPs.
  • Gradio: quick ML/AI interfaces, especially good for demoing on HuggingFace Spaces.
  • AI UX: how to handle loading states, wrong answers, user feedback, probabilistic output
  • Design for trust: users need to know when the AI is confident vs uncertain
Type • Resource • Category
Docs • Vercel AI SDK — AI-powered UIs with built-in streaming • Fastest AI UI framework
Docs • Streamlit — AI demos in pure Python • Best for MVPs and demos
Docs • Gradio — quick model interfaces, HuggingFace Spaces deploy • Quick prototyping
Guide • Google: People + AI Guidebook • Best AI UX resource
Guide • Nielsen Norman Group: AI UX Guidelines • Research-backed AI interface guidelines
🛠PROJECT: CAPSTONE: Build & Ship 2–3 AI Products[Advanced] 3–4 weeks
Build complete shippable products: (1) ‘Chat with your docs’ app with full RAG, (2) AI-powered internal tool automating a real workflow, (3) Agent with human-in-the-loop approval. Ship them. Deploy them somewhere people can try them.
Skills: Vercel AI SDK or Streamlit, LangChain/LlamaIndex, FastAPI, Docker, LLM APIs, Langfuse
Deep Dive Module →

Track B — Applied ML / LLM Engineer

Fine-tuning • LoRA/QLoRA • Ollama • vLLM • Inference Optimization
+
💡 Decision Framework: Start with prompt engineering (cheapest, fastest) → Add RAG if model needs specific data → Fine-tune ONLY when prompting + RAG cannot achieve required quality, consistency, or latency.
  • Fine-tuning: OpenAI API (easiest start), HuggingFace Transformers, LoRA and QLoRA
  • Unsloth: 2x faster fine-tuning with 80% less memory on consumer hardware
  • Ollama: run open-source LLMs locally with one command — Llama, Mistral, Gemma
  • vLLM: production LLM serving, 2–4x faster than naive HuggingFace
  • Quantization: GGUF, GPTQ, AWQ — reduce model size and cost for deployment
  • Inference optimization: KV-cache, batching, TensorRT-LLM for NVIDIA GPUs
Type • Resource • Category
Docs • OpenAI Fine-tuning Guide • Easiest fine-tuning start
Docs • HuggingFace Transformers Fine-tuning Tutorial • Open-source fine-tuning
Repo • Unsloth — 2x faster, 80% less memory • Fastest fine-tuning
Tool • Ollama — run open-source LLMs locally with one command • Local LLM experimentation
Library • vLLM — production inference, 2–4x faster • Production inference
Deep Dive Module →

Track C — AI Automation Engineer

n8n • Temporal • Business Process Automation • CRM/Email/Support
+
  • Real business automation = chains of actions across multiple systems, not single LLM calls
  • n8n: visual workflow automation with AI nodes, 400+ integrations. Free to self-host.
  • Temporal: durable workflows for long-running, fault-tolerant processes that survive crashes
  • High-ROI targets: email triage, document processing, CRM enrichment, support automation
  • Always build human-in-the-loop approval steps for consequential automations
  • Measure actual time and money saved — the ROI is how you sell it
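The human-in-the-loop step mentioned above is a gate, not a feature: consequential actions get queued for review instead of executed directly. Here is a minimal pure-Python sketch of that gate; `send_outreach_email` is a hypothetical stand-in for a real CRM or email integration, and in n8n or LangGraph the same idea becomes an approval node or interrupt.

```python
# Consequential actions wait in a queue; a human approves before execution.
pending_approvals = []

def propose_action(description, action, *args):
    """Low-stakes steps run immediately elsewhere; consequential ones wait here."""
    pending_approvals.append(
        {"description": description, "action": action, "args": args}
    )
    return len(pending_approvals) - 1   # ticket id the reviewer sees

def approve(ticket_id):
    """Called only after a human has reviewed the pending action."""
    ticket = pending_approvals[ticket_id]
    return ticket["action"](*ticket["args"])

def send_outreach_email(to, body):      # hypothetical stand-in for a CRM call
    return f"sent to {to}"

tid = propose_action("Email drafted lead", send_outreach_email,
                     "lead@example.com", "Hi ...")
# ... a human reviews the draft in a dashboard, then:
print(approve(tid))  # -> sent to lead@example.com
```

The queue also doubles as an audit log, which is what makes the automation sellable to a business that has to answer for every message it sends.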
Type • Resource • Category
Docs • n8n — visual AI automation, 400+ integrations, free to self-host • Best no-code AI automation
Docs • Temporal — durable workflows for fault-tolerant processes • Fault-tolerant automation
Docs • LangGraph: Multi-Agent Workflows • Code-first multi-agent orchestration
🛠PROJECT: CAPSTONE: Lead Qualification System[Advanced] 3–4 weeks
(1) Import leads from CSV/API/form, (2) LLM researches each lead, (3) Score and rank against ICP, (4) Draft personalized outreach messages, (5) Log everything to CRM/spreadsheet.
Skills: n8n or LangGraph, LLM APIs, FastAPI, structured outputs, Pydantic, data export
Deep Dive Module →

Track D — Data Scientist / Analyst

Capstone Projects • Kaggle Strategy • Dashboard Tools
+
  • Complete a full capstone project demonstrating end-to-end data science competency
  • Kaggle strategy: start with Titanic (classification) and House Prices (regression)
  • Aim for top 30% placement in at least one competition before applying for jobs
  • Dashboard tools: Streamlit for Python-native, Tableau/Power BI for business stakeholders
🛠PROJECT: CAPSTONE: Choose One of Three[Advanced] 2 weeks
(A) AI-Powered Financial Intelligence Platform: sentiment analysis on earnings calls + XGBoost price prediction + RAG chatbot over annual reports + Streamlit dashboard. (B) Healthcare ML Pipeline: XGBoost + Stacking + SMOTE + SHAP per patient + LLM report generator + FastAPI endpoint. (C) E-Commerce Intelligence: K-Means segmentation + collaborative filtering + time-series forecasting + RAG over product catalog.
Skills: XGBoost, SHAP, LangChain, RAG, ChromaDB, FastAPI, Streamlit, Docker
Deep Dive Module →

GitHub Profile Setup

Your GitHub IS your resume • README • Pinned Repos • Naming Conventions
+
  • Create a profile README: github.com/username/username repository with README.md
  • Pin your 6 best project repositories to your profile page
  • Add a professional bio, location, LinkedIn link, and contact email
  • Use consistent naming: project-name-ml, project-name-rag
  • Write descriptive repository descriptions — 1–2 sentences shown on profile
  • Capstone delivery must include: clean README, requirements.txt, organized folder structure (data/, notebooks/, src/, app/), working deployed app, 5-slide deck, model card, 2–3 min demo video
Deep Dive Module →

Portfolio Project Checklist

Interactive — click to check off completed projects
+
  • Python CLI Grade Management System (Module 1) — All roles — shows Python basics
  • COVID-19 Pandas Analysis (Module 2) — Data Analyst, Data Scientist
  • EDA Report with 8+ visualizations (Part 2) — Data Analyst, Data Scientist
  • House Price Predictor — deployed app (Module 7) — All ML roles
  • Heart Disease Classifier with SHAP (Module 8) — Data Scientist, ML Engineer
  • Customer Segmentation + Churn Prediction (Modules 9–10) — Data Scientist, ML Engineer
  • ‘Chat with your docs’ RAG App (Part 5) — AI/GenAI Engineer
  • Agent Loop from scratch (Part 6) — AI/GenAI Engineer
  • Dockerized production app with auth + observability (Part 7) — ML/AI Engineer
  • Track specialization capstone (Part 8) — All roles — most important

Job-Readiness Checklist

Technical Skills • GitHub • Kaggle • LinkedIn • Resume • Interview Prep
+
  • Technical Skills: Python ✓, Pandas/NumPy ✓, Scikit-learn ✓, XGBoost ✓, LangChain ✓, FastAPI ✓, Docker ✓
  • GitHub: 5+ public projects with good READMEs and deployed links
  • Kaggle: Contributor status, at least 1 competition submission, 2+ public notebooks
  • Free certs: Kaggle ML certificates, DeepLearning.AI short courses, Google ML Crash Course
  • LinkedIn: updated with skills, project links, GitHub link, and an AI/ML-focused headline
  • Resume: 1-page, quantified achievements, GitHub & portfolio links in header
  • Interview prep: 50 Python coding questions (LeetCode Easy/Medium), ML concept flashcards

Essential Communities & Platforms

Kaggle • HuggingFace • Colab • DeepLearning.AI • fast.ai
+
🏁 Final Words
  • You now have a complete, structured, project-based roadmap from zero to industry-ready in AI/ML.
  • Consistency beats intensity. 1–2 hours every day beats 10 hours on weekends.
  • Build in public — post your projects, insights, and learnings on LinkedIn.
  • Your first job will come from a project someone saw, not a certificate on your wall.
  • The field moves fast. This roadmap is a living document. Parts 4–6 will keep expanding.