What This Module Covers
Core AI Engineering

In real applications you almost never want raw text from an LLM — you want structured data you can parse, store, validate, and use in your code. This module covers two critical techniques for getting reliable structure out of LLMs:
- Structured outputs — forcing the model to return data that matches a Pydantic schema you define. Never parse free-text JSON again.
- Tool calling (function calling) — giving the model the ability to call your Python functions. This is what transforms an LLM from a text generator into a system that can take real actions.
These two techniques are the foundation of agents, RAG pipelines, and any AI system that needs to interact with the real world. Master them here before building anything complex.
Why Structured Outputs Matter
Motivation

```python
# The problem with raw text output
response = call_claude("Extract the name, age, and city from: 'John is 28, lives in Mumbai'")
# Response might be:
#   "The name is John, he is 28 years old, and he lives in Mumbai."
#   "Name: John Age: 28 City: Mumbai"
#   {"name": "John", "age": "28", "city": "Mumbai"}  ← age is a string, not int!
#   {"name": "John", "age": 28}                      ← city missing!
# You cannot reliably parse any of these

# With structured outputs (Pydantic + Instructor)
class Person(BaseModel):
    name: str
    age: int
    city: str

person = extract(text, Person)
print(person.age + 1)  # 29 — it's always an int. Always present.
```
💡 Structured outputs solve three problems at once: type safety (age is always an int), completeness (required fields are always present), and consistency (same schema every time, regardless of how the model phrases its response).
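The "always an int, always present" guarantee comes from Pydantic validation itself, not from the model. A minimal sketch, with no LLM call, showing coercion and the hard failure on missing fields:

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int
    city: str

# Type safety: the string "28" is coerced to int during validation
p = Person.model_validate({"name": "John", "age": "28", "city": "Mumbai"})
print(p.age, type(p.age))  # 28 <class 'int'>

# Completeness: a missing required field is a loud error, not a silent None
try:
    Person.model_validate({"name": "John", "age": 28})
except ValidationError as e:
    print(e.error_count(), "validation error")  # 'city' is missing
```

This is exactly the machinery that Instructor and OpenAI's `parse()` helper lean on under the hood.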
OpenAI Native Structured Outputs
OpenAI Only

OpenAI (gpt-4o and later) supports native structured outputs via response_format with a JSON schema. The model is guaranteed to return valid JSON matching your schema — it cannot deviate.
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import List, Optional

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str  # ISO format: YYYY-MM-DD
    participants: List[str]
    location: Optional[str] = None

# Method 1: parse() helper — simplest approach
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{
        "role": "user",
        "content": "Extract event: 'Meeting with Alice and Bob on 2024-03-15 at Bangalore office'"
    }],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
print(event.name)          # "Meeting"
print(event.participants)  # ["Alice", "Bob"]
print(event.date)          # "2024-03-15"
print(type(event))         # <class 'CalendarEvent'> — a real Python object

# Handle refusal (model refuses to comply with the request)
if completion.choices[0].message.refusal:
    print(f"Model refused: {completion.choices[0].message.refusal}")
```

JSON Mode vs Structured Outputs
Know the Difference

| Feature | JSON Mode | Structured Outputs |
|---|---|---|
| Guarantees | Valid JSON only — no schema enforcement | Valid JSON matching exact schema |
| Missing fields | Can still omit required fields | Required fields always present |
| Wrong types | age can be "28" (string) | age is always int |
| Extra fields | Can add unexpected fields | Only schema fields returned |
| Use when | Quick prototyping, flexible schema | Production — any time you parse the output |
```python
import json

# JSON mode — just ensures valid JSON, not schema compliance
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # JSON mode
    messages=[{"role": "user", "content": "Extract name and age as JSON"}]
)
data = json.loads(response.choices[0].message.content)
# data["age"] might be "28" or 28 — you don't know
```
Complex Pydantic Schemas
Real-World Patterns

```python
from pydantic import BaseModel, Field
from typing import List, Optional, Literal, Union, Annotated
from enum import Enum

# Nested models
class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: Optional[str] = None

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None  # nested model

# Enums for controlled vocabularies
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

class Ticket(BaseModel):
    title: str
    priority: Priority  # must be one of 4 values
    tags: List[str] = []
    assignee: Optional[Contact] = None

# Discriminated unions — different schema per type
class TextContent(BaseModel):
    type: Literal["text"]
    text: str

class ImageContent(BaseModel):
    type: Literal["image"]
    url: str
    alt: Optional[str] = None

Content = Annotated[Union[TextContent, ImageContent], Field(discriminator="type")]

class Post(BaseModel):
    title: str
    contents: List[Content]  # can be text or image blocks
```
Instructor — Structured Outputs for Every Provider
Production StandardInstructor is the cleanest way to get structured outputs from any LLM provider using Pydantic models. It works with OpenAI, Anthropic, Google, HuggingFace, and 15+ others using the same code interface — and adds automatic retries when validation fails.
```bash
pip install instructor anthropic openai
```

```python
import instructor
import anthropic
from openai import OpenAI
from pydantic import BaseModel
from typing import List

# ── With Anthropic (Claude) ────────────────────────────
claude_client = instructor.from_anthropic(anthropic.Anthropic())

class MovieReview(BaseModel):
    title: str
    rating: float  # 1.0 to 10.0
    pros: List[str]
    cons: List[str]
    recommended: bool

review = claude_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Review the movie Interstellar"
    }],
    response_model=MovieReview,  # ← Pydantic model as schema
)

print(review.title)        # "Interstellar"
print(review.rating)       # 9.2 — always a float
print(review.recommended)  # True — always a bool

# ── With OpenAI (GPT-4o) ───────────────────────────────
oai_client = instructor.from_openai(OpenAI())

review = oai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Review Interstellar"}],
    response_model=MovieReview,  # ← exact same code
)
# Same API regardless of provider — easy to switch
```
Automatic Retries and Partial Extraction
Reliability

```python
import anthropic
import instructor
from instructor import Mode
from pydantic import BaseModel, field_validator
from typing import List

client = instructor.from_anthropic(
    anthropic.Anthropic(),
    mode=Mode.ANTHROPIC_JSON,
)

class StrictRating(BaseModel):
    score: float
    label: str

    @field_validator("score")
    @classmethod
    def must_be_in_range(cls, v: float) -> float:
        if not (1.0 <= v <= 10.0):
            raise ValueError(f"Score {v} must be between 1.0 and 10.0")
        return round(v, 1)

    @field_validator("label")
    @classmethod
    def must_be_valid_label(cls, v: str) -> str:
        valid = {"excellent", "good", "average", "poor"}
        if v.lower() not in valid:
            raise ValueError(f"Label must be one of {valid}")
        return v.lower()

# Instructor retries automatically when validation fails:
# if the model returns score=11.0, Instructor catches the validation
# error, tells the model what went wrong, and asks it to try again
rating = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[{"role": "user", "content": "Rate the movie Interstellar"}],
    response_model=StrictRating,
    max_retries=3,  # retry up to 3 times if the schema is not satisfied
)

# Partial extraction — stream partial objects as they are generated
from instructor import Partial

class LargeReport(BaseModel):
    executive_summary: str
    key_findings: List[str]
    recommendations: List[str]
    conclusion: str

# Stream partial objects — UI can update progressively
for partial_report in client.messages.create_partial(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Generate a quarterly report"}],
    response_model=Partial[LargeReport],
):
    if partial_report.executive_summary:
        print(partial_report.executive_summary, end=" ")
```
💡 Automatic retries are Instructor's killer feature. When a field validator raises a ValueError, Instructor sends the model a message saying "Your previous response failed validation: [error]. Please fix and try again." The model almost always succeeds on the second attempt. This makes structured extraction production-ready.
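You can see exactly what Instructor intercepts by triggering a validator directly. A plain-Pydantic sketch, no API call needed:

```python
from pydantic import BaseModel, ValidationError, field_validator

class StrictRating(BaseModel):
    score: float

    @field_validator("score")
    @classmethod
    def must_be_in_range(cls, v: float) -> float:
        if not (1.0 <= v <= 10.0):
            raise ValueError(f"Score {v} must be between 1.0 and 10.0")
        return round(v, 1)

try:
    StrictRating(score=11.0)
except ValidationError as e:
    # This error text is what Instructor relays to the model on retry
    error_message = str(e)
    print(error_message)
```

The printed message includes the field name and your ValueError text, which is why descriptive validator messages matter: they are the model's repair instructions.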
Real-World Extraction Patterns
Production Use Cases

```python
from pydantic import BaseModel
from typing import List, Optional, Literal

# 1. Invoice parser
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    line_items: List[LineItem]
    subtotal: float
    tax_rate: float
    total: float
    due_date: str  # YYYY-MM-DD

# 2. Meeting notes → action items
class ActionItem(BaseModel):
    task: str
    owner: str
    due_date: Optional[str] = None
    priority: Literal["high", "medium", "low"]

class MeetingNotes(BaseModel):
    summary: str
    decisions: List[str]
    action_items: List[ActionItem]
    next_meeting: Optional[str] = None

# 3. Job description parser
class JobDescription(BaseModel):
    role: str
    company: str
    location: str
    salary_min: Optional[int] = None
    salary_max: Optional[int] = None
    required_skills: List[str]
    preferred_skills: List[str]
    years_experience: Optional[int] = None
    remote: bool

# 4. Support ticket classifier
class SupportTicket(BaseModel):
    category: Literal["billing", "technical", "account", "general"]
    priority: Literal["p1", "p2", "p3"]
    sentiment: Literal["frustrated", "neutral", "positive"]
    summary: str
    needs_human: bool
```
Tool Calling — The Mental Model
Critical Concept

Tool calling is what transforms an LLM from a text generator into something that can take actions — search the web, query a database, call your API, run code. Before writing any code, understand what actually happens:

1. You describe your tools to the model (JSON schemas)
2. The model decides which tool to call
3. The model returns a structured tool_call object
4. Your code executes the actual function
5. You send the result back, and the model generates its response
⚠️ The model does NOT execute your functions. It only returns a structured object saying "I want to call get_weather with city='Mumbai'". Your code reads that object and actually calls the function. This distinction is critical for security — you control what runs.
```python
# What a tool call response looks like (Anthropic)
{
    "type": "tool_use",
    "id": "toolu_01A09q90qw90lq917835lq9",
    "name": "get_weather",
    "input": {
        "city": "Mumbai",
        "units": "celsius"
    }
}
# YOUR code then calls: get_weather(city="Mumbai", units="celsius")
```
Defining Tools — The 5-Step Pattern
Core Pattern

```python
import anthropic
import json

client = anthropic.Anthropic()

# STEP 1: Define your Python functions
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production: call a real weather API
    return {"city": city, "temp": 28, "condition": "sunny", "units": units}

def calculate(expression: str) -> dict:
    try:
        # Restricted eval: blocks builtins, but NOT a full sandbox —
        # use a proper expression parser for untrusted input
        result = eval(expression, {"__builtins__": {}})
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e)}

# STEP 2: Describe the tools in JSON Schema
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a specific city. Use this when the user asks about weather, temperature, or conditions in a location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'Mumbai', 'Delhi', 'Bangalore'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Default: celsius"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use this for any arithmetic, percentage, or numeric calculation. Do NOT use this for non-math questions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid Python math expression, e.g. '(100 * 1.15) + 50'"
                }
            },
            "required": ["expression"]
        }
    }
]

# STEP 3: Send request with tools
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Mumbai? Also, what is 15% of 2500?"}]
)

# STEP 4: Execute the tool calls
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        if block.name == "get_weather":
            result = get_weather(**block.input)
        elif block.name == "calculate":
            result = calculate(**block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        })

# STEP 5: Send results back to get final response
final_response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Mumbai? Also, what is 15% of 2500?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": tool_results}
    ]
)
print(final_response.content[0].text)
```
Writing Tool Descriptions That Work
Critical SkillThe tool description is the model's user manual. A vague description leads to wrong tool selection. Be explicit about when to use the tool, not just what it does.
```python
# BAD tool description — vague
{
    "name": "search",
    "description": "Search for information",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}
}

# GOOD tool description — specific when/what/not
{
    "name": "search_knowledge_base",
    "description": """Search the internal company knowledge base for product
documentation, FAQs, and policy documents.

Use this when the user asks about:
- Product features or specifications
- Company policies or procedures
- Troubleshooting steps

Do NOT use this for: general knowledge questions, math calculations,
or anything not related to company products and policies.""",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query, e.g. 'How do I reset my password?'"
            },
            "category": {
                "type": "string",
                "enum": ["products", "policies", "support"],
                "description": "Filter results by category. Optional."
            }
        },
        "required": ["query"]
    }
}
```
- Name — a self-explanatory verb phrase: search_knowledge_base, not search
- Description — explain WHEN to call (not just what), give examples, and state when NOT to use it
- Parameters — include examples in descriptions: "e.g. 'Mumbai', 'Delhi'"
- Required vs optional — mark truly optional params as optional, with sensible defaults
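Part of this checklist can be automated. Below is a sketch of a hypothetical `tool_schema()` helper (not a library function) that derives an Anthropic-style tool definition from a function's signature and docstring; the type mapping is deliberately minimal:

```python
import inspect
from typing import get_type_hints

# Minimal Python-type → JSON Schema type mapping (assumption: flat params only)
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(func) -> dict:
    """Build an Anthropic-style tool definition from a function.

    Hypothetical helper: the description still needs hand-written
    when/when-not guidance — introspection only covers structure.
    """
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default ⇒ required parameter
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }

def get_weather(city: str, units: str = "celsius") -> dict:
    """Get the current weather for a specific city."""
    return {}

schema = tool_schema(get_weather)
print(schema["input_schema"]["required"])  # only 'city' — 'units' has a default
```

Generated descriptions are a starting point, not a replacement for the when/when-not wording above.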
OpenAI Tool Calling
Syntax Differences

```python
import json
from openai import OpenAI

client = OpenAI()

# OpenAI uses slightly different field names
tools = [{
    "type": "function",  # required wrapper
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {  # "parameters", not "input_schema"
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    tool_choice="auto",  # "auto" | "required" | "none" | specific tool
    messages=[{"role": "user", "content": "Weather in Mumbai?"}]
)

# Parse tool calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        # Execute function based on name...
```
The Complete Tool Loop — Production Pattern
Production

```python
import json
import anthropic

client = anthropic.Anthropic()

# Tool registry — maps name → function (functions defined earlier)
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "calculate": calculate,
    "search_notes": search_notes,
}

def run_tool_loop(user_message: str, tools: list, max_turns: int = 10) -> str:
    """Run a complete tool loop until the model produces a final text response."""
    messages = [{"role": "user", "content": user_message}]

    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check stop reason
        if response.stop_reason == "end_turn":
            # Model finished — return text response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if response.stop_reason != "tool_use":
            break  # unexpected stop reason

        # Append assistant message
        messages.append({"role": "assistant", "content": response.content})

        # Execute all tool calls
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            func = TOOL_REGISTRY.get(block.name)
            if func is None:
                result = {"error": f"Unknown tool: {block.name}"}
            else:
                try:
                    result = func(**block.input)
                except Exception as e:
                    result = {"error": str(e), "tool": block.name}
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })

        messages.append({"role": "user", "content": tool_results})

    return "Max turns reached without final response"

# Usage
answer = run_tool_loop(
    "What's the weather in Mumbai and Delhi? Which city is warmer?",
    tools=tools
)
print(answer)
```
tool_choice — Controlling Which Tool Gets Called
Control

```python
# Anthropic tool_choice options

# "auto" (default) — model decides whether to use a tool or respond directly
tool_choice = {"type": "auto"}

# "any" — model MUST call a tool (useful to force structured extraction)
tool_choice = {"type": "any"}

# Specific tool — model MUST call this exact tool
tool_choice = {"type": "tool", "name": "extract_invoice"}

# When to use each:
# "auto"    — conversational agents where tool use is optional
# "any"     — when you always need structured output (extraction pipelines)
# specific  — when you know exactly which tool to force (single-purpose endpoints)

# OpenAI equivalents
tool_choice = "auto"      # let model decide
tool_choice = "required"  # must use a tool (= Anthropic "any")
tool_choice = "none"      # never use tools
tool_choice = {"type": "function", "function": {"name": "get_weather"}}  # force specific
```
Parallel Tool Calls
PerformanceModern models can call multiple tools in a single turn. This is dramatically faster than sequential calls — instead of 3 round trips to the API, you do 1.
```python
import asyncio
import json

# The model may return multiple tool_use blocks in one response
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Get weather for Mumbai, Delhi, and Bangalore"}]
)
# response.content may contain 3 tool_use blocks simultaneously
# Execute all of them, then send all results back at once

async def execute_tool_calls_parallel(tool_calls: list) -> list:
    """Execute multiple tool calls concurrently."""
    async def execute_one(block) -> dict:
        func = TOOL_REGISTRY.get(block.name)
        result = await asyncio.to_thread(func, **block.input)
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        }
    return await asyncio.gather(*[execute_one(b) for b in tool_calls])
```
💡 Parallel tool calls matter for agents. An agent researching 5 topics simultaneously via search tools is 5× faster than one that searches sequentially. Always process all tool_use blocks in a single response together, not one by one.
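The speedup is easy to demonstrate offline. A sketch with a dummy tool that simulates 0.2-second network latency in place of a real API call (requires Python 3.9+ for asyncio.to_thread):

```python
import asyncio
import time

# Dummy tool simulating slow I/O (a stand-in for a real weather API call)
def get_weather(city: str) -> dict:
    time.sleep(0.2)  # pretend network latency
    return {"city": city, "temp": 28}

async def execute_parallel(cities: list) -> list:
    # asyncio.to_thread runs each blocking call in its own worker thread,
    # so the three sleeps overlap instead of stacking
    return await asyncio.gather(
        *[asyncio.to_thread(get_weather, c) for c in cities]
    )

start = time.perf_counter()
results = asyncio.run(execute_parallel(["Mumbai", "Delhi", "Bangalore"]))
elapsed = time.perf_counter() - start

print(f"{elapsed:.2f}s for {len(results)} calls")  # roughly 0.2s, not 0.6s
```

The same pattern applies unchanged when the dummy function is replaced by real tool executions from a registry.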
FREE LEARNING RESOURCES
| Type | Resource | Best For |
|---|---|---|
| Docs | OpenAI Structured Outputs Guide — platform.openai.com | Covers the feature that ensures models always generate responses adhering to your JSON Schema. |
| Library | Instructor library — python.useinstructor.com | The cleanest way to get structured outputs from any LLM provider. Production standard. |
| Docs | OpenAI Function Calling Guide — platform.openai.com | Definitive reference for tool calling with OpenAI models. |
| Docs | Anthropic Tool Use Docs — docs.anthropic.com | Anthropic's complete guide to tool calling with Claude. |
| Notebook | OpenAI Cookbook: How to Call Functions — github.com/openai/openai-cookbook | Complete runnable notebook walking through the full tool-calling loop with real examples. |
MILESTONE PROJECT
Part A — Invoice Parser: Use Instructor to extract structured data from raw invoice text.
- Define a full Invoice Pydantic model: invoice_number, vendor, line_items (list), subtotal, tax_rate, total, due_date
- Test on 5 different invoice text formats (different layouts, missing fields, different currencies)
- Add field validators: total must equal subtotal * (1 + tax_rate), due_date must be valid ISO date
- Observe Instructor's automatic retry behaviour when validation fails
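The total consistency check above spans multiple fields, so it belongs in a model-level validator rather than a field validator. A possible sketch using Pydantic v2's model_validator (field names assumed from the Part A spec, trimmed to the relevant fields):

```python
from datetime import date
from pydantic import BaseModel, ValidationError, model_validator

class Invoice(BaseModel):
    subtotal: float
    tax_rate: float  # e.g. 0.18 for 18%
    total: float
    due_date: date   # Pydantic rejects anything that isn't a valid ISO date

    @model_validator(mode="after")
    def total_must_match(self) -> "Invoice":
        expected = self.subtotal * (1 + self.tax_rate)
        if abs(self.total - expected) > 0.01:  # tolerate rounding
            raise ValueError(
                f"total {self.total} != subtotal * (1 + tax_rate) = {expected:.2f}"
            )
        return self

ok = Invoice(subtotal=1000, tax_rate=0.18, total=1180, due_date="2024-03-15")

try:
    Invoice(subtotal=1000, tax_rate=0.18, total=999, due_date="2024-03-15")
except ValidationError:
    pass  # this is the error Instructor would relay back to the model
```

The rounding tolerance matters in practice: extracted currency values rarely match a float multiplication exactly.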
Part B — 3-Tool Assistant: Build a conversational assistant with three callable tools.
- get_weather(city) — calls Open-Meteo API (no key needed)
- calculate(expression) — evaluates math expressions safely
- search_notes(query) — searches a hardcoded dict of notes by keyword
- Implement the full 5-step tool loop with parallel execution
- Test with: "What's the weather in Mumbai?", "What is 15% of 8500?", "Find notes about Python", "What's the weather in Delhi and Mumbai, and which is warmer?" (parallel)
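A possible starting point for search_notes; the note store and the substring-matching logic below are placeholders, not part of the module:

```python
def search_notes(query: str) -> dict:
    """Search a hardcoded note store by keyword (milestone sketch)."""
    notes = {
        "python": "Python 3.12 improves error messages and f-string parsing.",
        "pydantic": "Use model_validate() for dicts, field validators for constraints.",
        "asyncio": "asyncio.to_thread offloads blocking calls to worker threads.",
    }
    q = query.lower()
    # Naive keyword match over titles and bodies; a real version
    # might tokenize or rank results
    matches = {
        title: body
        for title, body in notes.items()
        if q in title or q in body.lower()
    }
    return {"query": query, "matches": matches, "count": len(matches)}

result = search_notes("Python")
print(result["count"], "match(es)")
```

Returning a dict (rather than raising on no matches) keeps the tool-result contract uniform: the model always gets JSON it can reason about.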
Skills: Pydantic, Instructor, field validators, Anthropic/OpenAI SDK, tool calling loop, parallel tool execution
Structured Extraction — Compare JSON Mode vs Instructor
Objective: Directly observe what structured outputs guarantee vs what JSON mode does not.
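Much of this lab can be rehearsed without an API key: treat the strings below as hypothetical model responses and compare what json.loads alone gives you with what Pydantic validation enforces.

```python
import json
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int
    city: str

# Simulated "JSON mode" responses: valid JSON, but no schema guarantee
response_a = '{"name": "John", "age": "28", "city": "Mumbai"}'  # age is a string
response_b = '{"name": "John", "age": 28}'                      # city missing

data = json.loads(response_a)
print(type(data["age"]))  # json.loads keeps whatever came back: str

# Validation coerces "28" to int and rejects incomplete objects
person = Person.model_validate_json(response_a)
print(type(person.age))   # int

try:
    Person.model_validate_json(response_b)
    caught = False
except ValidationError:
    caught = True  # missing 'city' is a hard error, not a silent gap
```

Then repeat the comparison against live responses from JSON mode and Instructor to confirm the same behaviour with real model output.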
Tool Description Quality — See How It Affects Selection
Objective: Empirically measure how tool description quality affects which tool the model selects.
Build and Test the Complete Tool Loop
Objective: Build the complete production tool loop and test every edge case.
Implement the run_tool_loop() function from Tab 4 with the 3 tools (weather, calculate, search_notes).

P4-M12 MASTERY CHECKLIST
- Can explain the difference between JSON mode and structured outputs — and when each is appropriate
- Can define a Pydantic model with nested objects, enums, Optional fields, and List fields
- Can use Instructor with both Anthropic and OpenAI clients using the same Pydantic model
- Can add field validators to Pydantic models and understand how Instructor handles validation failures
- Understand the tool calling mental model: the model does NOT execute functions — it returns structured call objects
- Can write a tool definition with a description that clearly states when to use and when not to use it
- Can implement the complete 5-step tool calling loop for Anthropic (Claude)
- Can implement the equivalent for OpenAI (note the different field names)
- Know what tool_choice options exist and when to use "auto" vs "any" vs specific tool
- Can handle parallel tool calls — processing all tool_use blocks before the next API call
- Can implement a production tool loop with max_turns, error handling, and tool registry
- Know that better tool descriptions (with when/when-not-to examples) produce more reliable tool selection
- Completed Lab 1: JSON mode vs Instructor comparison
- Completed Lab 2: tool description quality experiment
- Completed Lab 3: complete tool loop with all edge cases tested
- Milestone project pushed to GitHub with README
✅ When complete: Move to P4-M13 — Streaming & Conversation State. The tool calling patterns you built here are the foundation of agents in Part 6 — agents are just tool loops with more sophisticated decision logic.