Part 6 — Agents, Workflows & Evaluation  ·  Module 19 of 22
Agent Loops & LangGraph
Build LLM systems that reason, act, and iterate — from scratch and with LangGraph
⏱ 1 Week 🟠 Intermediate–Advanced 🔧 LangGraph · Anthropic SDK 📋 Prerequisite: P4-M12 (Tool Calling)
🎯

What This Module Covers

Core of Part 6

An agent is an LLM that decides what to do next by choosing from a set of tools, executes those tools, observes results, and repeats until it completes a goal — or knows it cannot. This module teaches you to build agents from scratch and with LangGraph.

  • Agent mental model — what separates an agent from a chain; the think-act-observe loop
  • ReAct loop from scratch — Reasoning + Acting pattern, fully implemented without a framework
  • State management — how agents track what they know and what they have done
  • LangGraph — state schemas, nodes, edges, conditional routing, checkpointing
  • Human-in-the-loop — pausing for approval before consequential tool calls
  • Multi-turn agent conversations — maintaining context across user interactions
🧠

What Is an Agent?

Concept First

The word "agent" is overloaded. Here is the precise definition: an agent is an LLM that is given tools and a goal, and then decides for itself which tools to call, in what order, with what arguments — until it determines the goal is achieved.

# NOT an agent — you decide what to call:
weather = get_weather("Mumbai")       # you chose to call this
summary = summarise(weather)          # you chose to call this next

# IS an agent — the LLM decides what to call:
# User: "Should I carry an umbrella in Mumbai today?"
#
# LLM thinks: I need weather data → calls get_weather("Mumbai")
# LLM observes: {"temp": 28, "condition": "partly cloudy", "rain_chance": "20%"}
# LLM thinks: 20% chance of rain — not high. I have enough to answer.
# LLM responds: "Probably not necessary, but a light one wouldn't hurt."
#
# The LLM made ALL the decisions. You only provided tools and a question.
🧠 LLM: Think about what to do next
Decision: need tool? or have final answer?
↓ need tool
⚙️ Execute tool call — your code runs
↓ tool result
🧠 LLM: Observe result, think again
↓ final answer
✅ Return answer to user

⚠️ Agents are not always the right tool. A deterministic chain (M18 RAG pipeline) is more predictable, cheaper, and easier to debug. Use an agent when the task requires dynamic decision-making — the sequence of steps cannot be known in advance.

🔄

ReAct Loop from Scratch

Build to Understand

ReAct (Reasoning + Acting) is the foundational agent pattern. Before using any framework, build it from scratch.

import anthropic

client = anthropic.Anthropic()

# ── Tool definitions ──────────────────────────────────
def search_web(query: str) -> str:
    return f"Search results for '{query}': [simulated results about {query}]"

def calculate(expression: str) -> str:
    # eval with empty builtins blocks the obvious abuse, but is NOT safe
    # for untrusted input; use a real expression parser in production.
    try:
        result = eval(expression, {"__builtins__": {}})
        return f"{expression} = {result}"
    except Exception as e:
        return f"Error: {e}"

def get_current_time() -> str:
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

TOOLS = [
    {"name": "search_web",
     "description": "Search the web for current information. Use when you need facts not in your training data.",
     "input_schema": {"type": "object", "properties": {
         "query": {"type": "string", "description": "The search query"}},
         "required": ["query"]}},
    {"name": "calculate",
     "description": "Evaluate a mathematical expression. Use for any arithmetic.",
     "input_schema": {"type": "object", "properties": {
         "expression": {"type": "string"}}, "required": ["expression"]}},
    {"name": "get_current_time",
     "description": "Get the current date and time.",
     "input_schema": {"type": "object", "properties": {}}},
]

TOOL_REGISTRY = {"search_web": search_web,
                 "calculate": calculate,
                 "get_current_time": get_current_time}

# ── ReAct agent loop ──────────────────────────────────
def run_agent(user_message: str, system: str = "",
              max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            system=system,
            tools=TOOLS,
            messages=messages
        )

        # Agent finished — return final text
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        # Agent wants to use tools
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                func = TOOL_REGISTRY.get(block.name)
                if func is None:
                    result = {"error": f"Unknown tool: {block.name}"}
                else:
                    try:
                        result = func(**block.input)
                    except Exception as e:
                        result = {"error": str(e)}

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })

            messages.append({"role": "user", "content": tool_results})

    return f"Agent reached max_turns ({max_turns}) without completing."

# Run the agent
answer = run_agent(
    "What is the square root of 1764, and what day of the week is it today?"
)
print(answer)
🗂

Agent State — What the Agent Knows

Architecture

State is everything the agent needs to track across turns: the conversation history, tool results, intermediate data, and decisions made. Designing state well determines how complex your agent can become.

from dataclasses import dataclass, field
from typing import Any, Optional
from datetime import datetime, timezone

@dataclass
class AgentState:
    # Core
    messages:      list[dict]       = field(default_factory=list)
    turn_count:    int              = 0
    started_at:    str              = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    # Tool tracking
    tools_called:  list[str]        = field(default_factory=list)
    tool_results:  dict[str, Any]   = field(default_factory=dict)

    # Working memory — agent can store intermediate findings
    scratch_pad:   dict[str, Any]   = field(default_factory=dict)

    # Task tracking
    goal:          str              = ""
    subtasks:      list[str]        = field(default_factory=list)
    completed:     list[str]        = field(default_factory=list)
    status:        str              = "running"   # running | waiting | done | failed

    # Human interaction
    awaiting_approval: bool         = False
    pending_action:    Optional[dict] = None

def agent_with_state(user_message: str, state: Optional[AgentState] = None) -> AgentState:
    if state is None:
        state = AgentState(goal=user_message)

    state.messages.append({"role": "user", "content": user_message})

    while state.status == "running" and state.turn_count < 10:
        state.turn_count += 1
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=TOOLS,
            messages=state.messages
        )

        if response.stop_reason == "end_turn":
            state.status = "done"
            # Join text blocks so we append exactly one assistant message
            final_text = "".join(
                b.text for b in response.content if hasattr(b, "text"))
            state.messages.append({"role": "assistant", "content": final_text})
            break

        state.messages.append({"role": "assistant", "content": response.content})

        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            state.tools_called.append(block.name)
            result = TOOL_REGISTRY.get(block.name, lambda **k: "unknown tool")(**block.input)
            state.tool_results[block.id] = result
            tool_results.append({"type": "tool_result",
                                  "tool_use_id": block.id, "content": str(result)})

        if tool_results:
            state.messages.append({"role": "user", "content": tool_results})
        else:
            state.status = "failed"   # no tool calls (e.g. hit max_tokens)

    if state.status == "running":
        state.status = "failed"       # turn limit reached without finishing
    return state
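
Because the function returns its state, multi-turn use is a matter of passing the same object back in. A quick sketch (the follow-up question is illustrative):

# Turn 1 — fresh state
state = agent_with_state("What is 15% of 8500?")

# Turn 2 — reuse the state so the agent keeps its full history
state.status = "running"              # re-open the loop for the new message
state = agent_with_state("Now add 500 to that result.", state)

print(state.tools_called)             # e.g. ['calculate', 'calculate']
print(state.messages[-1]["content"])  # final assistant answer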
🔀

LangGraph — Stateful Agent Graphs

Framework

LangGraph models agents as graphs: nodes (functions that transform state), edges (connections between nodes), and conditional edges (routes based on current state). It adds persistence, checkpointing, and human-in-the-loop out of the box.

pip install langgraph langchain-anthropic

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from typing import TypedDict, Annotated
import operator

# ── 1. Define state schema ────────────────────────────
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]   # reducer: add new messages

# ── 2. Define nodes ───────────────────────────────────
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# bind_tools accepts plain Python functions; LangChain infers each schema
# from the signature. Reusing the tools defined in the scratch section:
langchain_tools = [search_web, calculate, get_current_time]
llm_with_tools  = llm.bind_tools(langchain_tools)

def call_llm(state: AgentState) -> dict:
    """LLM node — decides what to do next."""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def execute_tools(state: AgentState) -> dict:
    """Tool node — executes all pending tool calls."""
    last_message = state["messages"][-1]
    tool_results = []
    for tool_call in last_message.tool_calls:
        func  = TOOL_REGISTRY[tool_call["name"]]
        result = func(**tool_call["args"])
        tool_results.append(ToolMessage(
            content=str(result),
            tool_call_id=tool_call["id"]
        ))
    return {"messages": tool_results}

# ── 3. Conditional edge — route based on state ────────
def should_continue(state: AgentState) -> str:
    """Return 'tools' if LLM wants to call tools, 'end' if done."""
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

# ── 4. Build the graph ────────────────────────────────
graph = StateGraph(AgentState)

graph.add_node("llm",   call_llm)
graph.add_node("tools", execute_tools)

graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue, {
    "tools": "tools",
    "end":   END
})
graph.add_edge("tools", "llm")   # after tools, always go back to LLM

# ── 5. Compile with checkpointer (persistence) ────────
memory  = MemorySaver()
agent   = graph.compile(checkpointer=memory)

# ── 6. Run the agent ──────────────────────────────────
config  = {"configurable": {"thread_id": "session-123"}}
result  = agent.invoke(
    {"messages": [HumanMessage(content="What is 15% of 8500 and what is today's date?")]},
    config=config
)
print(result["messages"][-1].content)

💡 The thread_id in config enables multi-session persistence. Every invocation with the same thread_id continues from where it left off — the graph state is checkpointed automatically. Different users get different thread_ids and completely isolated state.
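
A quick sketch of what that buys you, continuing the code above (the follow-up question and the second thread_id are illustrative):

# Same thread — the checkpointer restores the earlier turns automatically
followup = agent.invoke(
    {"messages": [HumanMessage(content="Now double that percentage figure.")]},
    config={"configurable": {"thread_id": "session-123"}}
)

# A different thread starts from blank state — the agent has no idea
# what "that figure" refers to here
other = agent.invoke(
    {"messages": [HumanMessage(content="Now double that percentage figure.")]},
    config={"configurable": {"thread_id": "session-456"}}
)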

📊

LangGraph State Reducers — Advanced Patterns

Power Feature
# Reducers control how state is updated when nodes return new values

from typing import TypedDict, Annotated
import operator

class ResearchState(TypedDict):
    # operator.add — appends new items to existing list
    messages:    Annotated[list, operator.add]
    sources:     Annotated[list, operator.add]

    # No reducer — node's returned value REPLACES current value
    current_task: str
    is_complete:  bool

    # Custom reducer — keep only last 10 messages
    short_memory: Annotated[list, lambda old, new: (old + new)[-10:]]

# Parallel nodes — execute concurrently in the graph
# (search_node, calc_node and a "synthesize" node are assumed to be defined;
#  "start" stands for an existing upstream node)
graph.add_node("search",    search_node)
graph.add_node("calculate", calc_node)
# Both run in parallel when the graph forks at this point
graph.add_edge("start", "search")
graph.add_edge("start", "calculate")
# A list of source nodes is a join: both must complete before proceeding
graph.add_edge(["search", "calculate"], "synthesize")
🧑‍💻

Human-in-the-Loop — Pause Before Consequential Actions

Safety Critical

Never let an agent autonomously send emails, delete data, make purchases, or call external APIs without human approval. LangGraph's interrupt mechanism pauses the graph at any node, waits for human input, then resumes.

from langgraph.graph import StateGraph, END
from langgraph.types import interrupt

# ── Interrupt before executing a tool ─────────────────
SENSITIVE_TOOLS = {"send_email", "delete_record", "make_payment"}

def execute_tools_with_approval(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    tool_results = []

    for tool_call in last_message.tool_calls:
        tool_name = tool_call["name"]

        if tool_name in SENSITIVE_TOOLS:
            # Pause and ask human for approval
            approval = interrupt({
                "question": f"Agent wants to call {tool_name} with args: {tool_call['args']}. Approve?",
                "tool_name": tool_name,
                "tool_args": tool_call["args"]
            })
            if not approval.get("approved"):
                tool_results.append(ToolMessage(
                    content="User declined this action.",
                    tool_call_id=tool_call["id"]
                ))
                continue

        # Approved or non-sensitive — execute
        result = TOOL_REGISTRY[tool_name](**tool_call["args"])
        tool_results.append(ToolMessage(content=str(result),
                                        tool_call_id=tool_call["id"]))

    return {"messages": tool_results}

# ── Resuming after human approval ─────────────────────
# When the graph is interrupted, it returns a snapshot
# The human inspects and provides a response
# Then you resume with Command(resume=response)

from langgraph.types import Command

# Graph pauses at the interrupt and returns control to the caller
# ("task" stands for the usual {"messages": [...]} input dict)
result = agent.invoke(task, config)

# Human reviews the interrupt value
pending = result["__interrupt__"]
print(f"Waiting for approval: {pending[0]['value']}")

# Human approves (or rejects)
human_response = {"approved": True}   # or False

# Resume the graph from where it paused
result = agent.invoke(Command(resume=human_response), config)

⚠️ Human-in-the-loop is not optional for consequential actions. An agent that autonomously sends emails, deletes records, or makes API calls is an accident waiting to happen. Always implement interrupt-based approval for irreversible or high-stakes tool calls (OWASP LLM08: Excessive Agency).

FREE LEARNING RESOURCES

  • Docs · LangGraph Low-Level Concepts (langchain-ai.github.io/langgraph): state schemas, reducers, nodes, edges, checkpointing. The definitive reference.
  • Course · LangChain Academy: Intro to LangGraph (academy.langchain.com): free official LangGraph course, hands-on with real agent examples. The best starting point.
  • Article · Anthropic: Building Effective Agents (anthropic.com/research): Anthropic's definitive guide on when to use agents vs workflows, and how to build reliable agents.
  • Docs · LangGraph: Multi-Agent Concepts (langchain-ai.github.io/langgraph): supervisor patterns, handoffs between agents, shared state in multi-agent systems.
🛠 Research Agent — Scratch + LangGraph [Intermediate–Advanced] 3–4 days

Build a research agent that can search, calculate, and synthesise multi-step answers — first from scratch, then rebuilt with LangGraph to compare the approaches.

Part A — From Scratch

  • Implement the full ReAct loop with 4 tools: search_web, calculate, get_current_time, read_file (a sketch of read_file follows this list)
  • Track state: tools_called, turn_count, intermediate_results
  • Add max_turns safeguard and meaningful error messages
  • Test with 5 multi-step queries that require 2+ tool calls each
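
The first three tools already exist in the Tab 2 code; read_file does not. A minimal sketch: the max_chars cap is an assumption, there to stop one large file from flooding the context window:

from pathlib import Path

def read_file(path: str, max_chars: int = 4000) -> str:
    # Truncate so a single large file cannot blow out the agent's context
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError as e:
        return f"Error reading {path}: {e}"
    return text[:max_chars] + (" …[truncated]" if len(text) > max_chars else "")

READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a local text file. Use when the user refers to a file on disk.",
    "input_schema": {"type": "object",
                     "properties": {"path": {"type": "string"}},
                     "required": ["path"]},
}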

Part B — LangGraph

  • Rebuild with LangGraph: StateGraph, MemorySaver, conditional edges
  • Add human-in-the-loop: interrupt before any web search (simulating a gated tool)
  • Test session persistence: run 3 turns, restart the Python process, resume with the same thread_id. MemorySaver won't survive a restart, so use a durable checkpointer (see the sketch after this list)
  • Compare: what did LangGraph give you for free vs scratch?
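
Note: MemorySaver keeps checkpoints in process memory, so they vanish on restart. The cross-process test needs a durable checkpointer. A sketch assuming the separately installed langgraph-checkpoint-sqlite package (checkpoints.db is an arbitrary file name):

# pip install langgraph-checkpoint-sqlite
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

conn  = sqlite3.connect("checkpoints.db", check_same_thread=False)
agent = graph.compile(checkpointer=SqliteSaver(conn))

# After a restart: reconnect to the same file, recompile, and invoke
# with the same thread_id; the agent resumes from its last checkpoint.
config = {"configurable": {"thread_id": "session-123"}}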

Skills: ReAct loop, AgentState, LangGraph StateGraph, MemorySaver, interrupt/resume, conditional routing

LAB 1

Build and Break a ReAct Agent

Objective: Build deep intuition for agent behaviour by deliberately breaking it and observing failures.

1. Implement the scratch ReAct loop from Tab 2 with 3 tools (search, calculate, get_time).
2. Test with a multi-step query: "What is today's date? What was the population of India in that year? What is 2.3% of that number?" — observe the full tool-calling sequence.
3. Deliberately trigger each failure mode: (a) set max_turns=2 on a 4-step task, (b) make a tool return an error string, (c) give contradictory tool results — how does the agent handle each?
4. Remove one tool the agent needs mid-task. What happens? Does it give up gracefully or loop?
5. Add logging: print every turn number, stop_reason, and tools called (a minimal log line is sketched below). Run 5 different queries and compare turn counts. Which queries take the most turns and why?
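
For step 5, one possible shape for the log line, placed inside run_agent's loop right after the API call (the format is just a suggestion):

# Inside run_agent, immediately after client.messages.create(...):
tool_names = [b.name for b in response.content if b.type == "tool_use"]
print(f"turn={turn + 1}  stop_reason={response.stop_reason}  tools={tool_names}")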
LAB 2

LangGraph — Visualise and Trace the Graph

Objective: Build a LangGraph agent and use its tracing to deeply understand the execution path.

1. Build the simple 2-node LangGraph (llm → tools → llm) from Tab 4. Draw the graph: print(agent.get_graph().draw_mermaid()) — paste into mermaid.live to visualise.
2. Add a third node: validate_output — after the LLM produces a final answer, this node checks that it meets quality criteria. Add a conditional edge: if the quality check fails, route back to the LLM; if it passes, route to END (a sketch of this routing follows the lab).
3. Run with verbose streaming: for event in agent.stream(inputs, config): print(event). Observe every state transition.
4. Test checkpoint persistence: run 3 turns with a thread_id, then snapshot = agent.get_state(config). Print the snapshot. Kill the Python process, restart, and resume with the same thread_id — this only survives a restart with a durable checkpointer (see the SqliteSaver sketch in the project section).
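
One possible shape for step 2, rebuilding the Tab 4 graph with the extra node. The 40-character heuristic and node names are illustrative; a real version should also cap retries, since a stubbornly short answer would loop forever:

from langchain_core.messages import HumanMessage

def validate_output(state: AgentState) -> dict:
    # Illustrative quality gate; a real check might call a grader LLM
    answer = state["messages"][-1].content
    if isinstance(answer, str) and len(answer.strip()) > 40:
        return {"messages": []}                       # passes: add nothing
    return {"messages": [HumanMessage(content="Answer too brief — expand it.")]}

def route_after_validation(state: AgentState) -> str:
    # The validator appended a critique, so send the agent back to the LLM
    return "llm" if isinstance(state["messages"][-1], HumanMessage) else "end"

g = StateGraph(AgentState)
g.add_node("llm",      call_llm)
g.add_node("tools",    execute_tools)
g.add_node("validate", validate_output)
g.set_entry_point("llm")
# Final answers now pass through the validator instead of ending directly
g.add_conditional_edges("llm", should_continue,
                        {"tools": "tools", "end": "validate"})
g.add_edge("tools", "llm")
g.add_conditional_edges("validate", route_after_validation,
                        {"llm": "llm", "end": END})
agent = g.compile(checkpointer=MemorySaver())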
LAB 3

Human-in-the-Loop — Approval Flow

Objective: Implement and test the full interrupt/resume cycle for a gated tool.

1. Add a send_email(to, subject, body) tool to your LangGraph agent. Mark it as SENSITIVE.
2. Ask the agent: "Draft and send an email to boss@example.com explaining that the DPDK migration is complete." It should reach the send_email tool call and pause.
3. Inspect the interrupt value — does it contain the full email content? Approve it: Command(resume={"approved": True}). Verify the agent completes.
4. Repeat, but reject with feedback: Command(resume={"approved": False, "reason": "Subject line too informal"}). Does the agent revise and ask again?
5. Document: what information should always be in the interrupt payload to give a human enough context to approve or reject? Design the ideal approval UI payload (one possible shape is sketched below).
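
As a starting point for step 5, one possible payload shape; every field here is a design suggestion, not a LangGraph requirement:

from typing import TypedDict

class ApprovalRequest(TypedDict):
    tool_name:  str    # what the agent wants to run
    tool_args:  dict   # full arguments, nothing elided
    reason:     str    # the agent's stated justification
    reversible: bool   # can this action be undone?
    risk_level: str    # "low" | "medium" | "high"
    preview:    str    # human-readable render (e.g. the full email)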

P6-M19 MASTERY CHECKLIST

When complete: Move to P6-M20 — Tool Design, Workflow Patterns & When NOT to Use Agents. You now know how agents work mechanically. M20 teaches you to design agents that are reliable in production — which is harder than it looks.

← P5-M18: RAG Pipelines 🗺️ All Modules Next: P6-M20 — Tool Design →