
Agent Architectures


This is part of the AI Agents series. All code is at github.com/achintmehta/langchain. All agent examples are in the agents/ directory and target LM Studio on http://localhost:1234/v1.

From chains to agents

A chain is a fixed sequence of steps. An agent is different: the LLM itself decides what to do next. It can call a tool, inspect the result, decide whether that was enough or whether it needs to do something else, and keep going until it is satisfied.

LangGraph models this as a state machine. You define:

State: a typed dict (or Pydantic model) that holds everything the agent needs to know: the conversation history, intermediate results, flags.
Nodes: Python functions that take the current state, do something (call an LLM, run a tool, make a decision), and return an update to the state.
Edges: connections between nodes. A regular edge always goes from node A to node B. A conditional edge lets the LLM (or a Python function) decide which node to go to next.
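
A plain-Python sketch may make these three pieces concrete. It does not use LangGraph: the `run` helper, the node names, and the `CounterState` type are all hypothetical, and the loop only imitates how a graph walks from node to node while merging each node's update into the state.

```python
from typing import Callable, TypedDict


class CounterState(TypedDict):
    count: int
    log: list[str]


# Nodes: functions that read the state and return a partial update.
def increment(state: CounterState) -> dict:
    return {"count": state["count"] + 1, "log": state["log"] + ["increment"]}


def report(state: CounterState) -> dict:
    return {"log": state["log"] + [f"done at {state['count']}"]}


NODES: dict[str, Callable[[CounterState], dict]] = {
    "increment": increment,
    "report": report,
}


# A conditional edge: a function that picks the next node from the state.
def after_increment(state: CounterState) -> str:
    return "increment" if state["count"] < 3 else "report"


# Edges: a string is a regular edge, a callable is a conditional edge.
EDGES = {"increment": after_increment, "report": "END"}


def run(state: CounterState, start: str = "increment") -> CounterState:
    current = start
    while current != "END":
        state = {**state, **NODES[current](state)}  # merge the node's update
        edge = EDGES[current]
        current = edge(state) if callable(edge) else edge
    return state


final_state = run({"count": 0, "log": []})
print(final_state)
```

Here the conditional edge is ordinary Python. In an agent, the same decision would be driven by the LLM's output.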

The spectrum from "deterministic code" to "fully autonomous agent" looks like this:

Level	Description
Code	Regular software, no LLM involved
LLM call	Single LLM call embedded in a larger application
Chain	Fixed sequence of LLM calls
Router	LLM picks the next step from predefined options
Agent	LLM plans, acts, observes, and decides whether to keep going
Multi-agent	Multiple specialised agents, coordinated by a supervisor or graph

As you move down the table, the system gains autonomy but becomes harder to predict. Each step down requires progressively more investment in observability, testing, and safety measures.

The Plan-Do Loop (ReAct)

The simplest agent pattern is the ReAct loop (Reasoning + Acting). The LLM looks at the current state and either calls a tool or produces a final answer. If it calls a tool, the graph runs the tool and gives the result back to the LLM. This repeats until the LLM decides it is done.

The full code is in agents/plan_do_loop.py.

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny and 22°C."

tools = [get_weather]
# `llm` is the chat model created elsewhere in the full script; it must support tool calling
llm_with_tools = llm.bind_tools(tools)

def model_node(state: MessagesState):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("model", model_node)
builder.add_node("tools", ToolNode(tools))

builder.add_edge(START, "model")
builder.add_conditional_edges("model", tools_condition)   # routes to "tools" or END
builder.add_edge("tools", "model")

graph = builder.compile()

MessagesState is a built-in LangGraph state type that holds a list of messages and automatically appends new messages rather than replacing the whole list. ToolNode is a prebuilt node that reads whatever tool calls are in the last message, executes them, and returns the results as ToolMessage objects. tools_condition is a prebuilt conditional edge that routes to the tools node if the last message contains a tool call, or to END otherwise.

The result is a loop: START → model → tools → model → ... → model → END. The loop always exits from the model node, and it terminates when the model produces a message with no tool call.
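
The control flow can be traced without an LLM. The sketch below is plain Python rather than LangGraph: `fake_model`, `run_tools`, `react_loop`, and the dictionary message format are stand-ins invented for illustration, and the model is scripted to ask for one tool call and then answer.

```python
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny and 22°C."


TOOLS = {"get_weather": get_weather}


def fake_model(messages: list[dict]) -> dict:
    """Scripted stand-in for an LLM: request a tool once, then answer."""
    if messages[-1]["role"] == "tool":
        return {"role": "ai", "content": messages[-1]["content"], "tool_calls": []}
    call = {"name": "get_weather", "args": {"city": "Paris"}}
    return {"role": "ai", "content": "", "tool_calls": [call]}


def run_tools(last_message: dict) -> list[dict]:
    """Execute every tool call in the last message (the ToolNode role)."""
    return [
        {"role": "tool", "content": TOOLS[call["name"]](**call["args"])}
        for call in last_message["tool_calls"]
    ]


def react_loop(question: str) -> tuple[list[dict], list[str]]:
    messages = [{"role": "human", "content": question}]
    visited = []
    while True:
        visited.append("model")
        messages.append(fake_model(messages))   # append, never replace
        if not messages[-1]["tool_calls"]:      # the tools_condition role
            return messages, visited
        visited.append("tools")
        messages.extend(run_tools(messages[-1]))


messages, visited = react_loop("What is the weather in Paris?")
print(visited)
print(messages[-1]["content"])
```

The visited list shows the model node running both before and after the tool, which is the loop shape described above.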

Reflection Agent

The reflection pattern uses two LLM calls: a generator that produces a draft, and a critic that reviews it. The generator revises based on the critique, and the loop continues for a fixed number of iterations.

The code is in agents/reflect.py. The clever trick here is role inversion: the essay (an AIMessage) is presented to the critic as a HumanMessage. From the critic's perspective, it is seeing a student submission and responding as a teacher, which is exactly the relationship you want.

from langchain_core.messages import AIMessage, HumanMessage

def generate_node(state):
    response = generator_llm.invoke(state["messages"])
    return {"messages": [response]}

def reflect_node(state):
    # Flip AI messages to Human and Human to AI for the critic's perspective
    flipped = []
    for msg in state["messages"]:
        if isinstance(msg, AIMessage):
            flipped.append(HumanMessage(content=msg.content))
        elif isinstance(msg, HumanMessage):
            flipped.append(AIMessage(content=msg.content))
    critique = critic_llm.invoke(flipped)
    return {"messages": [HumanMessage(content=critique.content)]}

def should_continue(state):
    # Checked after each draft. With a single initial prompt, the history first
    # exceeds 6 messages on the 4th draft, i.e. after 3 critiques.
    return END if len(state["messages"]) > 6 else "reflect"

builder = StateGraph(MessagesState)
builder.add_node("generate", generate_node)
builder.add_node("reflect", reflect_node)
builder.add_edge(START, "generate")
builder.add_conditional_edges("generate", should_continue)
builder.add_edge("reflect", "generate")

Reflection is useful for tasks where quality matters more than speed: writing, code generation, analysis. The number of iterations trades output quality against latency and cost, since every extra round adds two more LLM calls.
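
Counting messages by hand is error-prone, so it can help to trace the stopping rule in isolation. This sketch is plain Python with no LLM. It assumes the conversation starts with a single prompt message and mirrors the `len(...) > 6` check in `should_continue`; the `count_rounds` helper is hypothetical.

```python
def count_rounds(max_messages: int = 6, initial_messages: int = 1) -> tuple[int, int]:
    """Return (drafts, critiques) produced before the loop stops."""
    n_messages = initial_messages
    drafts = critiques = 0
    while True:
        n_messages += 1                  # generate appends one draft
        drafts += 1
        if n_messages > max_messages:    # same test as should_continue
            return drafts, critiques
        n_messages += 1                  # reflect appends one critique
        critiques += 1


drafts, critiques = count_rounds()
print(drafts, critiques)
```

Under these assumptions a threshold of 6 yields four drafts and three critiques, and the graph always ends on a draft rather than on a critique.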

Supervisor Agent

For more complex tasks, you can have multiple specialised agents (a researcher, a coder, an analyst) coordinated by a supervisor. The supervisor sees the task and the current state and decides which agent should act next.

The code is in agents/supervisor_agent.py. The key idea is that the supervisor uses structured output to return its routing decision. LangGraph then reads the decision from state to determine the next node.

from typing import Literal
from pydantic import BaseModel

class SupervisorDecision(BaseModel):
    next: Literal["researcher", "coder", "FINISH"]

structured_supervisor = supervisor_prompt | llm.with_structured_output(SupervisorDecision)

def supervisor_node(state):
    decision = structured_supervisor.invoke(state)
    return {"next": decision.next}

builder = StateGraph(AgentState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("coder", coder_node)

builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", lambda state: state["next"],
    {"researcher": "researcher", "coder": "coder", "FINISH": END})
builder.add_edge("researcher", "supervisor")
builder.add_edge("coder", "supervisor")

The flow is: supervisor decides who should act → that agent acts → control returns to the supervisor. The supervisor can choose FINISH at any point to end the workflow. This topology works well for tasks that naturally decompose into distinct specialisations where one agent's output feeds into another's.
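
The snippet refers to an `AgentState` type without showing it. One plausible shape, offered as an assumption rather than the repository's actual definition, is a typed dict with a message list plus the `next` field that the supervisor writes. The `Annotated[..., operator.add]` form is how LangGraph state schemas declare a reducer, and defining it needs only the standard library. `route_from_supervisor` is a hypothetical named version of the lambda used above.

```python
import operator
from typing import Annotated, Literal, TypedDict, get_type_hints


class AgentState(TypedDict):
    # Reducer annotation: updates to "messages" are concatenated, not replaced.
    messages: Annotated[list, operator.add]
    # Written by the supervisor, read by the conditional edge.
    next: Literal["researcher", "coder", "FINISH"]


def route_from_supervisor(state: AgentState) -> str:
    """Named equivalent of the lambda passed to add_conditional_edges."""
    return state["next"]


# The reducer is recoverable from the annotation metadata.
hints = get_type_hints(AgentState, include_extras=True)
reducer = hints["messages"].__metadata__[0]

merged = reducer(["task"], ["research notes"])
chosen = route_from_supervisor({"messages": [], "next": "coder"})
print(merged, chosen)
```

Constraining `next` with `Literal` keeps the routing keys in one place, so a typo in a node name can be caught by a type checker rather than surfacing at run time.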

Subgraphs

As your agent grows, you will want to decompose it into reusable subgraphs: encapsulated sub-workflows that can be embedded as a single node in a parent graph.

There are two cases, covered in agents/subgraph_with_shared_state.py and agents/subgraph_with_non_shared_state.py.

Shared state: the subgraph's state type inherits from the parent's. You add the compiled subgraph directly as a node:

class SubgraphState(ParentState):
    subgraph_specific_field: str

subgraph = subgraph_builder.compile()
parent_builder.add_node("my_subgraph", subgraph)  # subgraph as a node

Non-shared state: the parent and subgraph have completely different state schemas. You wrap the subgraph call in a Python function that manually maps state in and out:

def call_subgraph_node(state: ParentState) -> dict:
    subgraph_input = {
        "task_name": state["parent_job_id"],
        "data": state["raw_data"]
    }
    result = subgraph.invoke(subgraph_input)
    return {"extracted_data": result["final_output"]}

parent_builder.add_node("my_subgraph", call_subgraph_node)

The non-shared approach is more flexible: the subgraph is a black box with its own independent state, and the parent sees only the outputs that the wrapper function chooses to return.
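
The isolation can be seen with a stand-in for the compiled subgraph. In this sketch `FakeSubgraph` is a hypothetical class exposing only an `invoke` method, and the field values are made up. The point is that the subgraph's private keys never reach the parent.

```python
class FakeSubgraph:
    """Stand-in for a compiled subgraph with its own private state keys."""

    def invoke(self, state: dict) -> dict:
        cleaned = state["data"].strip().upper()
        return {**state, "scratch": "internal notes", "final_output": cleaned}


subgraph = FakeSubgraph()


def call_subgraph_node(state: dict) -> dict:
    # Map parent keys to the subgraph's schema on the way in...
    subgraph_input = {"task_name": state["parent_job_id"], "data": state["raw_data"]}
    result = subgraph.invoke(subgraph_input)
    # ...and expose only the field the parent cares about on the way out.
    return {"extracted_data": result["final_output"]}


parent_state = {"parent_job_id": "job-42", "raw_data": "  hello  "}
update = call_subgraph_node(parent_state)
print(update)
```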

Dynamic Tool Selection

When an agent has many tools, sending all of them in every prompt has two costs: it consumes context window tokens, and it can confuse the model with too many choices. Dynamic tool selection uses a vector store to retrieve only the tools most relevant to the current query, then binds only those tools to the LLM for this step.

The code is in agents/tool_selection.py.

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.documents import Document

# Index tools by their description
tool_docs = [
    Document(page_content=t.description, metadata={"name": t.name})
    for t in all_tools
]

tools_store = InMemoryVectorStore.from_documents(tool_docs, embeddings_model)
tools_retriever = tools_store.as_retriever(search_kwargs={"k": 2})
tools_by_name = {t.name: t for t in all_tools}

def select_tools_node(state):
    # Retrieve the 2 most relevant tools for the current message
    query = state["messages"][-1].content
    relevant_tool_docs = tools_retriever.invoke(query)
    selected_tools = [tools_by_name[doc.metadata["name"]] for doc in relevant_tool_docs]
    return {"selected_tools": selected_tools}

def model_node(state):
    # Only bind the selected tools
    llm_with_selected = llm.bind_tools(state["selected_tools"])
    return {"messages": [llm_with_selected.invoke(state["messages"])]}

This is particularly valuable when you have a large, general-purpose tool library (dozens of tools) and want to avoid overwhelming the model with irrelevant options on every turn.
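
The retrieval step itself is ordinary nearest-neighbour search over tool descriptions. The sketch below swaps the embedding model for a crude bag-of-words cosine similarity so that it runs with the standard library alone. The tool names, descriptions, and the `embed`, `cosine`, and `select_tools` helpers are invented for illustration; a real setup would use an embeddings model and vector store as in the code above.

```python
import math
import re
from collections import Counter

TOOL_DESCRIPTIONS = {
    "get_weather": "Get the current weather forecast for a city",
    "convert_currency": "Convert an amount of money between two currencies",
    "search_flights": "Search for flights between two airports",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(query: str, k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    query_vec = embed(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(query_vec, embed(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]


print(select_tools("What is the weather forecast in Paris?", k=1))
print(select_tools("Convert money between currencies", k=1))
```

Word overlap is a poor proxy for meaning, which is why the real pipeline uses embeddings. The selection logic around it stays the same.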

Human-in-the-Loop

LangGraph's checkpointing system allows you to pause execution mid-graph, inspect the state, optionally modify it, and then resume. This is what makes human approval workflows practical.

The code is in agents/human_in_the_loop.py.

Setup: attach a checkpointer and declare which nodes to interrupt before:

from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["tools"]   # pause before any tool call
)

config = {"configurable": {"thread_id": "session-1"}}

Run until the interrupt:

for event in graph.stream({"messages": [HumanMessage(content="Search for recent AI news")]}, config):
    print(event)
# Execution pauses before the tools node

Inspect what the agent wants to do:

state = graph.get_state(config)
pending_tool_call = state.values["messages"][-1].tool_calls[0]
print(pending_tool_call)  # {"name": "web_search", "args": {"query": "recent AI news"}}

Option 1, approve: resume with None as the input to continue from where it paused:

for event in graph.stream(None, config):
    print(event)

Option 2, modify: inject a custom tool result and skip the actual execution:

from langchain_core.messages import ToolMessage

graph.update_state(
    config,
    {"messages": [ToolMessage(
        content="Here is the custom result I want to inject.",
        tool_call_id=pending_tool_call["id"],
        name=pending_tool_call["name"]
    )]},
    as_node="tools"   # tell LangGraph this update comes from the tools node
)
for event in graph.stream(None, config):
    print(event)

Time travel: because every step creates a checkpoint, you can retrieve the full execution history and re-run from any past state:

for snapshot in graph.get_state_history(config):
    print(snapshot.config["configurable"]["checkpoint_id"])
    print(snapshot.next)

# Re-run from a specific past checkpoint
old_config = {"configurable": {"thread_id": "session-1", "checkpoint_id": "some-past-id"}}
for event in graph.stream(None, old_config):
    print(event)

This is invaluable for debugging: you can replay any branch of execution from a saved checkpoint without re-running the whole workflow from scratch.
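
The mechanics behind pause, resume, and history can be imitated in a few lines. This toy runner is not LangGraph's implementation: `ToyGraph`, its `checkpoints` dictionary, and the two lambda nodes are hypothetical, and it handles only a linear sequence of nodes. It saves a snapshot after every step, stops before any node listed in `interrupt_before`, and continues from the latest snapshot when called with `None`.

```python
class ToyGraph:
    def __init__(self, nodes, interrupt_before=()):
        self.nodes = nodes                      # ordered list of (name, fn)
        self.interrupt_before = set(interrupt_before)
        self.checkpoints = {}                   # thread_id -> list of snapshots

    def run(self, inputs, thread_id):
        history = self.checkpoints.setdefault(thread_id, [])
        if inputs is not None:                  # fresh run: start at node 0
            history.append({"values": dict(inputs), "next": 0})
        resuming = inputs is None
        while True:
            snapshot = history[-1]
            index = snapshot["next"]
            if index >= len(self.nodes):
                return snapshot["values"]       # finished
            name, fn = self.nodes[index]
            if name in self.interrupt_before and not resuming:
                return None                     # paused; state stays in history
            resuming = False
            values = {**snapshot["values"], **fn(snapshot["values"])}
            history.append({"values": values, "next": index + 1})


graph = ToyGraph(
    nodes=[
        ("model", lambda s: {"plan": f"search: {s['question']}"}),
        ("tools", lambda s: {"result": f"results for {s['plan']}"}),
    ],
    interrupt_before=["tools"],
)

paused = graph.run({"question": "recent AI news"}, thread_id="session-1")
pending = graph.checkpoints["session-1"][-1]["values"]["plan"]   # inspect
final = graph.run(None, thread_id="session-1")                   # approve
print(paused, pending, final["result"])
```

Because every step leaves a snapshot in the history list, replaying from an earlier point amounts to choosing an older snapshot as the starting state.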

Streaming in LangGraph

LangGraph supports several streaming modes. The three shown here give you different views into what is happening inside the graph:

# Default, each node's output delta after it runs
for chunk in graph.stream(input, config, stream_mode="updates"):
    print(chunk)

# Full state snapshot after each node
for chunk in graph.stream(input, config, stream_mode="values"):
    print(chunk)

# Low-level debug events including checkpoint creation
for chunk in graph.stream(input, config, stream_mode="debug"):
    print(chunk)

For production UIs, updates mode is typically what you want: you see each node's contribution as it completes, which lets you show progress without waiting for the entire graph to finish.
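
The difference between the first two modes is easiest to see side by side. The generator below is a plain-Python imitation, not LangGraph's streaming code, and `toy_stream` and the two nodes are hypothetical. In `updates` mode it yields only what each node changed, keyed by node name. In `values` mode it yields the whole state after each node.

```python
def toy_stream(nodes, state, stream_mode="updates"):
    """Yield per-node deltas ('updates') or full snapshots ('values')."""
    for name, fn in nodes:
        update = fn(state)
        state = {**state, **update}
        if stream_mode == "updates":
            yield {name: update}
        elif stream_mode == "values":
            yield dict(state)
        else:
            raise ValueError(f"unsupported stream_mode: {stream_mode}")


nodes = [
    ("draft", lambda s: {"text": s["topic"] + " draft"}),
    ("polish", lambda s: {"text": s["text"] + ", polished"}),
]

updates = list(toy_stream(nodes, {"topic": "agents"}, "updates"))
values = list(toy_stream(nodes, {"topic": "agents"}, "values"))
print(updates)
print(values)
```

The `updates` chunks are smaller and say which node produced them, which suits progress indicators. The `values` chunks are convenient when the consumer only needs the latest full state.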

What's next

The next part covers guardrails: how to validate and filter inputs and outputs to prevent your agent from saying or doing something it shouldn't.