Build a ReAct Agent
An agent is a while loop with three lines in the body: ask the model what to do next, run the tool it picked, feed the result back. Wrap that in 80 lines of Python and you have an agent that can search the web, run code, and edit files. Wrap it in LangGraph and you have a state machine with checkpoints and replay. Wrap it in Claude Desktop or Cursor and you have a product. The loop is the same.
Tool use is the primitive — the model emits a typed tool_use block, your code dispatches it to a function. ReAct is the loop that strings tool calls together with reasoning text in between, so the model has somewhere to think between actions. Yao et al. showed in 2022 that interleaved thought + action beats pure-action prompting on every multi-step benchmark; every modern agent — Cursor, Claude Code, Replit Agent, Devin, deep research — descends from that paper. Build the loop from scratch once and every agent codebase becomes legible: ah, that’s the loop, those are the tools. This lesson is that build, plus the failure modes that bite production stacks.
TL;DR
- ReAct (Yao et al., 2022) = Reason + Act. The agent alternates between thinking (free-text reasoning), acting (calling a tool), and observing (reading the tool’s output). Same pattern, every modern LLM agent.
- The architecture is small: an LLM, a list of tools (functions), a loop. A working ReAct agent is ~80 lines of Python, no LangChain required.
- Tool-use APIs (OpenAI function-calling, Anthropic tools) handle the structured-output side of “the model picks a tool and arguments.” Your code dispatches and returns the observation.
- Termination conditions matter: max iterations, success-by-the-model-saying-done, or a result-format match. Without them you loop forever.
- LangGraph, OpenAI Assistants API, Anthropic’s messages.create with tools — all are productionizations of the same loop. Build it once from scratch and the framework code becomes legible.
Mental model
Loop until the model emits a final answer or max iterations is reached.
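Before the real thing, the control flow can be sketched with a scripted stand-in for the model — fake_model and its canned turns are illustrative only, not a real API:

```python
# Toy sketch of the ReAct control flow. fake_model is a scripted
# stand-in: it "calls" a tool once, then emits a final answer.

def fake_model(messages):
    """Return ("tool", name, args) or ("final", text)."""
    n_observations = sum(1 for m in messages if m["role"] == "tool")
    if n_observations == 0:
        return ("tool", "calculator", {"expression": "2 + 2"})
    return ("final", "The answer is 4.")

def run_tool(name, args):
    if name == "calculator":
        return str(eval(args["expression"], {"__builtins__": None}, {}))
    return f"unknown tool: {name}"

def toy_react(query, max_iters=5):
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iters):
        kind, *rest = fake_model(messages)          # reason + decide
        if kind == "final":
            return rest[0]
        name, args = rest
        observation = run_tool(name, args)          # act
        messages.append({"role": "tool", "content": observation})  # observe
    return "[max iterations exceeded]"

print(toy_react("What is 2 + 2?"))  # The answer is 4.
```

Swap fake_model for a real API call and run_tool for a dispatch table and you have the agent built below.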
A working ReAct agent in 80 lines
from anthropic import Anthropic

client = Anthropic()

# Define tools as plain Python functions
def web_search(query: str) -> str:
    """Search the web. Returns top results as plain text."""
    # In a real implementation, call an actual search API
    return f"[Top 5 results for '{query}'...]"

def calculator(expression: str) -> str:
    """Evaluate a math expression. Returns the result as a string."""
    try:
        return str(eval(expression, {"__builtins__": None}, {}))
    except Exception as e:
        return f"error: {e}"

def write_file(path: str, content: str) -> str:
    """Write content to a file. Returns a confirmation."""
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

TOOLS = {
    "web_search": (web_search, {
        "name": "web_search",
        "description": "Search the web and return top results.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
    }),
    "calculator": (calculator, {
        "name": "calculator",
        "description": "Evaluate a math expression.",
        "input_schema": {"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]},
    }),
    "write_file": (write_file, {
        "name": "write_file",
        "description": "Write content to a file at the given path.",
        "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]},
    }),
}

def react_loop(user_query: str, max_iters: int = 10) -> str:
    messages = [{"role": "user", "content": user_query}]
    tools_spec = [t[1] for t in TOOLS.values()]
    for i in range(max_iters):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=tools_spec,
            messages=messages,
        )
        # Add the assistant message to history
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                fn = TOOLS[block.name][0]
                result = fn(**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        if not tool_results:
            return "[no tool calls and no end_turn — bailing]"
        messages.append({"role": "user", "content": tool_results})
    return "[max iterations exceeded]"

if __name__ == "__main__":
    print(react_loop("What's 17 * 23 + sqrt(144)? Then save the result to /tmp/answer.txt."))

That’s a complete agent. The model decides whether to call a tool or stop; we dispatch tools and pass results back; the loop terminates on end_turn or max iterations.
Tracing through an example run
Query: “What’s 17 × 23 + sqrt(144)? Save to /tmp/answer.txt.”
Iteration 1:
- Model emits: tool_use(calculator, expression="17*23 + 12") — the model substitutes sqrt(144) = 12 itself, since the sandboxed eval exposes no math functions.
- We execute: returns "403".

Iteration 2:
- Model emits: tool_use(write_file, path="/tmp/answer.txt", content="403")
- We execute: returns "wrote 3 bytes to /tmp/answer.txt".

Iteration 3:
- Model emits: text("Done — 17×23 + √144 = 403, saved to /tmp/answer.txt"); stop_reason == "end_turn" → return.

Three iterations, ~3 seconds, a full agent loop. That’s it.
What ReAct buys you over plain prompting
The original 2022 paper showed that interleaving reasoning (“let me think…”) with actions (“let me search…”) outperforms pure-action approaches. The reason: the model’s text-output channel is where its “deliberation” happens. By letting the model emit reasoning between tool calls, it has a place to think — and that thinking is conditioned on prior observations.
In modern tool-use APIs, ReAct is implicit — the model produces both reasoning text and tool calls in the same response. The “let me think then act” structure is in the training distribution.
Termination conditions
Without explicit termination, agents loop forever:
# Always set max_iters
react_loop(query, max_iters=10)

# Sometimes add: stop if the last 3 actions are identical
def is_stuck(messages):
    last_actions = [b for m in messages[-6:] if m["role"] == "assistant"
                    for b in m.get("content", []) if getattr(b, "type", None) == "tool_use"]
    return len(last_actions) >= 3 and len({(a.name, str(a.input)) for a in last_actions}) == 1

# Or: stop if a tool returned the same observation 3 times

In production, typical max_iters is 20–50; the median real query takes 3–8 iterations.
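Stuck detection is easy to sanity-check offline with synthetic messages; in this sketch, SimpleNamespace stands in for the SDK's content-block objects:

```python
from types import SimpleNamespace

def is_stuck(messages):
    """True if the last three tool calls are identical (same name + input)."""
    last_actions = [b for m in messages[-6:] if m["role"] == "assistant"
                    for b in m.get("content", []) if getattr(b, "type", None) == "tool_use"]
    return len(last_actions) >= 3 and len({(a.name, str(a.input)) for a in last_actions}) == 1

def tool_call(name, **input):
    # Stand-in for the SDK's tool_use block; only the fields is_stuck reads.
    return SimpleNamespace(type="tool_use", name=name, input=input)

looping = [{"role": "assistant", "content": [tool_call("web_search", query="same")]}
           for _ in range(3)]
progressing = [{"role": "assistant", "content": [tool_call("web_search", query=str(i))]}
               for i in range(3)]

print(is_stuck(looping))      # True
print(is_stuck(progressing))  # False
```

In the loop, check is_stuck(messages) after appending each tool result and bail early with a diagnostic string, the same way the max-iterations branch does.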
Memory and context
The simple loop above keeps full message history. Token usage grows with iterations. For long-running agents:
- Truncate or summarize old history when the context approaches the model’s limit.
- Vector-store-backed memory: store observations in a vector DB; retrieve relevant ones on each iteration.
- Hierarchical agents: a planner agent that delegates subtasks to worker agents, each with their own bounded context.
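The first option above can be sketched in a few lines. This is a minimal truncation policy under a crude assumption — the token counter is a 4-characters-per-token heuristic, not a real tokenizer:

```python
def rough_tokens(message):
    # Crude heuristic: ~4 characters per token. Use a real tokenizer in production.
    return len(str(message)) // 4

def truncate_history(messages, budget=100_000):
    """Keep the first message (the task) plus as many recent messages as fit."""
    if sum(rough_tokens(m) for m in messages) <= budget:
        return messages
    head, tail = messages[:1], messages[1:]
    spent = rough_tokens(head[0])
    kept = []
    for m in reversed(tail):          # walk backwards: newest messages first
        cost = rough_tokens(m)
        if spent + cost > budget:
            break
        kept.append(m)
        spent += cost
    return head + list(reversed(kept))
```

Called at the top of each loop iteration, this keeps the original task in context while dropping the oldest observations first; summarization replaces the dropped middle with a model-written digest instead of discarding it.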
LangGraph, OpenAI’s Assistants API, and Inngest’s Agent Framework all package these patterns for production use.
MCP — the standardization layer
MCP (Model Context Protocol) is the open standard for exposing tools to LLMs. Instead of hand-coding tool definitions per provider, you implement an MCP server and any compliant client (Claude Desktop, Cursor, etc.) can use your tools. ReAct agents in 2026 increasingly consume MCP tool definitions rather than hand-rolled ones.
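The reason the standardization is cheap to adopt: an MCP server advertises each tool as name / description / inputSchema, which maps almost one-to-one onto the tools parameter used above. A sketch (field names per the MCP spec; the example tool itself is made up):

```python
# Hypothetical tool definition in MCP's wire format (as returned by tools/list).
mcp_tool = {
    "name": "web_search",
    "description": "Search the web and return top results.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def mcp_to_anthropic(tool):
    """Rename MCP's camelCase inputSchema to the Anthropic API's input_schema."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }

tools_spec = [mcp_to_anthropic(mcp_tool)]
```

The dispatch side changes symmetrically: instead of looking the function up in a local TOOLS dict, the agent forwards the call to the MCP server and relays the result back as a tool_result.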
Common failure modes
- Hallucinated tool names: the model invents a tool that doesn’t exist. Constrained decoding fixes this.
- Argument errors: wrong types, missing fields. Strong JSON schemas + validation help.
- Loops: same action repeated forever. Add stuck-detection.
- Premature termination: model says “done” before the task is complete. Better tool descriptions + few-shot examples in the system prompt.
- Cost / time blow-up: long-context iterations. Set a hard ceiling on tokens or time per task.
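For the argument-errors item above, here is a minimal validator sketch. Real stacks typically use a full JSON Schema library such as jsonschema; this hand-rolled version checks only required fields and primitive types:

```python
# Map JSON Schema primitive type names to Python types (subset, for illustration).
_TYPES = {"string": str, "number": (int, float), "integer": int,
          "boolean": bool, "object": dict, "array": list}

def validate_args(args, input_schema):
    """Return a list of error strings; an empty list means the args pass."""
    errors = []
    for field in input_schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, value in args.items():
        prop = input_schema.get("properties", {}).get(field)
        if prop is None:
            errors.append(f"unexpected field: {field}")
        elif not isinstance(value, _TYPES.get(prop.get("type"), object)):
            errors.append(f"{field}: expected {prop['type']}, got {type(value).__name__}")
    return errors

schema = {"type": "object",
          "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
          "required": ["path", "content"]}

print(validate_args({"path": "/tmp/a.txt", "content": "hi"}, schema))  # []
print(validate_args({"path": 42}, schema))
# ['missing required field: content', 'path: expected string, got int']
```

In the dispatch step, instead of calling fn(**block.input) directly, return a non-empty error list as the tool_result — the model usually self-corrects on the next iteration.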
The shape — model → tool → observation → repeat → final — is the entire ReAct architecture. Real agents add many tools, memory, planners, but this loop is the heart.
Key takeaways
- ReAct = reason + act loop. ~80 lines of Python; the foundation of every modern agent.
- Tool-use APIs handle the LLM-side structured output; your code dispatches functions and feeds back observations.
- Set termination conditions (max iters, stuck detection, success match). Without them you loop forever.
- Production frameworks (LangGraph, OpenAI Assistants, MCP) add memory, planning, multi-agent orchestration. The core loop is unchanged.
- Build it from scratch once. Every agent codebase becomes legible.
Go deeper
- Paper: "ReAct: Synergizing Reasoning and Acting in Language Models". The original paper; Section 3 has the prompt format that became the standard.
- Docs: Anthropic — Tool Use Documentation. How Claude's tool-use API works; the most important reference for the messages-based ReAct loop.
- Docs: OpenAI — Function Calling. Same pattern, different surface; read both to internalize the abstraction.
- Docs: LangGraph Documentation. A widely used production framework; the state-graph abstraction for multi-step agents.
- Docs: Model Context Protocol. The 2024+ standard for tool definitions; increasingly the input format for ReAct agents.
- Blog: Anthropic — "Building Effective Agents". The best taxonomy of agent patterns from the team that ships Claude; usefully distinguishes "workflows" from "agents".
- Repo: anthropics/anthropic-cookbook. Worked examples, including a ReAct loop with Claude's tool-use API.
Why this matters
Every “AI agent” — Cursor, Claude Code, Replit Agent, Devin, deep research, computer-use, the auto-coding tools that landed in 2024–2025 — runs a ReAct-style loop at its core. The framework on top adds memory, planning, sub-agents, recovery — but the loop is what makes it work. Until you’ve built ReAct from scratch, every agent codebase feels like magic; after, every agent codebase reads as “ah, that’s the loop, those are the tools.”