
Build a ReAct Agent

An agent is a while loop with three lines in the body: ask the model what to do next, run the tool it picked, feed the result back. Wrap that in 80 lines of Python and you have an agent that can search the web, run code, and edit files. Wrap it in LangGraph and you have a state machine with checkpoints and replay. Wrap it in Claude Desktop or Cursor and you have a product. The loop is the same.

Tool use is the primitive — the model emits a typed tool_use block, your code dispatches it to a function. ReAct is the loop that strings tool calls together with reasoning text in between, so the model has somewhere to think between actions. Yao et al. showed in 2022 that interleaved thought + action beats pure-action prompting on the multi-step benchmarks they tested; every modern agent — Cursor, Claude Code, Replit Agent, Devin, deep research — descends from that paper. Build the loop from scratch once and every agent codebase becomes legible: ah, that’s the loop, those are the tools. This lesson is that build, plus the failure modes that bite production stacks.

TL;DR

  • ReAct (Yao et al., 2022) = Reason + Act. The agent alternates between thinking (free-text reasoning), acting (calling a tool), and observing (reading the tool’s output). Same pattern, every modern LLM agent.
  • The architecture is small: an LLM, a list of tools (functions), a loop. A working ReAct agent is ~80 lines of Python, no LangChain required.
  • Tool-use APIs (OpenAI function-calling, Anthropic tools) handle the structured-output side of “the model picks a tool and arguments.” Your code dispatches and returns the observation.
  • Termination conditions matter: max iterations, success-by-the-model-saying-done, or a result-format match. Without them you loop forever.
  • LangGraph, OpenAI Assistants API, Anthropic’s messages.create with tools — all are productionizations of the same loop. Build it once from scratch and the framework code becomes legible.

Why this matters

Every “AI agent” — Cursor, Claude Code, Replit Agent, Devin, deep research, computer-use, the auto-coding tools that landed in 2024–2025 — runs a ReAct-style loop at its core. The framework on top adds memory, planning, sub-agents, recovery — but the loop is what makes it work. Until you’ve built ReAct from scratch, every agent codebase feels like magic; after, every agent codebase reads as “ah, that’s the loop, those are the tools.”

Mental model

Loop until the model emits a final answer or max iterations is reached.
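In code, the shape is a dozen lines. A skeleton sketch: llm() and run_tool() here are stubs standing in for the real API call and tool dispatcher that the walkthrough below fills in.

def llm(history):
    # Stub: a real agent calls the model with the full history here.
    return {"text": "final answer", "tool_call": None}

def run_tool(call):
    # Stub: a real agent dispatches to an actual function here.
    return "observation"

def agent(task, max_iters=10):
    history = [task]
    for _ in range(max_iters):
        step = llm(history)                          # reason: what next?
        if step["tool_call"] is None:                # model chose to stop
            return step["text"]                      # final answer
        history.append(run_tool(step["tool_call"]))  # act, then observe
    return "[max iterations]"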

Concrete walkthrough

A working ReAct agent in 80 lines

from anthropic import Anthropic

client = Anthropic()

# Define tools as plain Python functions
def web_search(query: str) -> str:
    """Search the web. Returns top results as plain text."""
    # In a real implementation, call an actual search API
    return f"[Top 5 results for '{query}'...]"

def calculator(expression: str) -> str:
    """Evaluate a math expression. Returns the result as a string."""
    try:
        return str(eval(expression, {"__builtins__": None}, {}))
    except Exception as e:
        return f"error: {e}"

def write_file(path: str, content: str) -> str:
    """Write content to a file. Returns a confirmation."""
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

# Map tool name -> (function, API tool spec)
TOOLS = {
    "web_search": (web_search, {
        "name": "web_search",
        "description": "Search the web and return top results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }),
    "calculator": (calculator, {
        "name": "calculator",
        "description": "Evaluate a math expression.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    }),
    "write_file": (write_file, {
        "name": "write_file",
        "description": "Write content to a file at the given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    }),
}

def react_loop(user_query: str, max_iters: int = 10) -> str:
    messages = [{"role": "user", "content": user_query}]
    tools_spec = [spec for _fn, spec in TOOLS.values()]

    for _ in range(max_iters):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=tools_spec,
            messages=messages,
        )
        # Add assistant message to history
        messages.append({"role": "assistant", "content": response.content})

        # No tool requested: the text blocks are the final answer
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")

        # Execute tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                fn = TOOLS[block.name][0]
                result = fn(**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        if not tool_results:
            return "[no tool calls and no end_turn — bailing]"

        # Observations go back to the model as a user message
        messages.append({"role": "user", "content": tool_results})

    return "[max iterations exceeded]"

if __name__ == "__main__":
    print(react_loop("What's 17 * 23 + sqrt(144)? Then save the result to /tmp/answer.txt."))

That’s a complete agent. The model decides whether to call a tool or stop; we dispatch tools and pass results back; the loop terminates on end_turn or max iterations.

Tracing through an example run

Query: “What’s 17 × 23 + sqrt(144)? Save to /tmp/answer.txt.”

Iteration 1:

  • Model emits: tool_use(calculator, expression="17*23 + 12") (the model folds √144 = 12 into the expression itself, since the sandboxed eval exposes no sqrt)
  • We execute: returns "403".

Iteration 2:

  • Model emits: tool_use(write_file, path="/tmp/answer.txt", content="403")
  • We execute: returns "wrote 3 bytes to /tmp/answer.txt".

Iteration 3:

  • Model emits: text("Done — 17×23 + √144 = 403, saved to /tmp/answer.txt")
  • stop_reason == "end_turn" → return.

Three iterations, ~3 seconds, full agent loop. That’s it.

What ReAct buys you over plain prompting

The original 2022 paper showed that interleaving reasoning (“let me think…”) with actions (“let me search…”) outperforms pure-action approaches. The reason: the model’s text-output channel is where its “deliberation” happens. By letting the model emit reasoning between tool calls, it has a place to think — and that thinking is conditioned on prior observations.

In modern tool-use APIs, ReAct is implicit — the model produces both reasoning text and tool calls in the same response. The “let me think then act” structure is in the training distribution.
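Concretely, a single response can carry the thought and the action as sibling content blocks. Shown here as plain dicts with made-up values (the SDK actually returns typed block objects with the same fields):

# Illustrative response content for the query traced above
content = [
    {"type": "text",
     "text": "I need 17*23 first; sqrt(144) is just 12."},
    {"type": "tool_use", "id": "toolu_01...", "name": "calculator",
     "input": {"expression": "17*23 + 12"}},
]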

Termination conditions

Without explicit termination, agents loop forever:

# Always set max_iters
react_loop(query, max_iters=10)

# Sometimes add: stop if the last 3 actions are identical
def is_stuck(messages):
    last_actions = [
        b
        for m in messages[-6:]
        if m["role"] == "assistant"
        for b in m.get("content", [])
        if getattr(b, "type", None) == "tool_use"
    ]
    return len(last_actions) >= 3 and len({(a.name, str(a.input)) for a in last_actions}) == 1

# Or: stop if a tool returned the same observation 3 times

In production, typical max_iters is 20–50; the median real query takes 3–8 iterations.
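The observation-repeat check mentioned in the snippet’s last comment is just as short to add. A sketch, assuming the message format react_loop builds above (tool results arrive as user messages whose content is a list of tool_result dicts):

def same_observation(messages, n=3):
    """True if the last n tool results were byte-identical."""
    results = [
        block["content"]
        for m in messages
        if m["role"] == "user" and isinstance(m["content"], list)
        for block in m["content"]
        if block.get("type") == "tool_result"
    ]
    return len(results) >= n and len(set(results[-n:])) == 1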

Memory and context

The simple loop above keeps full message history. Token usage grows with iterations. For long-running agents:

  • Truncate or summarize old history when the context approaches the model’s limit (a crude sketch follows this list).
  • Vector-store-backed memory: store observations in a vector DB; retrieve relevant ones on each iteration.
  • Hierarchical agents: a planner agent that delegates subtasks to worker agents, each with their own bounded context.
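A crude sketch of the first strategy, using character count as a stand-in for a real token counter. A production version would summarize the dropped turns rather than discard them, and must drop tool_use/tool_result pairs together to keep the transcript valid:

def compact_history(messages, max_chars=400_000):
    """Keep the original task and the most recent turns; drop the oldest middle."""
    compacted = list(messages)
    while len(str(compacted)) > max_chars and len(compacted) > 7:
        del compacted[1]  # index 0 is the task; index 1 is the oldest droppable turn
    return compacted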

LangGraph, OpenAI’s Assistants API, and Inngest’s Agent Framework all package these patterns.

MCP — the standardization layer

MCP (Model Context Protocol) is the open standard for exposing tools to LLMs. Instead of hand-coding tool definitions per provider, you implement an MCP server and any compliant client (Claude Desktop, Cursor, etc.) can use your tools. ReAct agents in 2026 increasingly consume MCP tool definitions rather than hand-rolled ones.
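On the consuming side, here is a sketch using the official mcp Python SDK; the server command is a placeholder, and exact details may vary across SDK versions:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_mcp_tools():
    # Placeholder: point this at any MCP server you actually run.
    params = StdioServerParameters(command="python", args=["my_mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Each tool carries a name, description, and JSON input schema:
            # the same fields the hand-rolled TOOLS dict above duplicates.
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(list_mcp_tools())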

Common failure modes

  • Hallucinated tool names: model invents a tool that doesn’t exist. Constrained decoding fixes this; so does rejecting unknown names at dispatch and returning the error as an observation.
  • Argument errors: wrong types, missing fields. Strong JSON schemas + validation help (see the dispatch sketch after this list).
  • Loops: same action repeated forever. Add stuck-detection.
  • Premature termination: model says “done” before the task is complete. Better tool descriptions + few-shot examples in the system prompt.
  • Cost / time blow-up: long-context iterations. Set a hard ceiling on tokens or time per task.
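A dispatch-side guard covering the first two bullets, sketched against the TOOLS dict from the walkthrough (jsonschema is the usual third-party validator). The key pattern: return errors as observations so the model can self-correct instead of crashing the loop.

from jsonschema import ValidationError, validate  # pip install jsonschema

def dispatch(name: str, args: dict) -> str:
    if name not in TOOLS:
        # Hallucinated tool name: tell the model what actually exists
        return f"error: no tool named '{name}'; available: {list(TOOLS)}"
    fn, spec = TOOLS[name]
    try:
        validate(instance=args, schema=spec["input_schema"])
    except ValidationError as e:
        # Bad arguments: feed the schema error back as the observation
        return f"error: bad arguments for {name}: {e.message}"
    return fn(**args)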

Run it yourself — toy ReAct loop

A simulated ReAct loop using a hardcoded fake LLM that ‘reasons’ through a small task.
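The fake model and its canned script below are invented for illustration: no API key, just the thought → action → observation cycle in isolation.

# Toy ReAct loop: a fake "model" follows a canned script of thoughts and
# tool calls, so the loop's mechanics are visible without a real API.
def fake_llm(history):
    step = sum(1 for m in history if m["role"] == "tool")
    script = [
        ("I should compute 17*23 first.", ("calculator", "17*23")),
        ("Now add sqrt(144) = 12 to 391.", ("calculator", "391+12")),
        ("I have the answer.", None),
    ]
    return script[min(step, len(script) - 1)]

def calculator(expression):
    return str(eval(expression, {"__builtins__": None}, {}))

def toy_react(query, max_iters=5):
    history = [{"role": "user", "content": query}]
    for _ in range(max_iters):
        thought, action = fake_llm(history)
        print("THOUGHT:", thought)
        if action is None:
            return history[-1]["content"]  # the last observation is the answer
        name, arg = action
        observation = calculator(arg)      # only one tool in the toy
        print("ACTION:", name, arg, "-> OBSERVATION:", observation)
        history.append({"role": "tool", "content": observation})
    return "[max iterations]"

print(toy_react("What's 17*23 + sqrt(144)?"))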

The shape — model → tool → observation → repeat → final — is the entire ReAct architecture. Real agents add many tools, memory, planners, but this loop is the heart.

Quick check

Fill in the blank
The 2022 paper that introduced the ReAct (reason + act) framework for LLM agents:
Lead author's surname.
Quick check
A team's ReAct agent occasionally loops: it calls the same search tool with the same query 5 times in a row. Best fix:

Key takeaways

  1. ReAct = reason + act loop. ~80 lines of Python; the foundation of every modern agent.
  2. Tool-use APIs handle the LLM-side structured output; your code dispatches functions and feeds back observations.
  3. Set termination conditions (max iters, stuck detection, success match). Without them you loop forever.
  4. Production frameworks (LangGraph, OpenAI Assistants, MCP) add memory, planning, multi-agent orchestration. The core loop is unchanged.
  5. Build it from scratch once. Every agent codebase becomes legible.

Go deeper