From Zero to Agent: ReAct, Reflection, and Planning

Preface

We've covered a lot of ground in the past few posts, but one thing we haven't touched yet is agents. In this post, we'll cover the basics of agents and how they work.

What is an agent?

Let’s demystify what people mean by "AI agent": it’s just an LLM that knows how to call tools, remember things, and reason in a loop. Everything else is upgrades.

Strip the hype away: an agent is not magic.
It’s an LLM wrapped in three essential components:

  • Reasoning → the ability to decide what to do next (or stop).
  • Tools → APIs, calculators, code, databases—ways to act on the world.
  • Memory → short-term (context window) and sometimes long-term (think database).

Components of an agent
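
In code, those three pieces can be as small as a single data structure. Here is a minimal sketch; the Agent class and its field names are illustrative, not tied to any particular framework:

# Minimal anatomy of an agent: reasoning (the LLM), tools, and memory
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    llm: Callable[[list[dict]], str]                   # Reasoning: maps a message history to text
    tools: dict[str, Callable[..., str]]               # Tools: name -> callable that acts on the world
    memory: list[dict] = field(default_factory=list)   # Memory: the running message history (short-term)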

Without those, an “agent” is just another plugin.
With those, you get a system that can observe → reason → act → repeat.

That’s the ReAct loop, and it’s where all agent stories begin.

Tier-1: ReAct Agent

The ReAct agent is a deceptively simple loop:

  1. Observe → read the user request or the last tool result.
  2. Reason → decide on the next action: a tool call or a final answer.
  3. Act → call a tool (or provide a final answer).
  4. Repeat until you can confidently give the Final Answer.

ReAct Loop

Source: ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al. (2022)

Mock ReAct Output:

👁️ Observe: What is the capital of France?
🧠 Reason: I need to call a tool to get the capital of France.
🔧 Act: Call get_capital(france).
👁️ Observe: get_capital(france) returned Paris.
🧠 Reason: I have enough information to provide a final answer.
✅ Final Answer: The capital of France is Paris.

Code example

# Simplified ReAct loop with tool calls and memory
for iteration in range(self.max_iterations):
    # Reason: decide on the next action, either a tool call or a final answer
    reasoning_result = await self.reason(messages)
    if reasoning_result.action_type == ActionType.TOOL_CALL:
        # Act: run each requested tool and store the observation in memory
        self.add_tool_calls_to_memory(messages, reasoning_result)
        for tool_call in reasoning_result.tool_calls:
            result = self.act(tool_call.function_name, tool_call.arguments)
            self.add_tool_result_to_memory(messages, tool_call.id, result)
        continue  # Observe: loop again with the new tool results in context

    # No tool call requested, so the model has produced its final answer
    return reasoning_result.content

return "Maximum iterations reached. Unable to complete the task."

Tier-2 Upgrades: Reflection & Planning

Once you’ve got a ReAct loop working, you can add layers. We'll cover two of them: Reflection and Planning.

Reflection

If ReAct is the skeleton of an agent, Reflection is the muscle that makes it useful in practice. Instead of blindly chaining steps, a reflective agent pauses after each action to ask:

  • “Did this work?”
  • “If not, what should I change before trying again?”

So, reflection isn’t just retrying — it’s retrying with feedback. The agent adapts its next attempt based on explicit error signals. This feedback loop reduces repeated mistakes, surfaces inconsistencies, and acts as a lightweight validator before the final answer.

In practice, reflection often pairs with some form of checking: sometimes it’s deterministic (tests, schema validators), other times it’s softer (LLM-as-judge, heuristics). Either way, the point is the same: don’t trust a single shot of reasoning — make the agent check itself and adapt.
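
Here is a minimal sketch of that reflect-and-retry pattern, using a deterministic check (valid JSON) as the verification signal. The llm argument stands in for any callable that maps a prompt to text; validate could just as well run unit tests or a schema validator:

# Reflect-and-retry loop: validate the output, feed the error back, try again
import json

def validate(output: str) -> str | None:
    """Return None if the output is valid JSON, otherwise an error message."""
    try:
        json.loads(output)
        return None
    except json.JSONDecodeError as exc:
        return f"Output is not valid JSON: {exc}"

def generate(task: str, feedback: str | None, llm) -> str:
    prompt = task if feedback is None else f"{task}\n\nYour previous attempt failed: {feedback}\nFix it."
    return llm(prompt)

def solve_with_reflection(task: str, llm, max_attempts: int = 3) -> str:
    output, feedback = "", None
    for _ in range(max_attempts):
        output = generate(task, feedback, llm)
        feedback = validate(output)   # explicit error signal
        if feedback is None:
            return output             # passed the check
    return output                     # best effort after max_attempts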

📝 How It Shows Up

  • Code agents: run unit tests, linters, or compilers. Failures are clear signals → reflection turns them into fixes (“add null check,” “import missing module”).
  • Data agents: pass output through JSON schema validators. If invalid, reflection rewrites to fit the schema.
  • Knowledge agents: cross-check answers against retrieved passages. Reflection compares reasoning vs. sources (“my answer contradicts doc X → re-evaluate”).
  • General copilots: use an LLM-as-judge to critique quality, style, or factual grounding, then refine.

🔧 In the Wild

  • Manus: reported to use reflection + verification loops to catch mistakes before deploying.
  • GitHub Copilot: integrates compilation/test signals; reflections appear as suggested fixes when tests fail.

📚 Research Roots

  • Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
  • Self-Refine: Iterative Refinement with Self-Feedback (Madaan et al., 2023)

⚡ Why It Matters

Verification signals (tests, validators, retrieval cross-checks, or even an LLM-as-judge) provide the pass/fail; reflection turns those signals into guidance for the next attempt. That’s why Reflection is arguably the most important addition to the ReAct loop: without it, agents remain brittle demos that break on the first edge case; with it, they become systems you can actually ship. Every production-grade agent you’ve heard of, from Copilot to Manus to Salesforce’s Agentforce, relies on reflection.

Planning

If Reflection helps an agent avoid repeating mistakes, Planning helps it avoid short-sightedness from the start.

Instead of reasoning one step at a time, a planning agent sketches a roadmap of sub-tasks up front. It then executes them in order, updating only when observations contradict the plan.
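
A minimal sketch of that plan-then-execute split is below. It assumes llm is any callable from a prompt string to text; nothing framework-specific is implied:

# Plan first, then execute each step in order, carrying results forward
def plan(task: str, llm) -> list[str]:
    response = llm(f"Break this task into short, ordered steps, one per line:\n{task}")
    return [line.strip() for line in response.splitlines() if line.strip()]

def plan_and_execute(task: str, llm) -> str:
    steps = plan(task, llm)
    notes = []
    for i, step in enumerate(steps, start=1):
        progress = "\n".join(notes)
        result = llm(f"Task: {task}\nProgress so far:\n{progress}\nNow do step {i}: {step}")
        notes.append(f"{i}. {step} -> {result}")
    return llm(f"Task: {task}\nStep results:\n" + "\n".join(notes) + "\nWrite the final answer.")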

📝 How It Shows Up

  • Code agents: outline a multi-step plan before editing files (e.g. “1. add function stub → 2. write tests → 3. implement logic → 4. refactor”).
  • Data agents: plan pipelines of transformations (e.g. “clean → normalize → validate → load”).
  • Knowledge agents: plan retrieval + reasoning workflows (e.g. “search papers → extract methods → compare results → write summary”).

🔧 In the Wild

  • Claude Code: generates internal planning traces before producing code.
  • Claude Task Master: explicit task decomposition + execution.
  • LangChain’s Plan-and-Execute: separates planning and acting into two distinct phases.
  • Salesforce Agentforce: capable of dynamic multi-step action planning within workflows.

📚 Research Roots

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (Wang et al., 2023)

⚡ Why It Matters

  • Rooted in reasoning: CoT proved that structured intermediate steps improve outcomes — planning is just making those steps explicit, inspectable, and reusable.
  • Reduces shortsightedness: agents don’t stumble step by step; they see the whole path.
  • Transparency: you can debug a plan before execution.

Tier-3 Upgrades: Advanced Loops

Branching

When a single trajectory isn’t enough, agents can branch into multiple futures, simulate outcomes, and prune weak paths. A promising, relatively new technique is Language Agent Tree Search (LATS).

Idea: MCTS-style loop — select → expand → simulate → backprop — unifies planning, acting, and reasoning.
Use in the wild: Big gains on coding (HumanEval), QA (HotPotQA), and navigation (WebShop). Implement as a search loop: propose multiple actions, run rollouts, prune by value.
A practical demonstration of this technique is available in LangChain.
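
The sketch below is far simpler than full LATS (no rollouts or value backpropagation), but it shows the branch-and-prune shape. propose and score are placeholders for an LLM that suggests candidate next actions and a value function or critic that rates partial trajectories:

# Simplified branching search: expand several candidates per step, keep only the best few
def tree_search(state: str, propose, score, depth: int = 3, beam: int = 2) -> str:
    frontier = [state]
    for _ in range(depth):
        candidates = [s + "\n" + action for s in frontier for action in propose(s)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)  # rank branches by estimated value
        frontier = candidates[:beam]              # prune weak paths
    return max(frontier, key=score)               # best trajectory found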

Notable Papers

  • Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models (Zhou et al., 2023)

Consistency

Not agentic, but useful for robustness: sample many reasoning chains, then pick the most common final answer (majority vote) or prune via a critic.

Idea: Multiple independent paths → aggregation → more reliable final answer.
Use in the wild: Easy to implement — just run N parallel generations and vote or filter. Works well for math, logic, and QA.
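
A minimal sketch of the majority-vote variant is below; sample_answer stands in for one non-deterministic LLM call plus extraction of the final answer:

# Self-consistency: sample N independent answers and return the most common one
from collections import Counter

def self_consistent_answer(question: str, sample_answer, n: int = 5) -> str:
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]  # majority vote on the final answer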

Notable Papers

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022)

These are still emerging techniques, with early research showing promising results but limited production adoption.

Takeaways

  • An agent is an LLM wrapped in three essential components: Reasoning, Tools, and Memory.
  • The ReAct loop is the foundation of all agents.
  • Reflection and Planning are the two most important upgrades to the ReAct loop.
  • Branching and Consistency are two other ways to make agents more robust and capable.

What's next

Now that we’ve built up single-agent loops, the natural next step is Multi-Agent Systems (MAS) — agents that coordinate, communicate, and specialize. This is where things get messy, but also where the frontier is moving: from single reasoning loops to societies of agents tackling complex tasks together. That’s what we’ll dive into next.

👉 Full code for the agent is available in my GitHub repo