Advanced Prompt Engineering: Chain-of-Thought, ReAct, and Tree-of-Thoughts Explained

Master modern prompting techniques that unlock reasoning, planning, and multi-step problem-solving in large language models

Posted by Perivitta on March 21, 2026 · 25 mins read


Introduction

The way you prompt a language model fundamentally shapes its output quality. A well-crafted prompt can turn a mediocre response into an excellent one.

Early LLM applications relied on simple prompts: "Summarize this text" or "Answer this question." These work for basic tasks but fail when problems require reasoning, planning, or multi-step thinking.

Modern prompt engineering has evolved far beyond simple instructions. Techniques like Chain-of-Thought (CoT), ReAct, and Tree-of-Thoughts enable LLMs to tackle complex problems by breaking them into manageable steps, combining reasoning with action, and exploring multiple solution paths.

This post explains advanced prompting techniques that have become essential for production LLM applications. You will learn how these methods work, when to use them, and how to implement them effectively.


The Foundation: Zero-Shot vs Few-Shot Prompting

Before diving into advanced techniques, we need to understand the baseline approaches.

Zero-Shot Prompting

Zero-shot prompting means asking the model to perform a task without providing any examples.

Example:

Classify the sentiment of this review as positive, negative, or neutral:
"The food was okay but the service was terrible."
 
Sentiment:

The model responds based solely on its pre-training. This works well for common tasks but struggles with domain-specific or nuanced problems.

Few-Shot Prompting

Few-shot prompting provides a few examples before the actual task.

Example:

Classify the sentiment as positive, negative, or neutral:
 
Review: "Amazing experience! Highly recommend."
Sentiment: Positive
 
Review: "Waste of money. Never coming back."
Sentiment: Negative
 
Review: "It was fine, nothing special."
Sentiment: Neutral
 
Review: "The food was okay but the service was terrible."
Sentiment:

The model learns the task format and expected outputs from examples. This significantly improves performance, especially for custom tasks.

When to Use Each

  • Zero-shot: Common tasks, simple queries, when latency and cost matter.
  • Few-shot: Domain-specific tasks, custom formats, when accuracy matters more than speed.
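Both styles reduce to prompt assembly: few-shot simply prepends labeled examples to the same task description. A minimal sketch (the `build_prompt` helper and its argument names are illustrative, not from any library):

```python
def build_prompt(task: str, query: str, examples=None) -> str:
    """Assemble a zero-shot or few-shot prompt from the same task description."""
    parts = [task]
    # Prepend labeled examples (few-shot); skip for zero-shot
    for ex_input, ex_output in (examples or []):
        parts.append(f'Review: "{ex_input}"\nSentiment: {ex_output}')
    parts.append(f'Review: "{query}"\nSentiment:')
    return "\n\n".join(parts)

# Zero-shot: task description plus the query only
zero_shot = build_prompt(
    "Classify the sentiment as positive, negative, or neutral:",
    "The food was okay but the service was terrible.")

# Few-shot: same task, three labeled examples prepended
few_shot = build_prompt(
    "Classify the sentiment as positive, negative, or neutral:",
    "The food was okay but the service was terrible.",
    examples=[("Amazing experience! Highly recommend.", "Positive"),
              ("Waste of money. Never coming back.", "Negative"),
              ("It was fine, nothing special.", "Neutral")])
```

Because the task description is shared, you can switch between the two modes per request, adding examples only when the extra tokens pay for themselves.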

Chain-of-Thought (CoT) Prompting

Chain-of-Thought prompting is a breakthrough technique that dramatically improves reasoning by asking the model to think step-by-step.

The Core Idea

Instead of jumping directly to the answer, the model explicitly generates intermediate reasoning steps. This mirrors how humans solve complex problems: breaking them down, working through logic, and then concluding.

Example: Without CoT

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many tennis balls does he have now?
 
A: 11

The model might get this right through pattern matching, but it does not show its work.

Example: With CoT

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many tennis balls does he have now?
 
Let's think step by step:
1. Roger starts with 5 tennis balls.
2. He buys 2 cans of tennis balls.
3. Each can contains 3 balls, so 2 cans contain 2 × 3 = 6 balls.
4. Total balls = initial balls + new balls = 5 + 6 = 11 balls.
 
A: Roger now has 11 tennis balls.

By generating intermediate steps, the model is forced to engage its reasoning capabilities rather than relying on memorized patterns.

Few-Shot Chain-of-Thought

The most effective CoT approach provides examples with reasoning chains.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are there total?
A: Let's think step by step. There are initially 3 cars. 2 more cars arrive. 3 + 2 = 5. The answer is 5.
 
Q: John has 4 apples. He gives 2 to his friend. How many apples does John have left?
A: Let's think step by step. John starts with 4 apples. He gives away 2 apples. 4 - 2 = 2. The answer is 2.
 
Q: A baker made 15 cupcakes. She sold 8 of them. Then she made 6 more. How many cupcakes does she have now?
A: Let's think step by step.

The model learns to emulate the step-by-step reasoning pattern from the examples.

Zero-Shot Chain-of-Thought

Remarkably, simply adding "Let's think step by step" triggers reasoning without any examples.

Q: A farmer has 12 chickens and 8 cows. Each chicken has 2 legs and each cow has 4 legs. How many total legs are there?
 
Let's think step by step:

This phrase acts as a trigger that activates the model's chain-of-thought reasoning mode.
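Both CoT variants can be assembled with one helper: this sketch (`build_cot_prompt` is an illustrative name) emits zero-shot CoT when no worked examples are supplied and few-shot CoT otherwise:

```python
COT_TRIGGER = "Let's think step by step."

def build_cot_prompt(question: str, examples=None) -> str:
    """Few-shot CoT when (question, reasoning) examples are given;
    zero-shot CoT (trigger phrase only) otherwise."""
    parts = []
    for ex_q, ex_reasoning in (examples or []):
        parts.append(f"Q: {ex_q}\nA: {COT_TRIGGER} {ex_reasoning}")
    parts.append(f"Q: {question}\nA: {COT_TRIGGER}")
    return "\n\n".join(parts)

# Zero-shot CoT: just the trigger phrase
zero_shot_cot = build_cot_prompt(
    "A farmer has 12 chickens and 8 cows. Each chicken has 2 legs "
    "and each cow has 4 legs. How many total legs are there?")

# Few-shot CoT: a worked reasoning chain precedes the new question
few_shot_cot = build_cot_prompt(
    "A baker made 15 cupcakes. She sold 8 of them. Then she made 6 more. "
    "How many cupcakes does she have now?",
    examples=[(
        "If there are 3 cars in the parking lot and 2 more cars arrive, "
        "how many cars are there total?",
        "There are initially 3 cars. 2 more cars arrive. 3 + 2 = 5. "
        "The answer is 5.")])
```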

When CoT Helps Most

  • Multi-step arithmetic problems.
  • Logical reasoning tasks.
  • Complex question answering requiring inference.
  • Code debugging and generation.
  • Planning and scheduling problems.

Self-Consistency: Improving CoT Reliability

A single reasoning chain can make mistakes. Self-consistency addresses this by generating multiple reasoning paths and voting on the final answer.

How It Works

  1. Generate multiple CoT reasoning chains for the same problem (using temperature > 0 for diversity).
  2. Extract the final answer from each chain.
  3. Select the most common answer (majority vote).

Implementation

import re
import openai
from collections import Counter
 
def self_consistency(question: str, num_samples: int = 5) -> str:
    """Generate multiple CoT reasoning chains and vote on the answer"""
    
    prompt = f"{question}\n\nLet's think step by step:"
    
    answers = []
    for _ in range(num_samples):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7  # Higher temperature for diverse reasoning paths
        )
        
        # Extract final answer from response
        text = response.choices[0].message.content
        answers.append(extract_final_answer(text))
    
    # Majority vote across the sampled chains
    return Counter(answers).most_common(1)[0][0]
 
def extract_final_answer(text: str) -> str:
    """Extract the final answer: here, the last number in the response.
    Adapt the pattern (or use a cheap follow-up LLM call) for other formats."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else text.strip()

Tradeoffs

  • Pros: Significantly higher accuracy on reasoning tasks.
  • Cons: 5-10× more expensive and slower due to multiple LLM calls.

Use self-consistency for critical decisions where accuracy matters more than cost.


ReAct: Reasoning and Acting

ReAct (Reasoning + Acting) combines chain-of-thought reasoning with the ability to take actions (tool calls, database queries, API calls).

The model alternates between reasoning about what to do next and actually doing it.

The ReAct Pattern

Each step follows a Thought → Action → Observation cycle:

  • Thought: The model reasons about what information it needs.
  • Action: The model calls a tool or function.
  • Observation: The result of the action is returned.

This continues until the model has enough information to answer.

Example: Question Answering with Search

Question: What is the population of the capital of France?
 
Thought 1: I need to know the capital of France first.
Action 1: Search["capital of France"]
Observation 1: Paris is the capital of France.
 
Thought 2: Now I need the population of Paris.
Action 2: Search["population of Paris"]
Observation 2: The population of Paris is approximately 2.2 million.
 
Thought 3: I now have the answer.
Answer: The population of the capital of France (Paris) is approximately 2.2 million.
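When you drive ReAct as plain text rather than through a function-calling API, the controller has to parse action lines like those in the trace above. A minimal sketch of that parsing step (the regex and helper name are illustrative):

```python
import re

# Matches lines like: Action 1: Search["capital of France"]
ACTION_RE = re.compile(r'Action(?:\s*\d+)?:\s*(\w+)\[(.*)\]')

def parse_action(line: str):
    """Return (tool_name, argument) for a ReAct action line, or None."""
    match = ACTION_RE.search(line)
    if not match:
        return None
    tool, arg = match.group(1), match.group(2)
    return tool, arg.strip('"')  # drop surrounding quotes if present
```

The controller feeds each parsed action to the matching tool, appends the result as an `Observation:` line, and re-prompts the model until it emits a final answer.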

ReAct Prompt Template

You can use the following tools:
- Search[query]: Search for information
- Calculator[expression]: Perform calculations
- Finish[answer]: End with final answer
 
Use this format:
Thought: [your reasoning]
Action: [tool to use]
Observation: [result will be provided]
... (repeat as needed)
Thought: I now know the final answer
Finish: [final answer]
 
Question: {user_question}
 
Let's begin:

Implementation with Function Calling

import json  # for decoding tool-call arguments

def react_agent(question: str, tools: dict, max_steps: int = 10):
    """ReAct agent loop. build_react_prompt and format_tools_for_api are
    application-specific helpers: one renders the prompt template above,
    the other converts tools into the API's function schemas."""
    
    conversation = [
        {"role": "system", "content": build_react_prompt(tools)},
        {"role": "user", "content": question}
    ]
    
    for _ in range(max_steps):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=conversation,
            tools=format_tools_for_api(tools),
            tool_choice="auto"
        )
        
        message = response.choices[0].message
        
        # Check if the model has signalled completion
        if message.content and "Finish:" in message.content:
            return message.content.split("Finish:", 1)[1].strip()
        
        # Execute the requested tool call
        if message.tool_calls:
            tool_call = message.tool_calls[0]
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            
            # Execute the tool
            observation = tools[tool_name](**tool_args)
            
            # Record the call and its observation in the conversation
            conversation.append(message)
            conversation.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(observation)
            })
        else:
            # No tool call and no Finish marker: treat the content as the answer
            return message.content or "Failed to reach a conclusion"
    
    return "Failed to reach a conclusion"

When to Use ReAct

  • Questions requiring external information.
  • Multi-step tasks involving tools (search, database, APIs).
  • Scenarios where the model needs to gather information dynamically.
  • Complex workflows that benefit from explicit reasoning traces.

Tree-of-Thoughts: Exploring Multiple Paths

Tree-of-Thoughts (ToT) extends chain-of-thought by exploring multiple reasoning paths in parallel, evaluating them, and selecting the best ones.

The Core Concept

Instead of following a single linear reasoning chain, ToT maintains a tree of possible reasoning steps. At each node, the model generates multiple next-step options, evaluates them, and expands the most promising ones.

This is similar to how chess engines explore move trees using search algorithms.

ToT Process

  1. Thought generation: Generate multiple possible next thoughts.
  2. Evaluation: Score each thought for promise/correctness.
  3. Search strategy: Decide which thoughts to expand (BFS, DFS, or beam search).
  4. Backtracking: If a path fails, explore alternatives.

Example: Creative Writing Task

Task: Write a coherent story with exactly 4 sentences about a detective solving a mystery.
 
Step 1: Generate 3 possible first sentences
Option A: "Detective Miller arrived at the crime scene on a rainy Tuesday morning."
Option B: "The old mansion had been empty for years until tonight."
Option C: "Everyone knew the butler did it, except Detective Sarah."
 
Step 2: Evaluate each option
Evaluation: Option A provides good setup, B creates intrigue, C subverts expectations.
Selected: Option A (most conventional, easier to build on)
 
Step 3: Generate next sentences based on Option A
... (continue expanding the tree)

Implementation Sketch

def tree_of_thoughts(problem: str, depth: int = 3, breadth: int = 3):
    """
    Simplified Tree-of-Thoughts sketch. call_llm, parse_numbered_list, and
    extract_score are placeholders for your model call and output parsing.
    This version is greedy (beam width 1, no backtracking); a full ToT keeps
    several candidates per level and can revisit abandoned branches.
    """
    
    def generate_thoughts(current_state: str, num: int = 3) -> list:
        """Generate possible next reasoning steps"""
        prompt = (f"Given this partial solution:\n{current_state}\n\n"
                  f"Generate {num} different possible next steps. "
                  "Output as a numbered list.")
        response = call_llm(prompt)
        return parse_numbered_list(response)
    
    def evaluate_thought(thought: str) -> float:
        """Score a thought's promise (0-1)"""
        prompt = f"Rate this reasoning step for correctness and promise (0-10):\n{thought}"
        response = call_llm(prompt)
        return extract_score(response) / 10
    
    # Start the tree at the problem statement
    current_best = problem
    
    for level in range(depth):
        # Generate candidate next thoughts, score each, keep the best
        candidates = generate_thoughts(current_best, breadth)
        scored = [(t, evaluate_thought(t)) for t in candidates]
        best_thought = max(scored, key=lambda x: x[1])[0]
        
        # Expand the chosen branch
        current_best += "\n" + best_thought
    
    return current_best

When to Use ToT

  • Problems with multiple valid solution paths.
  • Creative tasks (writing, design, brainstorming).
  • Puzzles and games requiring search.
  • Tasks where a single wrong step leads to failure.

Tradeoffs

  • Pros: Explores solution space thoroughly, finds better solutions.
  • Cons: Very expensive (exponential token usage), high latency.

ToT is best reserved for high-value problems where solution quality matters more than cost.


Comparison: Prompting Techniques

| Technique | Reasoning Type | Cost | Latency | Best For |
|---|---|---|---|---|
| Zero-Shot | Direct answer | Low | Fast | Simple queries |
| Few-Shot | Pattern learning | Low-Medium | Fast | Custom formats |
| Chain-of-Thought | Step-by-step | Medium | Medium | Reasoning tasks |
| Self-Consistency | Multiple chains + vote | High | Slow | Critical decisions |
| ReAct | Reasoning + action | Medium-High | Slow | Multi-step with tools |
| Tree-of-Thoughts | Explore multiple paths | Very High | Very Slow | Complex problems |
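As a rough illustration of this comparison, task traits can be mapped to a technique with a simple heuristic. This dispatch logic is illustrative, not a standard algorithm:

```python
def choose_technique(needs_tools: bool, multi_step: bool,
                     high_stakes: bool, has_examples: bool,
                     search_problem: bool = False) -> str:
    """Map task traits to a prompting technique (rough rule of thumb)."""
    if search_problem:
        return "Tree-of-Thoughts"   # multiple paths worth exploring, cost justified
    if needs_tools:
        return "ReAct"              # external information or actions required
    if high_stakes and multi_step:
        return "Self-Consistency"   # pay for extra samples on critical reasoning
    if multi_step:
        return "Chain-of-Thought"
    if has_examples:
        return "Few-Shot"           # custom format with labeled examples
    return "Zero-Shot"
```

In practice the boundaries are fuzzier than this, but a default policy like the above keeps teams from reaching for the most expensive technique first.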

Prompt Optimization Techniques

1. Role Assignment

Explicitly tell the model what role to play.

You are an expert data scientist with 15 years of experience.
Analyze the following dataset and provide insights.

This primes the model to adopt domain expertise.

2. Output Format Specification

Be explicit about desired output structure.

Provide your answer in this format:
1. Summary (2-3 sentences)
2. Key Findings (bullet points)
3. Recommendations (numbered list)

3. Constraint Setting

Specify limitations and requirements clearly.

Requirements:
- Answer must be under 100 words
- Use simple language (8th grade reading level)
- Include at least one concrete example
- Avoid jargon

4. Negative Examples

Show what NOT to do.

Good answer: "The project will cost approximately $50,000 and take 3 months."
 
Bad answer: "It depends on many factors and could cost anything."
 
Now answer this question: How long will the migration take?

5. Iterative Refinement

Use multi-turn conversations to refine outputs.

Turn 1: Generate initial answer
Turn 2: "Now make it more concise"
Turn 3: "Add a concrete example"
Turn 4: "Verify the math is correct"

Production Best Practices

1. Version Your Prompts

# prompt_templates.py
 
PROMPTS = {
    "customer_query_v1": "Answer customer questions politely...",
    "customer_query_v2": "You are a helpful customer service agent...",
    "customer_query_v3": "...",  # Latest version
}
 
def get_prompt(name: str, version: str = "latest") -> str:
    if version == "latest":
        # Pick the highest-numbered version of this prompt
        key = max((k for k in PROMPTS if k.startswith(name + "_v")),
                  key=lambda k: int(k.split("_v")[1]))
    else:
        key = f"{name}_{version}"  # e.g. get_prompt("customer_query", "v2")
    return PROMPTS[key]

2. A/B Test Prompts

import hashlib
 
def route_to_prompt(user_id: str) -> str:
    """A/B test different prompt versions with a stable 50/50 split"""
    # Built-in hash() is salted per process; hashlib gives a stable assignment
    variant = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 2
    
    if variant == 0:
        return PROMPTS["customer_query_v2"]
    else:
        return PROMPTS["customer_query_v3"]

3. Log and Monitor Performance

from datetime import datetime
 
def track_prompt_performance(prompt_version: str, response: str, user_feedback: float):
    """Track which prompts perform best"""
    log_to_database({  # log_to_database: your application's persistence helper
        "prompt_version": prompt_version,
        "response_length": len(response),
        "user_satisfaction": user_feedback,
        "timestamp": datetime.now()
    })

4. Handle Edge Cases

Explicitly address common failure modes.

Important rules:
- If you don't know the answer, say "I don't have enough information"
- If the question is ambiguous, ask for clarification
- If the request violates policy, politely decline
- Never make up facts or statistics
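Rules like these can be appended to a base system prompt programmatically, so every prompt version ships with the same guardrails. A minimal sketch (helper and constant names are illustrative):

```python
EDGE_CASE_RULES = [
    'If you don\'t know the answer, say "I don\'t have enough information"',
    "If the question is ambiguous, ask for clarification",
    "If the request violates policy, politely decline",
    "Never make up facts or statistics",
]

def build_system_prompt(base: str, rules: list) -> str:
    """Append explicit edge-case rules to a base system prompt."""
    return base + "\n\nImportant rules:\n" + "\n".join(f"- {r}" for r in rules)

system_prompt = build_system_prompt(
    "You are a helpful customer service agent.", EDGE_CASE_RULES)
```

Centralizing the rules in one place means a policy change updates every prompt variant at once instead of being patched into each template by hand.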

Prompt Engineering Tools

LangChain Prompt Templates

from langchain.prompts import PromptTemplate
 
template = """
You are a {role}.
Task: {task}
Context: {context}
 
Output format: {format}
"""
 
prompt = PromptTemplate(
    input_variables=["role", "task", "context", "format"],
    template=template
)
 
final_prompt = prompt.format(
    role="senior software engineer",
    task="Review this code for bugs",
    context=code_snippet,
    format="Bullet points with severity ratings"
)

Prompt Optimization with DSPy

import dspy
 
# Define signature
class QuestionAnswering(dspy.Signature):
    """Answer questions based on context"""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()
 
# Create module
qa = dspy.ChainOfThought(QuestionAnswering)
 
# Optimize prompts automatically (answer_correctness is your own metric function)
optimizer = dspy.BootstrapFewShot(metric=answer_correctness)
optimized_qa = optimizer.compile(qa, trainset=examples)

Common Pitfalls

1. Over-Engineering Prompts

Start simple. Add complexity only when needed.

2. Ignoring Context Window Limits

Few-shot examples and CoT reasoning consume tokens quickly. Monitor usage.

3. Not Testing Edge Cases

Test your prompts with unusual inputs, empty strings, very long texts, and adversarial examples.

4. Assuming Determinism

Even at temperature 0, outputs can vary slightly. Design for variability.


Future of Prompt Engineering

Emerging trends:

  • Automatic prompt optimization: AI systems that write better prompts than humans.
  • Multimodal prompting: Combining text, images, and other modalities.
  • Prompt compression: Achieving the same quality with fewer tokens.
  • Meta-prompting: LLMs that generate prompts for other LLMs.

Conclusion

Prompt engineering is both art and science. While advanced techniques like Chain-of-Thought, ReAct, and Tree-of-Thoughts dramatically improve performance, they come with increased cost and complexity.

The key is knowing when to apply which technique. Simple queries need simple prompts. Complex reasoning benefits from CoT. Multi-step tasks with tools require ReAct. High-stakes decisions may justify self-consistency or ToT.

As LLMs continue to improve, prompt engineering will evolve. But the fundamentals remain: clear instructions, appropriate examples, structured reasoning, and continuous testing and refinement.

Master these techniques, and you will unlock the full potential of language models in your applications.


Key Takeaways

  • Few-shot prompting dramatically improves performance for custom tasks.
  • Chain-of-Thought enables step-by-step reasoning by adding "Let's think step by step."
  • Self-consistency improves accuracy by generating multiple reasoning paths and voting.
  • ReAct combines reasoning with tool use for multi-step problem solving.
  • Tree-of-Thoughts explores multiple solution paths but is expensive.
  • Choose techniques based on task complexity, accuracy requirements, and budget.
  • Version and A/B test your prompts in production.
  • Start simple and add complexity only when necessary.
  • Always handle edge cases and monitor performance metrics.
  • Prompt engineering is iterative: test, measure, refine continuously.
