OpenAI Agents SDK vs Anthropic SDK: A Technical Comparison

Introduction

AI agents running in production are no longer experimental. Two of the most widely used toolkits are OpenAI's Agents SDK (a dedicated Python library built around minimal, composable abstractions) and Anthropic's Python SDK, which exposes tool calling as a first-class primitive inside the core anthropic package and leaves orchestration to your own code. Both are production-ready. Both can be used for multi-agent orchestration, tool use, and structured outputs. But they differ fundamentally in philosophy, API design, and what they make easy versus what they leave to you.

This post compares them code-to-code across every dimension that matters in production: tool definition, agent loops, multi-agent handoffs, guardrails, memory, and observability.

1. Philosophy: Built-in Abstractions vs Explicit Orchestration

The two SDKs make opposite bets about how much structure to provide.

OpenAI Agents SDK ships a formal object model: Agent, Runner, Handoff, Guardrail. The framework owns the agent loop. You declare what the agent is; the SDK decides how to run it. This reduces boilerplate and makes common patterns like multi-agent routing and input validation near-zero-config.

Anthropic Python SDK stays closer to the raw API. There is no Agent class, no Runner. You get client.messages.create() with tools, and you write the agentic loop yourself: calling the API, handling tool_use blocks, appending results, and looping until stop_reason == "end_turn". More boilerplate, but total control over every decision point.

Figure 1: OpenAI Agents SDK (left) wraps a formal object model around an SDK-owned runner. Anthropic's Python SDK (right) exposes the raw API with rich primitives, but you write the agent loop yourself.

2. Core Primitives at a Glance

Primitive	OpenAI Agents SDK	Anthropic Python SDK
Agent definition	`Agent(name, instructions, tools, handoffs)`	No class; pass `system` prompt and `tools` list to each API call
Tool definition	`@function_tool` decorator; schema auto-inferred from type hints and docstring	JSON Schema dict with `name`, `description`, `input_schema`
Agent loop	`Runner.run()`: SDK-owned, handles retries and handoffs automatically	Manual `while stop_reason == "tool_use"` loop; you write every step
Multi-agent routing	Built-in `Handoff`: transfers control to another `Agent`	Manual: route via a dispatcher tool's return value
Guardrails	`@input_guardrail` / `@output_guardrail`: run in parallel to agent	Manual: validate before and after the API call in your loop
Streaming	`Runner.run_streamed()` with async event callbacks	`with client.messages.stream() as stream:`
Prompt caching	Automatic at the API level for long repeated prefixes; no per-block control in the SDK	Native: `cache_control: {type: ephemeral}` on any content block
Extended thinking	Via reasoning models (o3, o4-mini); coarse reasoning-effort setting, no token budget	Native: `thinking={type: enabled, budget_tokens: N}`
Tracing	Built-in, auto-uploaded to OpenAI dashboard	Manual: integrate Langfuse, LangSmith, or custom logging
Model lock-in	OpenAI models by default (GPT-4o, o3, o4-mini); other providers via OpenAI-compatible endpoints or the LiteLLM extension	Claude models only (claude-opus-4-7, claude-sonnet-4-6, etc.)

3. Defining Tools

Tool definition is where the two SDKs diverge most visibly in day-to-day use.

3.1 OpenAI Agents SDK — `@function_tool` decorator

The SDK inspects the function's type annotations and docstring to generate the JSON Schema automatically. No manual schema writing required.


# pip install openai-agents
from agents import Agent, Runner, function_tool

@function_tool
def get_stock_price(ticker: str, currency: str = "USD") -> str:
    """
    Retrieve the current stock price for a given ticker symbol.
    Returns the price and percentage change for today.
    """
    prices = {"AAPL": 213.40, "GOOG": 178.20, "TSLA": 245.80}
    price = prices.get(ticker.upper(), 0.0)
    return f"{ticker.upper()}: {currency} {price:.2f} (+1.2% today)"

@function_tool
def get_company_news(ticker: str, limit: int = 3) -> str:
    """Get the latest news headlines for a company by stock ticker."""
    return f"Top {limit} headlines for {ticker}: [headlines would appear here]"

analyst_agent = Agent(
    name="StockAnalyst",
    instructions=(
        "You are a financial analyst. Use tools to get real-time data "
        "before answering any questions about stocks or markets."
    ),
    tools=[get_stock_price, get_company_news],
)

result = Runner.run_sync(analyst_agent, "What is Apple's stock price and latest news?")
print(result.final_output)

The @function_tool decorator reads the type hints and default values (currency: str = "USD") and converts them into the JSON Schema the model receives. The docstring becomes the tool description. You write a normal Python function; the SDK handles schema translation.
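To make the idea of schema inference concrete, here is a minimal sketch of how a decorator could derive a schema from a signature using only the standard library. It is an illustration, not the SDK's actual implementation: infer_tool_schema and _JSON_TYPES are hypothetical names, and the sketch handles only a handful of primitive types.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from a few Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def infer_tool_schema(func) -> dict:
    """Build a tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)           # no default: the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def get_stock_price(ticker: str, currency: str = "USD") -> str:
    """Retrieve the current stock price for a given ticker symbol."""
    return f"{ticker.upper()}: {currency} 0.00"

schema = infer_tool_schema(get_stock_price)
print(schema["parameters"]["required"])
```

The parameter without a default ends up in required, and the default value travels with the property, which mirrors the behaviour described above.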

3.2 Anthropic Python SDK — JSON Schema dict

With the Anthropic SDK you write the schema explicitly. More verbose, but you gain full control over descriptions, required fields, and constraints that bare str and int type hints do not carry, such as enum values or numeric ranges.


# pip install anthropic
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_stock_price",
        "description": (
            "Retrieve the current stock price for a given ticker symbol. "
            "Returns the price and percentage change for today."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL, GOOG, TSLA"
                },
                "currency": {
                    "type": "string",
                    "description": "Currency for the price",
                    "enum": ["USD", "EUR", "GBP", "MYR"],
                    "default": "USD"
                }
            },
            "required": ["ticker"]
        }
    },
    {
        "name": "get_company_news",
        "description": "Get the latest news headlines for a company by stock ticker.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "limit": {
                    "type": "integer",
                    "description": "Number of headlines to return (1–10)",
                    "minimum": 1,
                    "maximum": 10,
                    "default": 3
                }
            },
            "required": ["ticker"]
        }
    }
]

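The enum constraint on currency and the minimum/maximum on limit tell the model which values are valid, which makes out-of-range arguments much less likely. Schema keywords alone are not a guarantee, so validate tool inputs in your executor before acting on them. Constraints like these must be declared explicitly in any framework, including OpenAI's, where bare str or int hints say nothing about allowed values.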
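Because tool arguments are still model output, it can help to check them against the schema before executing anything. The following is a minimal sketch that covers only the required, enum, minimum, and maximum keywords; validate_tool_input and NEWS_SCHEMA are hypothetical names introduced for illustration, and a full JSON Schema validator would cover far more.

```python
# Hypothetical copy of the get_company_news input_schema shown above.
NEWS_SCHEMA = {
    "type": "object",
    "properties": {
        "ticker": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 10, "default": 3},
    },
    "required": ["ticker"],
}

def validate_tool_input(tool_input: dict, input_schema: dict) -> list[str]:
    """Check required/enum/minimum/maximum only; return a list of problems."""
    problems = []
    properties = input_schema.get("properties", {})
    for name in input_schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required field: {name}")
    for name, value in tool_input.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}={value!r} is not one of {spec['enum']}")
        if "minimum" in spec or "maximum" in spec:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}={value!r} is not a number")
                continue
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{name}={value} is below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{name}={value} is above maximum {spec['maximum']}")
    return problems

ok = validate_tool_input({"ticker": "AAPL", "limit": 3}, NEWS_SCHEMA)
bad = validate_tool_input({"limit": 50}, NEWS_SCHEMA)
print(ok, bad)
```

Returning the problems as text rather than raising makes it easy to send them back to the model as a tool_result so it can correct the call.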

4. The Agent Loop

The agentic loop (call the model, execute tools, feed results back, repeat until completion) is where the two SDKs differ most fundamentally in structure.

4.1 OpenAI Agents SDK — Runner owns the loop


from agents import Agent, Runner, function_tool

@function_tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base and return matching documents."""
    return f"Found 3 documents matching '{query}'"

agent = Agent(
    name="ResearchAssistant",
    instructions="You are a research assistant. Search the knowledge base before answering.",
    tools=[search_knowledge_base],
)

# Runner.run_sync() handles the entire loop internally:
# 1. Calls the model with the agent's instructions and tools
# 2. Detects tool calls (or handoffs) in the response
# 3. Executes each tool
# 4. Appends tool results and calls the model again
# 5. Repeats until the model returns a final output or max_turns is reached
# 6. Returns a RunResult with final_output and the items generated during the run
result = Runner.run_sync(agent, "What does our documentation say about rate limiting?")

print(result.final_output)       # the model's final text response
print(len(result.new_items))  # all new conversation items including tool call turns

4.2 Anthropic Python SDK — you own the loop


import anthropic

client = anthropic.Anthropic()

SYSTEM = "You are a research assistant. Search the knowledge base before answering."

tools = [
    {
        "name": "search_knowledge_base",
        "description": "Search the internal knowledge base and return matching documents.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
]

def execute_tool(name: str, tool_input: dict) -> str:
    if name == "search_knowledge_base":
        return f"Found 3 documents matching '{tool_input['query']}'"
    return f"Unknown tool: {name}"

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            system=SYSTEM,
            tools=tools,
            messages=messages,
        )

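        # Anything other than "tool_use" (end_turn, max_tokens, stop_sequence)
        # means there are no tool calls to answer, so return the text and stop.
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")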

        # Collect every tool_use block from this turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result_text = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result_text,
                })

        # Append the full assistant turn then the tool results, then loop
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

answer = run_agent("What does our documentation say about rate limiting?")
print(answer)

The manual loop is roughly 30 lines of boilerplate, but every decision point is visible and modifiable. You can inject logging, token counting, caching, custom retry logic, or circuit breakers at any step, without fighting an abstraction layer.
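One way to make those injection points explicit is to pass the model call and a per-turn hook into the loop as plain callables. The sketch below is a hypothetical refactoring, not part of either SDK: run_loop, on_turn, and fake_model are illustrative names, and fake_model returns SimpleNamespace objects shaped like Anthropic responses so the example runs offline.

```python
from types import SimpleNamespace

def run_loop(call_model, execute_tool, user_message, max_turns=10, on_turn=None):
    """Generic tool loop; call_model(messages) returns a Message-like object."""
    messages = [{"role": "user", "content": user_message}]
    for turn in range(max_turns):
        response = call_model(messages)
        if on_turn:
            on_turn(turn, response)          # logging, token counting, metrics
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": execute_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")

def fake_model(messages):
    """Offline stand-in: first turn requests a tool, second turn answers."""
    if len(messages) == 1:
        return SimpleNamespace(
            stop_reason="tool_use",
            usage=SimpleNamespace(input_tokens=120, output_tokens=30),
            content=[SimpleNamespace(type="tool_use", id="t1",
                                     name="search_knowledge_base",
                                     input={"query": "rate limiting"})],
        )
    return SimpleNamespace(
        stop_reason="end_turn",
        usage=SimpleNamespace(input_tokens=180, output_tokens=40),
        content=[SimpleNamespace(type="text", text="Rate limits are documented.")],
    )

token_log = []
answer = run_loop(
    fake_model,
    lambda name, args: f"Found 3 documents matching '{args['query']}'",
    "What does our documentation say about rate limiting?",
    on_turn=lambda turn, r: token_log.append(r.usage.input_tokens + r.usage.output_tokens),
)
print(answer, token_log)
```

In real use, call_model would wrap client.messages.create() with your model, system prompt, and tools; the loop itself does not need to change.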

4.3 Streaming responses

Both SDKs support streaming, so the first tokens reach the user as soon as the model produces them rather than after the full response is complete. The implementation differs significantly.

OpenAI Agents SDK: use Runner.run_streamed() instead of Runner.run(). It returns a RunResultStreaming object you iterate with async for. Two event types matter in practice: raw_response_event carries raw token deltas; run_item_stream_event fires when a complete item (tool call, tool result, or message) is finished.


from agents import Agent, Runner, function_tool
import asyncio

@function_tool
def lookup(query: str) -> str:
    """Search the knowledge base."""
    return f"Found 3 results for '{query}'"

agent = Agent(
    name="StreamAgent",
    instructions="Search before answering.",
    tools=[lookup],
)

async def stream_agent(user_message: str) -> str:
    result = Runner.run_streamed(agent, user_message)

    async for event in result.stream_events():
        # Text deltas arrive as raw Responses API events; event.data.delta is a plain string
        if (
            event.type == "raw_response_event"
            and getattr(event.data, "type", "") == "response.output_text.delta"
        ):
            print(event.data.delta, end="", flush=True)

    print()
    return result.final_output

asyncio.run(stream_agent("What does our documentation say about rate limits?"))

Anthropic Python SDK: use client.messages.stream() as a context manager. With the async client, stream.text_stream is an async iterator that yields only text deltas. Call stream.get_final_message() once the text stream is exhausted, before leaving the block, to get the complete response for tool handling. The agent loop structure stays the same; only the inner call changes.


import anthropic, asyncio

aclient = anthropic.AsyncAnthropic()

async def run_agent_streamed(user_message: str, system: str, tools: list) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        async with aclient.messages.stream(
            model="claude-opus-4-7",
            max_tokens=2048,
            system=system,
            tools=tools,
            messages=messages,
        ) as stream:
            async for text in stream.text_stream:
                print(text, end="", flush=True)     # tokens arrive as they're generated
            response = await stream.get_final_message()

        if response.stop_reason != "tool_use":   # also stops on max_tokens
            print()
            return "".join(b.text for b in response.content if hasattr(b, "text"))

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     f"result for {block.name}",
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user",      "content": tool_results})

4.4 Error handling and retries

Production agents hit API errors, tool failures, and runaway loops. The two SDKs handle these at very different levels of abstraction.

The OpenAI Agents SDK catches most transient failures internally. If a tool raises an exception, the framework feeds the error text back to the model as a tool result and continues. You configure bounds with max_turns on Runner.run() and catch SDK-level exceptions at the call site.


from agents import (
    Agent, Runner,
    MaxTurnsExceeded,
    InputGuardrailTripwireTriggered,
    OutputGuardrailTripwireTriggered,
)

agent = Agent(name="SafeAgent", instructions="Be helpful.")
user_message = "Summarise our refund policy."   # example input

try:
    result = Runner.run_sync(agent, user_message, max_turns=15)
    print(result.final_output)

except InputGuardrailTripwireTriggered:
    print("Request blocked by input guardrail")

except OutputGuardrailTripwireTriggered:
    print("Response blocked by output guardrail")

except MaxTurnsExceeded:
    print("Agent exceeded 15 turns — possible loop, aborting")

With the Anthropic SDK you handle everything yourself. Use tenacity or a simple retry wrapper for transient HTTP errors, check stop_reason for unexpected model termination, and set an explicit turn counter to break runaway loops.


import anthropic, time
# Assumes execute_tool() from section 4.2 is in scope

client = anthropic.Anthropic()

def run_with_retry(messages: list, system: str, tools: list, max_turns: int = 15) -> str:
    turns = 0

    while turns < max_turns:
        for attempt in range(3):
            try:
                response = client.messages.create(
                    model="claude-opus-4-7",
                    max_tokens=2048,
                    system=system,
                    tools=tools,
                    messages=messages,
                )
                break
            except anthropic.RateLimitError:
                time.sleep(60 * (attempt + 1))   # 60s, 120s, 180s back-off
            except anthropic.APIStatusError as e:
                if e.status_code == 529:          # Anthropic overloaded
                    time.sleep(30)
                else:
                    raise
        else:
            raise RuntimeError("API unavailable after 3 retries")

        if response.stop_reason != "tool_use":   # also stops on max_tokens
            return "".join(b.text for b in response.content if hasattr(b, "text"))

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                try:
                    result_text = execute_tool(block.name, block.input)
                except Exception as exc:
                    result_text = f"Tool error: {exc}"   # feed error back to model
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     result_text,
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user",      "content": tool_results})
        turns += 1

    raise RuntimeError(f"Agent exceeded {max_turns} turns without completing")
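The fixed sleeps above can be generalised into a small exponential back-off helper with jitter, which avoids many clients retrying in lockstep. This is a minimal standard-library sketch rather than a recommendation over tenacity; call_with_backoff and its parameters are hypothetical names, and the sleep function is injectable so the demo runs instantly.

```python
import random
import time

def call_with_backoff(fn, retryable=(Exception,), max_attempts=4,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt (capped), with jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise                        # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))   # jitter spreads out retries

# Demo with a function that fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

slept = []
result = call_with_backoff(flaky, retryable=(ConnectionError,), sleep=slept.append)
print(result, attempts["n"], len(slept))
```

With the Anthropic client, fn would be a lambda wrapping client.messages.create(...) and retryable would list the exception classes you consider transient.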

5. Multi-Agent Orchestration

Both SDKs support multi-agent patterns, but the abstractions are architecturally different.

5.1 OpenAI Agents SDK — Handoffs

A handoff is a specialised tool call where the model transfers control from the active agent to a target agent. When the Runner detects a handoff, it replaces the active agent and continues the loop with the new agent. Conversation history is preserved across the handoff. The specialist sees exactly what the triage agent saw.

Figure 2: OpenAI Agents SDK multi-agent pattern. The triage agent selects a specialist via a handoff() call; the Runner transfers control and continues the loop with the specialist, conversation history intact.


from agents import Agent, Runner, function_tool, handoff

@function_tool
def lookup_invoice(invoice_id: str) -> str:
    """Look up an invoice by ID and return its payment status."""
    return f"Invoice {invoice_id}: paid on 2026-04-15, amount USD 49.00"

@function_tool
def create_support_ticket(description: str, priority: str = "medium") -> str:
    """Create a support ticket for a technical issue."""
    return f"Ticket #TK-{abs(hash(description)) % 10000} created ({priority} priority)"

billing_agent = Agent(
    name="BillingAgent",
    instructions="You handle billing, invoices, and payment queries. Be concise.",
    tools=[lookup_invoice],
)

support_agent = Agent(
    name="TechSupportAgent",
    instructions="You handle technical issues and bug reports.",
    tools=[create_support_ticket],
)

triage_agent = Agent(
    name="TriageAgent",
    instructions=(
        "You are the first point of contact. Identify the nature of the user's "
        "request and hand off to the correct specialist. "
        "Use BillingAgent for payment or invoice issues. "
        "Use TechSupportAgent for bugs or technical problems."
    ),
    handoffs=[handoff(billing_agent), handoff(support_agent)],
)

result = Runner.run_sync(triage_agent, "My invoice #1234 is showing the wrong amount.")
print(result.final_output)
print(f"Active agent at end: {result.last_agent.name}")  # → BillingAgent

5.2 Anthropic SDK — routing via tool return values

There is no Handoff class in the Anthropic SDK. You implement routing yourself: a dispatcher tool returns the name of the target agent, and your loop switches the system prompt and tool set accordingly before the next call.


import anthropic

client = anthropic.Anthropic()

AGENTS = {
    "billing": {
        "system": "You handle billing and invoice queries. Be concise.",
        "tools": [
            {
                "name": "lookup_invoice",
                "description": "Look up an invoice by ID.",
                "input_schema": {
                    "type": "object",
                    "properties": {"invoice_id": {"type": "string"}},
                    "required": ["invoice_id"]
                }
            }
        ]
    },
    "support": {
        "system": "You handle technical issues and bug reports.",
        "tools": [
            {
                "name": "create_ticket",
                "description": "Create a support ticket.",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "priority": {"type": "string", "enum": ["low", "medium", "high"]}
                    },
                    "required": ["description"]
                }
            }
        ]
    }
}

TRIAGE_TOOLS = [
    {
        "name": "route_to_agent",
        "description": "Route the conversation to the correct specialist agent.",
        "input_schema": {
            "type": "object",
            "properties": {
                "agent": {
                    "type": "string",
                    "enum": ["billing", "support"],
                    "description": "Which specialist to route to"
                }
            },
            "required": ["agent"]
        }
    }
]

def run_multi_agent(user_message: str) -> str:
    # Phase 1: triage — identify which specialist to use
    triage_resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        system="Identify the user's intent and call route_to_agent with the correct specialist.",
        tools=TRIAGE_TOOLS,
        messages=[{"role": "user", "content": user_message}],
    )

    target_agent = "support"   # fallback if the model does not call route_to_agent
    for block in triage_resp.content:
        if block.type == "tool_use" and block.name == "route_to_agent":
            target_agent = block.input["agent"]

    # Phase 2: run the selected specialist agent
    config = AGENTS[target_agent]
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            system=config["system"],
            tools=config["tools"],
            messages=messages,
        )
        if response.stop_reason != "tool_use":   # also stops on max_tokens
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Executed {block.name} with {block.input}"
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

6. Guardrails

Production agents need to validate what goes in and what comes out. The two SDKs handle this completely differently.

6.1 OpenAI Agents SDK — parallel guardrails

Input guardrails run concurrently with the agent's first model call. The SDK starts the guardrail function while the LLM request is in flight, so a passing check adds little or no latency. If the guardrail trips (tripwire_triggered=True), the runner raises InputGuardrailTripwireTriggered and abandons the run, so the agent's response is never returned to the caller. Because the model request has already started, some tokens may still be spent on the abandoned call.

Figure 3: OpenAI Agents SDK guardrail pipeline. The guardrail runs in parallel to the first LLM call. If it trips, the run is aborted with an exception instead of returning the agent's output.


from agents import (
    Agent, Runner, GuardrailFunctionOutput,
    input_guardrail, RunContextWrapper
)
from pydantic import BaseModel

class SafetyCheck(BaseModel):
    is_unsafe: bool
    reason: str

@input_guardrail
async def safety_guardrail(
    ctx: RunContextWrapper, agent: Agent, input: str
) -> GuardrailFunctionOutput:
    """Block prompts attempting security bypass or harmful actions."""
    blocked = ["hack", "exploit", "bypass security", "jailbreak"]
    triggered = any(kw in input.lower() for kw in blocked)
    return GuardrailFunctionOutput(
        output_info=SafetyCheck(
            is_unsafe=triggered,
            reason="Potential security bypass" if triggered else "OK"
        ),
        tripwire_triggered=triggered,
    )

agent = Agent(
    name="SafeAgent",
    instructions="You are a helpful assistant.",
    input_guardrails=[safety_guardrail],
)

try:
    result = Runner.run_sync(agent, "How do I exploit this SQL query?")
except Exception as e:
    print(f"Blocked by guardrail: {e}")
    # The run was aborted by the guardrail; no agent output is returned
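The concurrency pattern behind a parallel guardrail can be sketched with plain asyncio. This is a conceptual illustration only, not the SDK's internals: TripwireTriggered, check_input, call_model, and guarded_call are hypothetical names, and the sleeps stand in for a fast classifier and a slower LLM request.

```python
import asyncio

class TripwireTriggered(Exception):
    """Hypothetical stand-in for the SDK's guardrail exception."""

async def check_input(text: str) -> bool:
    await asyncio.sleep(0.01)            # stands in for a fast safety check
    return any(kw in text.lower() for kw in ("exploit", "jailbreak"))

async def call_model(text: str) -> str:
    await asyncio.sleep(0.05)            # stands in for the slower LLM request
    return f"answer to: {text}"

async def guarded_call(text: str) -> str:
    # Start the model request first, then run the guardrail while it is in flight.
    model_task = asyncio.create_task(call_model(text))
    if await check_input(text):
        model_task.cancel()              # abandon the in-flight request
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        raise TripwireTriggered("input guardrail tripped")
    return await model_task

def run(text: str) -> str:
    try:
        return asyncio.run(guarded_call(text))
    except TripwireTriggered as exc:
        return f"blocked: {exc}"

safe_result = run("How do I rotate an API key?")
blocked_result = run("How do I exploit this SQL query?")
print(safe_result)
print(blocked_result)
```

The safe request pays only the model latency, because the check finishes while the model call is still running; the blocked request never reaches the caller.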

6.2 Anthropic SDK — manual validation in your loop


import anthropic, re

client = anthropic.Anthropic()

BLOCKED = ["hack", "exploit", "bypass security", "jailbreak"]

def validate_input(text: str) -> None:
    if any(kw in text.lower() for kw in BLOCKED):
        raise ValueError("Input blocked: disallowed content detected")

def validate_output(text: str) -> None:
    if re.search(r'\b\d{16}\b', text):
        raise ValueError("Output blocked: possible card number in response")

def safe_run(user_message: str, system: str, tools: list) -> str:
    validate_input(user_message)   # input guardrail

    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            system=system,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":   # also stops on max_tokens
            for block in response.content:
                if hasattr(block, "text"):
                    validate_output(block.text)   # output guardrail
                    return block.text
            return ""

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": "result here"
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

7. Structured Outputs and Context Variables

Two capabilities that are built into the OpenAI Agents SDK but require manual implementation with Anthropic: enforcing a typed schema on the model's final response, and threading application state through the agent run without touching every tool signature.

7.1 OpenAI Agents SDK — `output_type` and `RunContextWrapper`

Pass a Pydantic model as output_type on the Agent. The SDK asks the model for output that conforms to the model's JSON schema and parses the final response into it, so a response that does not fit the schema surfaces as an error rather than as a silently malformed string. The result is a fully typed Python object, not a string to parse.


from agents import Agent, Runner
from pydantic import BaseModel, Field

class TicketAnalysis(BaseModel):
    category:    str          # "billing", "technical", "general"
    urgency:     int = Field(ge=1, le=5)
    summary:     str
    next_action: str

classifier = Agent(
    name="TicketClassifier",
    instructions=(
        "Classify the support ticket. Set urgency 1 (low) to 5 (critical). "
        "Suggest a concrete next action for the support team."
    ),
    output_type=TicketAnalysis,     # enforced by the Runner
)

result = Runner.run_sync(classifier, "My payment failed three times in a row.")
analysis = result.final_output     # TicketAnalysis instance, fully typed

print(analysis.category)           # → billing
print(analysis.urgency)            # → 4
print(analysis.next_action)        # → Escalate to billing team, check payment processor logs

Context variables let you pass application state (user identity, database handles, feature flags) into every tool call without adding extra parameters to each tool's function signature. Tools declare a RunContextWrapper as their first argument.


from agents import Agent, Runner, function_tool
from agents.run_context import RunContextWrapper
from dataclasses import dataclass

@dataclass
class RequestContext:
    user_id:   str
    tenant_id: str
    locale:    str

@function_tool
def get_account_status(ctx: RunContextWrapper[RequestContext]) -> str:
    """Get the current account status for the authenticated user."""
    uid    = ctx.context.user_id
    tenant = ctx.context.tenant_id
    return f"Account {uid} (tenant {tenant}): active, 14 days remaining on trial"

support_agent = Agent(
    name="SupportAgent",
    instructions="Help the user with their account. Use the provided context for user identity.",
    tools=[get_account_status],
)

ctx    = RequestContext(user_id="u_9281", tenant_id="t_acme", locale="en-MY")
result = Runner.run_sync(support_agent, "What is my account status?", context=ctx)
print(result.final_output)

7.2 Anthropic Python SDK — forced tool call and closure-based state

The Anthropic SDK has no output_type parameter. The standard approach is to define a dedicated return tool and force the model to call it via tool_choice. The tool's input_schema becomes the output schema. This pattern is more verbose, but it gives you precise schema control including enum constraints and numeric ranges.


import anthropic

client = anthropic.Anthropic()

RETURN_TOOL = {
    "name": "return_ticket_analysis",
    "description": "Return the structured analysis of the support ticket.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category":    {"type": "string", "enum": ["billing", "technical", "general"]},
            "urgency":     {"type": "integer", "minimum": 1, "maximum": 5},
            "summary":     {"type": "string"},
            "next_action": {"type": "string"}
        },
        "required": ["category", "urgency", "summary", "next_action"]
    }
}

def classify_ticket(ticket_text: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=512,
        system=(
            "Classify the support ticket. Set urgency 1 (low) to 5 (critical). "
            "Suggest a concrete next action for the support team."
        ),
        tools=[RETURN_TOOL],
        tool_choice={"type": "tool", "name": "return_ticket_analysis"},
        messages=[{"role": "user", "content": ticket_text}],
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "return_ticket_analysis":
            return block.input
    return {}

analysis = classify_ticket("My payment failed three times in a row.")
print(analysis["category"])    # → billing
print(analysis["urgency"])     # → 4
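The dictionary returned by a forced tool call is still model output, so it is worth validating before the rest of the application relies on it. The sketch below converts it into a typed object with the standard library; TicketAnalysis, ALLOWED_CATEGORIES, and parse_ticket_analysis are hypothetical names chosen to mirror the schema above, and Pydantic would work equally well.

```python
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"billing", "technical", "general"}
REQUIRED_FIELDS = ("category", "urgency", "summary", "next_action")

@dataclass(frozen=True)
class TicketAnalysis:
    category: str
    urgency: int
    summary: str
    next_action: str

def parse_ticket_analysis(raw: dict) -> TicketAnalysis:
    """Validate the tool input dict and convert it to a typed object."""
    missing = [k for k in REQUIRED_FIELDS if k not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if raw["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {raw['category']!r}")
    urgency = raw["urgency"]
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(urgency, bool) or not isinstance(urgency, int) or not 1 <= urgency <= 5:
        raise ValueError(f"urgency must be an integer from 1 to 5, got {urgency!r}")
    return TicketAnalysis(raw["category"], urgency,
                          str(raw["summary"]), str(raw["next_action"]))

# Example input shaped like block.input from the forced tool call (made-up values).
parsed = parse_ticket_analysis({
    "category": "billing",
    "urgency": 4,
    "summary": "Payment failed three times.",
    "next_action": "Check payment processor logs.",
})
print(parsed.category, parsed.urgency)
```

An empty dict, which classify_ticket returns when no tool call is found, fails fast here with a ValueError instead of propagating as a KeyError later.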

For context variables, Python closures or a shared dataclass passed by reference serve the same purpose as RunContextWrapper. Since you own the loop, you can reference any in-scope object from inside your tool implementations directly.


def build_tool_executor(user_id: str, tenant_id: str):
    """Return a tool-dispatch function that already has user context bound in."""

    def execute_tool(name: str, tool_input: dict) -> str:
        if name == "get_account_status":
            return f"Account {user_id} (tenant {tenant_id}): active, 14 days on trial"
        return f"Unknown tool: {name}"

    return execute_tool

# In your agent loop:
executor = build_tool_executor(user_id="u_9281", tenant_id="t_acme")
# Pass executor into run_agent() — tools see user_id and tenant_id without extra params

8. Prompt Caching: Anthropic's Structural Advantage

This is the single biggest practical difference between the two SDKs in production. Anthropic's prompt caching lets you mark any content block as cacheable with cache_control: {type: ephemeral}. The marked content is stored on Anthropic's infrastructure for 5 minutes by default, and the timer is refreshed each time the cache is read. Subsequent requests sharing the same cached prefix are charged at 10% of the normal input token price for the cached portion and typically return faster.

In an agent loop that replays the same long system prompt on every turn, this is transformative. A 10,000-token system prompt drops from paying 10,000 input tokens per call to paying 1,000 tokens (cached rate) on every repeated call after the first.
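The arithmetic can be checked with a few lines of Python. This sketch uses the multipliers quoted in this post (1.25× for the one-time cache write, 0.10× for each cache read) and assumes every call lands inside the cache lifetime; system_prompt_cost is a hypothetical helper, and costs are expressed in units of one base-price input token.

```python
def system_prompt_cost(prompt_tokens, calls, write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached, cached) cost of the system prompt over `calls` requests."""
    uncached = prompt_tokens * calls
    # First call writes the cache; every later call reads it.
    cached = prompt_tokens * write_multiplier + prompt_tokens * read_multiplier * (calls - 1)
    return uncached, cached

uncached, cached = system_prompt_cost(10_000, calls=20)
saving = 1 - cached / uncached
print(f"uncached={uncached}, cached={cached:.0f}, saving={saving:.1%}")
```

Under these assumptions, 20 calls cost the equivalent of 31,500 tokens instead of 200,000, a saving of about 84%. A single call costs more with caching than without (12,500 versus 10,000), so the marker pays off from the second call onward.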


import anthropic

client = anthropic.Anthropic()

# A long, reusable system prompt — mark it for caching
SYSTEM_PROMPT = """
You are an expert financial analyst with deep knowledge of equity markets,
fixed income, derivatives, and macroeconomic indicators. You have access to
[... imagine 8,000 tokens of detailed domain knowledge and instructions ...]
""".strip()

def run_with_caching(user_message: str, tools: list) -> str:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        system=[
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"}  # cache this block
            }
        ],
        tools=tools,
        messages=[{"role": "user", "content": user_message}],
    )

    usage = response.usage
    cache_read    = getattr(usage, "cache_read_input_tokens", 0)
    cache_created = getattr(usage, "cache_creation_input_tokens", 0)
    print(f"Input tokens:        {usage.input_tokens}")
    print(f"Cache read tokens:   {cache_read}   ← charged at 10% rate")
    print(f"Cache created tokens:{cache_created} ← 25% surcharge on base price (one-time write)")

    for block in response.content:
        if hasattr(block, "text"):
            return block.text
    return ""

# First call:  cache_creation_input_tokens ≈ 8000 (charged at 1.25× base input price — one-time)
# All repeat calls within 5 min: cache_read_input_tokens ≈ 8000 (charged at 0.10× base — 90% off)
# Net saving on a busy API: ~60–90% of system prompt token cost across all repeated calls

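The OpenAI Agents SDK exposes no equivalent control. OpenAI's API caches long repeated prompt prefixes automatically, so Runner.run() calls can still receive a cached-input discount, but you cannot mark breakpoints yourself and the size of the discount depends on the model.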

8.2 Extended Thinking — Anthropic-exclusive deep reasoning

Extended thinking lets Claude reason through a problem in a private scratchpad before producing its final answer. You control the reasoning budget in tokens. The model uses the budget to explore multiple approaches, check its work, and self-correct. This is particularly valuable for multi-step coding problems, mathematical reasoning, and complex analysis tasks where a fast response is often wrong.

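The OpenAI Agents SDK offers no direct equivalent. o3 and o4-mini reason internally, but you steer them with a coarse reasoning-effort setting rather than a token budget, and the raw reasoning trace is not returned. With Claude you set the budget in tokens and receive the thinking blocks in the response.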
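Two constraints shape how the budget is passed: budget_tokens has to be smaller than max_tokens, and Anthropic's documentation gives 1,024 tokens as the minimum budget at the time of writing. A small helper can enforce both before a request is sent. The function name thinking_request_params and the default answer allowance are assumptions made for this sketch, not part of the SDK.

```python
MIN_THINKING_BUDGET = 1024  # lower bound for budget_tokens per Anthropic's docs (assumed current)


def thinking_request_params(thinking_budget: int, answer_tokens: int = 2048) -> dict:
    """Build the max_tokens / thinking kwargs for client.messages.create().

    max_tokens covers thinking plus the visible answer, so it is always
    strictly larger than the thinking budget.
    """
    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(f"thinking_budget must be >= {MIN_THINKING_BUDGET}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        "max_tokens": thinking_budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }


params = thinking_request_params(6000)
# Usage: client.messages.create(model=..., messages=[...], **params)
```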


import anthropic

client = anthropic.Anthropic()

def solve_with_thinking(problem: str, thinking_budget: int = 10_000) -> dict:
    """
    Run Claude with extended thinking.
    Returns the reasoning trace and the final answer separately.
    """
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=thinking_budget + 2048,   # budget + room for the actual answer
        thinking={
            "type":          "enabled",
            "budget_tokens": thinking_budget,
        },
        messages=[{"role": "user", "content": problem}],
    )

    thinking_trace = ""
    answer         = ""

    for block in response.content:
        if block.type == "thinking":
            thinking_trace = block.thinking
        elif hasattr(block, "text"):
            answer = block.text

    return {
        "thinking":       thinking_trace,   # internal reasoning — inspect for debugging
        "answer":         answer,
        "input_tokens":   response.usage.input_tokens,
        "output_tokens":  response.usage.output_tokens,
    }

result = solve_with_thinking(
    "A ball is thrown upward at 20 m/s from a cliff 45 m high. "
    "When does it hit the ground, and what is its velocity at impact?",
    thinking_budget=6000,
)

print(result["answer"])
# Optionally log result["thinking"] for debugging — not shown to end users

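In an agentic context, extended thinking is most useful when the agent needs to plan a complex multi-step workflow before issuing any tool calls. The loop below enables it on the first LLM call and omits it on subsequent tool-result turns to avoid paying the reasoning cost on every round-trip. Anthropic's documentation advises against changing the thinking mode partway through a tool-use loop, so verify this toggle against the model you deploy; the conservative alternative is to keep thinking enabled for the whole loop with a smaller budget.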


def run_agent_with_planning(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    first_turn = True

    while True:
        kwargs = {
            "model":     "claude-opus-4-7",
            "max_tokens": 12_000,
            "system":    SYSTEM,
            "tools":     tools,
            "messages":  messages,
        }
        if first_turn:
            kwargs["thinking"] = {"type": "enabled", "budget_tokens": 8000}

        response = client.messages.create(**kwargs)
        first_turn = False

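        # Any stop reason other than "tool_use" (end_turn, max_tokens, ...) ends the loop;
        # continuing without tool results would send an empty user turn.
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")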

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     execute_tool(block.name, block.input),
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user",      "content": tool_results})
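Because you own the loop, you can exercise it without network access. The sketch below replays canned responses through a stand-in for client.messages and checks that thinking is requested only on the first call. ScriptedMessages and run_loop are hypothetical names; run_loop follows the same pattern as the loop above but drops system and tools for brevity, stops on any stop_reason other than tool_use, and adds a turn cap.

```python
from types import SimpleNamespace as NS


class ScriptedMessages:
    """Stand-in for client.messages: replays canned responses, records kwargs."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return self._responses.pop(0)


def run_loop(messages_api, user_message, execute_tool, max_turns=5):
    """Agent loop with thinking on the first call only and a hard turn limit."""
    messages = [{"role": "user", "content": user_message}]
    for turn in range(max_turns):
        kwargs = {"model": "claude-opus-4-7", "max_tokens": 12_000, "messages": messages}
        if turn == 0:
            kwargs["thinking"] = {"type": "enabled", "budget_tokens": 8000}
        response = messages_api.create(**kwargs)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": execute_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")


# Scripted conversation: one tool call, then a final text answer.
scripted = ScriptedMessages([
    NS(stop_reason="tool_use",
       content=[NS(type="tool_use", id="t1", name="add", input={"a": 2, "b": 3})]),
    NS(stop_reason="end_turn", content=[NS(type="text", text="The sum is 5.")]),
])
answer = run_loop(scripted, "What is 2 + 3?",
                  lambda name, args: str(args["a"] + args["b"]))
```

The turn cap is a design choice worth keeping in the real loop as well: a model that keeps requesting tools would otherwise run, and bill, indefinitely.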

9. Tracing and Observability

9.1 OpenAI Agents SDK — built-in tracing

Every Runner.run() call generates a trace automatically. Traces appear in the OpenAI Traces dashboard with the full message history, tool calls, handoff decisions, guardrail results, token counts, and per-step latency. Zero configuration required. Custom spans can be added:


from agents import Agent, Runner, trace, custom_span

async def handle_request(user_message: str):
    with trace("support-session"):
        with custom_span("triage"):
            result = await Runner.run(triage_agent, user_message)
    # Dashboard shows: triage span → handoff event → specialist turn → final response
    return result.final_output

9.2 Anthropic SDK — bring your own observability

The Anthropic SDK ships no tracing backend, so nothing reaches a dashboard unless you instrument the calls yourself. Langfuse is a popular open-source option:


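from langfuse import Langfuse
import anthropic

# Written against the Langfuse v2 Python SDK (langfuse.trace / trace.generation).
# Later major versions changed this API, so pin the version or adapt the calls.
langfuse = Langfuse()
client   = anthropic.Anthropic()

MODEL = "claude-opus-4-7"

def run_with_tracing(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    trace = langfuse.trace(name="agent-run", input={"user": user_message})
    # A generation is Langfuse's observation type for LLM calls; unlike a plain
    # span, it records the model name and token usage.
    generation = trace.generation(name="llm-call", model=MODEL, input=messages)

    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=messages,
    )
    # Join all text blocks instead of assuming the first block is text
    output_text = "".join(b.text for b in response.content if b.type == "text")

    generation.end(
        output={"response": output_text},
        usage={
            "input":  response.usage.input_tokens,
            "output": response.usage.output_tokens,
        },
    )
    trace.update(output={"response": output_text})
    langfuse.flush()  # send buffered events before a short-lived process exits
    return output_text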
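If you would rather not add a dependency, "bring your own observability" can start as a context manager that records one dictionary per step. The following is a minimal standard-library sketch, not a replacement for a tracing product. The names SPANS and span are invented for illustration, and the token counts in the usage example are placeholders rather than real API output.

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # in-memory sink; swap for a logger or exporter in practice


@contextmanager
def span(name: str, **attrs):
    """Record wall-clock duration and attributes for one step of an agent run."""
    record = {"name": name, **attrs}
    start = time.perf_counter()
    try:
        yield record  # callers can attach fields such as token counts
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on success and on failure, so every step leaves a record.
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)


with span("llm-call", model="claude-opus-4-7") as s:
    # The real client.messages.create(...) call would go here.
    s["input_tokens"], s["output_tokens"] = 120, 45  # placeholder numbers
```

Each record carries the step name, duration, any attributes you attach, and the exception text if the step failed, which is enough to feed structured logs until a dedicated tool is justified.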

10. When to Choose Which

Situation	Recommendation	Reason
Fastest path to a working multi-agent system	OpenAI Agents SDK	Handoffs, guardrails, and tracing are zero-config
Minimise API costs at scale (millions of calls per day)	Anthropic SDK	Prompt caching cuts 60–90% of system prompt token cost
Extended thinking / chain-of-thought reasoning inside the agent	Anthropic SDK	Native `thinking` parameter with configurable token budget
Team already uses the OpenAI API throughout the stack	OpenAI Agents SDK	Same credentials, same dashboard, consistent developer experience
Need full control over the agent loop (custom retry, branching, streaming)	Anthropic SDK	You write the loop. No hidden framework behaviour to work around
Need parallel guardrails without adding latency	OpenAI Agents SDK	Guardrails run while the first LLM call is in flight, adding near-zero latency
Strongest reasoning on hard tasks	Evaluate both	GPT-5.5 vs Claude Opus 4.7: run your own benchmark on your task

11. Key Takeaways

OpenAI Agents SDK gives you structure. Agent, Runner, Handoff, Guardrail: declare your agents, hand the loop to the framework. Right when development speed matters more than fine-grained control.
Anthropic Python SDK gives you control. You write the agentic loop yourself. More boilerplate, but every step is transparent and modifiable. No framework surprises in production.
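Prompt caching is Anthropic's biggest practical advantage. cache_control: ephemeral reduces system prompt costs by 60–90% on repeated calls. The OpenAI Agents SDK exposes no equivalent control: OpenAI's automatic prefix caching still applies, but you cannot choose what gets cached.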
Guardrails are OpenAI's standout feature. Running validation in parallel to the first LLM call means safety checks add almost zero latency. A hard advantage for high-throughput systems.
Extended thinking is Anthropic-only. If you need the model to reason deeply before answering and want to control how many tokens it spends doing so, only the Anthropic SDK exposes this natively.
Tracing favours OpenAI on developer experience. Automatic upload to the dashboard versus integrating Langfuse or LangSmith manually.
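Model portability is limited in both. The Anthropic SDK targets Claude models only. The OpenAI Agents SDK defaults to OpenAI models and can be pointed at other providers through OpenAI-compatible endpoints or its LiteLLM extension, but hosted tools and the tracing dashboard assume OpenAI. If you need to swap models freely, build your own abstraction on top of both APIs.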
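One way to approach that abstraction is a narrow interface that each provider adapter implements. The sketch below is a starting point under stated assumptions: ChatBackend, AnthropicBackend, EchoBackend, and answer are hypothetical names, the interface covers only single-turn text, and tool calling, streaming, and retries are deliberately left out. The Anthropic adapter takes an already constructed client, so it can be tested with a fake.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only surface the application code depends on."""

    def complete(self, system: str, user: str) -> str: ...


class AnthropicBackend:
    """Adapter over an anthropic.Anthropic() client passed in by the caller."""

    def __init__(self, client, model: str):
        self.client, self.model = client, model

    def complete(self, system: str, user: str) -> str:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return "".join(b.text for b in response.content if b.type == "text")


class EchoBackend:
    """Offline stand-in, useful in tests; a second provider adapter would sit here."""

    def __init__(self, label: str):
        self.label = label

    def complete(self, system: str, user: str) -> str:
        return f"[{self.label}] {user}"


def answer(backend: ChatBackend, question: str) -> str:
    """Application code: unaware of which provider sits behind the backend."""
    return backend.complete(system="You are a concise assistant.", user=question)


reply = answer(EchoBackend("stub"), "ping")
```

Keeping the interface this small is a trade-off: swapping providers becomes a one-line change, but provider-specific features such as cache breakpoints or thinking budgets have to be configured inside each adapter.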