Structured Outputs in LLMs: JSON Mode, Function Calling, and Schema Validation

Introduction

Language models are built to generate fluent text. They are excellent at writing, explaining, and conversing. But most production applications do not want a paragraph of prose — they want data they can parse, store in a database, or pass to another system.

Consider these real use cases:

Extract a customer's name, email, and order number from a support email.
Parse an invoice and return a list of line items with prices.
Classify a support ticket into a predefined category.
Generate an API call from a natural language command.

All of these require the LLM to produce output in a specific, machine-readable format, typically JSON. The problem is that LLMs are probabilistic text generators, not compilers. If you just ask "return the answer as JSON", the model might produce valid JSON most of the time, but at production volume a small failure rate adds up: at 100,000 requests per day, a 2% failure rate is 2,000 failed parses per day.

This article explains why free-form output is unreliable and walks through five approaches to structured output: prompt engineering, JSON mode, function calling, schema validation with Pydantic, and grammar-constrained generation.

Why Structured Outputs Are Hard

When you ask a language model to produce JSON, it generates text token by token. It does not have an internal JSON parser checking its work — it is predicting the next most probable token based on everything before it.

This leads to common failures:

Syntax errors: Missing commas, unmatched brackets, trailing commas where not allowed.
Type mismatches: Returning "42" (a string) when the schema expects 42 (an integer).
Missing required fields: The model omits keys that your application depends on.
Extra fields: The model adds fields not in your schema, breaking strict parsers.
Wrapper text: The model includes "Here is the JSON you requested:" before the actual JSON block.
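
The failure modes above can be told apart mechanically. The following is a minimal, stdlib-only sketch; the REQUIRED spec and the diagnose helper are hypothetical names introduced here for illustration, not part of any library.

```python
import json

# Hypothetical spec for illustration: field name -> expected Python type.
REQUIRED = {"name": str, "order_number": int}

def diagnose(raw: str, spec: dict = REQUIRED) -> list[str]:
    """Return a list of problems found in a raw model response."""
    text = raw.strip()
    problems = []
    # Wrapper text: anything before the first "{" or after the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return ["syntax error: no JSON object found"]
    if start != 0 or end != len(text) - 1:
        problems.append("wrapper text around the JSON")
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        return problems + [f"syntax error: {exc.msg}"]
    for key, expected in spec.items():
        if key not in data:
            problems.append(f"missing required field: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"type mismatch: {key}")
    # Sorted so the report is deterministic when several extras appear.
    for key in sorted(data.keys() - spec.keys()):
        problems.append(f"extra field: {key}")
    return problems
```

An empty list means the response parsed and matched the spec; anything else names the failure class, which is useful when logging why an extraction was rejected.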

Each approach below adds a stronger guarantee, at the cost of some additional implementation effort.

Approach 1: Prompt Engineering for Structured Output

The simplest method is careful prompt design: explicitly tell the model to return only valid JSON with no additional text, and show it exactly what schema you expect.

Example prompt

Extract the following information from the text and return it as JSON.
Do not include any explanatory text. Only return valid JSON.

Schema:
{
  "name": string,
  "email": string,
  "phone": string,
  "issue": string
}

Text: "Hi, I'm Sarah Chen. My email is sarah.chen@example.com and my phone is 555-0123. I can't log into my account."

JSON:

Strengths

Works with any LLM, including older models without special API features.
No special configuration required.
Simple to implement and understand.

Weaknesses

Still prone to syntax errors with complex schemas.
No guarantee the output is parseable JSON.
The model often adds "Here is the JSON:" before the output anyway.

Use this for prototyping and simple cases. For anything in production, use one of the stronger approaches below.
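
When prompting is the only option, a small recovery step can absorb the most common wrapper-text failure. The sketch below is one possible approach, not a standard recipe; extract_json is a hypothetical helper name, and it assumes the response contains a single top-level JSON object.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)

def extract_json(raw: str) -> dict:
    """Best-effort recovery of a JSON object from a chatty model response."""
    text = raw.strip()
    # If the model wrapped its answer in a Markdown code fence, look inside it.
    fenced = _FENCED.search(text)
    if fenced:
        text = fenced.group(1)
    # Keep only the outermost {...} span to drop leading and trailing prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.loads still raises if the span itself is malformed.
    return json.loads(text[start:end + 1])
```

This only removes wrapping; it does not repair broken JSON, so syntax errors inside the object still surface as exceptions.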

Approach 2: JSON Mode

Several LLM APIs, including OpenAI's, offer JSON mode as a built-in feature. When enabled, the API constrains the model so that a completed response is syntactically valid JSON.

How JSON mode works

At each generation step, the model produces a score for every token in its vocabulary (the logits), which is then converted into a probability distribution. Providers do not publish the details of their JSON mode implementations, but the general technique, constrained decoding, masks out any token that would make the output invalid JSON at the current position. For example, once the model has generated {"name": "Alice" a second key such as "email" cannot follow directly; that token is masked, and the only valid continuations are a comma or a closing brace.

The result: a response that finishes normally is syntactically valid JSON. A response cut off by the token limit can still be truncated mid-object, so check the finish reason before parsing. The model still decides which keys and values to include; only the syntax is guaranteed.
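
The masking step can be illustrated with a toy decoder. This is a sketch under strong simplifying assumptions: the vocabulary has seven whole-word tokens, the "grammar" covers only flat objects with string values, and the logits are made up. Hosted APIs work on subword tokens and a full JSON grammar, and their internals may differ.

```python
import math

# Toy vocabulary: each entry is one "token". Real tokenizers use subword pieces.
VOCAB = ["{", "}", ",", ":", '"name"', '"email"', '"Alice"']

# (state, token kind) -> next state, for a flat JSON object with string values.
TRANSITIONS = {
    ("start", "{"): "open",
    ("open", "string"): "key",
    ("open", "}"): "done",
    ("key", ":"): "colon",
    ("colon", "string"): "value",
    ("value", ","): "comma",
    ("value", "}"): "done",
    ("comma", "string"): "key",
}

def kind(token: str) -> str:
    return "string" if token.startswith('"') else token

def mask_logits(state: str, logits: list[float]) -> list[float]:
    """Replace the score of every token that is invalid in `state` with -inf."""
    return [
        score if (state, kind(tok)) in TRANSITIONS else -math.inf
        for tok, score in zip(VOCAB, logits)
    ]

def greedy_decode(logits_per_step: list[list[float]]) -> str:
    """Pick the highest-scoring valid token at each step."""
    state, out = "start", []
    for logits in logits_per_step:
        if state == "done":
            break
        masked = mask_logits(state, logits)
        token = VOCAB[masked.index(max(masked))]
        out.append(token)
        state = TRANSITIONS[(state, kind(token))]
    return "".join(out)
```

In the "value" state the model may score "email" highest, but the mask removes it, so the decoder can only choose between a comma and a closing brace.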

OpenAI JSON mode example

import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # JSON mode requires a model that supports response_format
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs valid JSON."
        },
        {
            "role": "user",
            "content": "Extract name, email, and phone from: 'Contact John Doe at john@example.com or 555-1234'"
        }
    ],
    response_format={"type": "json_object"}
)

print(response.choices[0].message.content)

Output:

{
  "name": "John Doe",
  "email": "john@example.com",
  "phone": "555-1234"
}

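Anthropic example (prompt-based JSON)

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract person info as JSON with keys name, email, phone. Return only the JSON object: 'Jane Smith, jane@test.com, 555-9999'"
        }
    ],
    # Note: no response_format flag is used in this call. It relies on prompt
    # instructions alone; for schema-shaped output, use tool use (Approach 3).
)

# The reply is ordinary text, so it still needs to be parsed and validated
print(message.content[0].text)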

Limitations of JSON mode

Guarantees syntactically valid JSON — but not that the JSON matches your schema.
The model can still omit required fields or use wrong data types.
You still need to validate the structure after receiving it.

Think of JSON mode as solving the "will this parse?" problem, not the "is this correct?" problem.

Approach 3: Function Calling / Tool Use

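Function calling (also called tool use) is the most reliable way to get hosted-API output that matches a specific schema. Instead of asking the model to produce JSON in free text, you define a function schema upfront, with typed and required fields, and the model returns arguments for that function. How strictly the schema is enforced depends on the provider and settings: by default the model is trained to follow the schema but can still deviate from it, while strict modes such as OpenAI's Structured Outputs (strict: true) constrain decoding to the schema.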

How function calling works

You provide a function definition with typed parameters (like a form the model must fill in). The model sees this definition and responds with a structured call instead of prose. When strict schema enforcement is enabled, the API additionally applies constrained decoding, similar to JSON mode but driven by your schema, so that required fields and type constraints are satisfied.
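
As an illustration of what strict enforcement asks of a schema, the sketch below converts an ordinary parameter schema into a strict tool definition. It reflects OpenAI's Structured Outputs rules as commonly documented (every property listed in required, additionalProperties set to false, optional fields expressed as nullable); treat those rules as assumptions to verify against the current documentation. The helper name to_strict_tool is hypothetical, and it only handles flat schemas whose type values are plain strings.

```python
import copy

def to_strict_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a flat JSON Schema object in a strict function-tool definition."""
    schema = copy.deepcopy(parameters)  # do not mutate the caller's schema
    originally_required = set(schema.get("required", []))
    for field, spec in schema["properties"].items():
        if field not in originally_required:
            # Strict mode has no optional fields; allow null instead.
            spec["type"] = [spec["type"], "null"]
    schema["required"] = list(schema["properties"])
    schema["additionalProperties"] = False
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
            "strict": True,
        },
    }
```

With this shape, a field the text does not mention comes back as null rather than being silently omitted, which keeps downstream code from having to distinguish "missing" from "empty".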

OpenAI function calling example

import json

import openai

# Define the function schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_contact_info",
            "description": "Extract contact information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Full name of the person"
                    },
                    "email": {
                        "type": "string",
                        "description": "Email address"
                    },
                    "phone": {
                        "type": "string",
                        "description": "Phone number"
                    }
                },
                "required": ["name", "email"]
            }
        }
    }
]

# Call the API
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Extract info: 'Bob Johnson, bob@company.com, 555-7890'"
        }
    ],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_contact_info"}}
)

# Extract the function call
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
print(arguments)

Output:

{
  "name": "Bob Johnson",
  "email": "bob@company.com",
  "phone": "555-7890"
}

Anthropic tool use example

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

# Define the tool schema
tools = [
    {
        "name": "extract_contact",
        "description": "Extract contact information",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name", "email"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
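    tools=tools,
    # Force a call to this tool; otherwise the model may answer in plain text
    # and the next(...) lookup below would find no tool_use block.
    tool_choice={"type": "tool", "name": "extract_contact"},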
    messages=[
        {
            "role": "user",
            "content": "Extract: 'Alice Brown, alice@example.com, 555-1111'"
        }
    ]
)

# Extract tool use
tool_use = next(block for block in message.content if block.type == "tool_use")
print(tool_use.input)

When to use function calling

Production systems where reliability is critical.
Applications that need to trigger API calls or database writes.
Agentic workflows where the model chooses from multiple tools.
Any scenario where required fields must always be present.

Approach 4: Pydantic for Schema Validation

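Pydantic is a Python library for data validation. Even with JSON mode or function calling, you should validate the model's output before using it in your application: fields can be missing, types can be wrong, and values can violate rules the schema cannot express (for example, a malformed email address or a phone number that is too short). Pydantic checks structure and format; it cannot tell whether a well-formed value was fabricated, which is covered under Security Considerations.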

Defining a Pydantic model

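# Written for Pydantic v2. EmailStr needs the optional email-validator
# package (pip install "pydantic[email]").
from pydantic import BaseModel, EmailStr, field_validator

class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: str

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v: str) -> str:
        # Count digits only, so separators such as "-" do not inflate the length
        digits = [ch for ch in v if ch.isdigit()]
        if len(digits) < 10:
            raise ValueError('Phone number must contain at least 10 digits')
        return v

# Parse and validate LLM output
llm_output = '{"name": "Alice", "email": "alice@example.com", "phone": "555-1234567"}'
contact = ContactInfo.model_validate_json(llm_output)
print(contact)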

If the LLM produces invalid data — wrong email format, missing field, wrong type — Pydantic raises a ValidationError with details about what failed. You can then retry or fall back gracefully.

Integrating Pydantic with LLM calls

import openai
from pydantic import BaseModel, EmailStr
import json

class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: str

def extract_contact(text: str) -> ContactInfo:
    response = openai.chat.completions.create(
        model="gpt-4o",  # JSON mode requires a model that supports response_format
        messages=[
            {
                "role": "system",
                "content": f"Extract contact info as JSON matching this schema: {json.dumps(ContactInfo.model_json_schema())}"
            },
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"}
    )

    json_output = response.choices[0].message.content
    # Pydantic v2 API; raises ValidationError if the JSON does not match ContactInfo
    return ContactInfo.model_validate_json(json_output)

# Usage
result = extract_contact("Contact: Sarah Lee, sarah.lee@test.com, 555-9999")
print(result.name)  # "Sarah Lee"
print(result.email)  # "sarah.lee@test.com"

Automatic retry on validation failure

When validation fails, you can feed the error back to the model and ask it to try again. The model then sees which field failed and why, so it can correct that field instead of repeating the same mistake:

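import json

import openai
from pydantic import ValidationError

def extract_with_retry(text: str, max_retries: int = 3) -> ContactInfo:
    # Uses the ContactInfo model defined above (Pydantic v2 API)
    schema = json.dumps(ContactInfo.model_json_schema())
    prompt = text
    for attempt in range(max_retries):
        response = openai.chat.completions.create(
            model="gpt-4o",  # JSON mode requires a model that supports response_format
            messages=[
                {"role": "system", "content": f"Extract contact info as JSON matching this schema: {schema}"},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"}
        )

        try:
            return ContactInfo.model_validate_json(response.choices[0].message.content)
        except ValidationError as e:
            if attempt == max_retries - 1:
                raise
            # Retry with error feedback, always appended to the original text
            # so that old errors do not pile up across attempts
            prompt = f"{text}\n\nPrevious attempt failed: {e}. Please fix and try again."

    raise ValueError("max_retries must be at least 1")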
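
The raw validation error text can be long and noisy. One option, sketched here as an assumption rather than an established practice, is to condense it before sending it back. The summarize_errors helper is a hypothetical name; it expects the list of dicts that Pydantic's ValidationError.errors() returns, each with a loc tuple and a msg string.

```python
def summarize_errors(errors: list[dict], limit: int = 5) -> str:
    """Turn Pydantic-style error dicts into a short retry instruction."""
    lines = []
    for err in errors[:limit]:
        # "loc" is a tuple path such as ("items", 0, "price").
        path = ".".join(str(part) for part in err["loc"]) or "<root>"
        lines.append(f"- {path}: {err['msg']}")
    if len(errors) > limit:
        lines.append(f"- ... and {len(errors) - limit} more")
    return "The previous JSON failed validation:\n" + "\n".join(lines)
```

Inside an except ValidationError as e block, the call would be summarize_errors(e.errors()), and the returned string replaces the raw error in the retry prompt.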

Approach 5: Grammar-Based Constrained Generation

For self-hosted models where you control the inference engine, you can enforce output format using a formal grammar. This is the most powerful technique — it guarantees zero syntax errors, not just for JSON but for any format you can define with a grammar.

How it works

A grammar defines the exact rules for valid output (e.g., "a JSON object must start with {, followed by key-value pairs, ended with }"). At each generation step, the inference engine masks out any token that would violate the grammar at the current position. Only valid continuations are possible.
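
One subtlety is that model tokens usually span several characters, so a token is only allowed if every character in it keeps the output on a valid path. The sketch below shows this for a deliberately simple non-JSON format, a date shaped like DDDD-DD-DD. It uses a fixed pattern rather than a real context-free grammar, and the names PATTERN, advance, and allowed_tokens are made up for this example.

```python
from typing import Optional

# "D" means any ASCII digit; every other character must match literally.
PATTERN = "DDDD-DD-DD"

def advance(pos: int, token: str) -> Optional[int]:
    """Consume `token` starting at `pos`; return the new position, or None if invalid."""
    for ch in token:
        if pos >= len(PATTERN):
            return None  # the format is already complete
        expected = PATTERN[pos]
        ok = ch in "0123456789" if expected == "D" else ch == expected
        if not ok:
            return None
        pos += 1
    return pos

def allowed_tokens(pos: int, vocab: list[str]) -> list[str]:
    """Tokens that survive the mask at position `pos`."""
    return [tok for tok in vocab if tok and advance(pos, tok) is not None]
```

An inference engine would run a check like allowed_tokens before sampling each token and set the logits of everything else to negative infinity.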

Figure: A context-free grammar is a set of production rules. Each non-terminal (such as object or members) expands into further rules, and the leaves of the resulting parse tree are the allowed terminal tokens. Grammar-constrained decoding tracks the partial parse at every generation step and masks any token that would leave no valid path to a complete parse. Source: ZeroOne / Wikimedia Commons (Public Domain)

llama.cpp grammar example

The llama.cpp library supports grammar-based generation using GBNF (GGML BNF) format — a way to define the rules of valid output:

# Define a JSON grammar for contact info
root ::= object
object ::= "{" ws members ws "}"
members ::= member ("," ws member)*
member ::= "\"name\"" ws ":" ws string | "\"email\"" ws ":" ws string | "\"phone\"" ws ":" ws string
string ::= "\"" ([^"\\] | "\\" .)* "\""
ws ::= [ \t\n]*

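This grammar restricts the model to a JSON object whose keys are name, email, or phone, each with a string value. As written, it does not require all three keys or prevent a key from repeating; to force exactly one of each, spell out the three members in a fixed order in the object rule instead of using the repeating members rule.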

Outlines library (Python)

The Outlines library provides grammar-based generation for Hugging Face models in Python:

import json

from outlines import models, generate

# Load model
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Define JSON schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"}
    },
    "required": ["name", "email"]
}

# Generate with schema constraint (Outlines 0.x API; newer releases changed it).
# generate.json accepts a JSON Schema string or a Pydantic model,
# so the dict is serialized first.
generator = generate.json(model, json.dumps(schema))
result = generator("Extract contact info: 'Tom White, tom@company.com, 555-4444'")
print(result)

When to use grammar constraints

Self-hosted open-source models where you control inference.
Custom output formats beyond JSON (SQL, code, domain-specific languages).
Situations requiring zero syntax error rates.

Comparison: Structured Output Methods

Method	Syntax Guarantee	Schema Enforcement	Ease of Use	Best For
Prompt Engineering	No	No	Easy	Simple cases, prototyping
JSON Mode	Yes	No	Easy	General JSON output
Function Calling	With strict mode	With strict mode	Medium	Production apps, agents
Pydantic Validation	Post-check	Yes	Medium	Validation layer on top of any method
Grammar Constraints	Yes	Yes	Hard	Custom formats, self-hosted models

Handling Nested and Complex Schemas

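Real-world applications often need nested objects and arrays. Function calling schemas express these with nested object and array types, and the Pydantic models below mirror the same structure so the returned arguments can be validated: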

from pydantic import BaseModel
from typing import List

class LineItem(BaseModel):
    product: str
    quantity: int
    price: float

class Order(BaseModel):
    customer_name: str
    items: List[LineItem]
    total: float

# Function calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_order",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_name": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product": {"type": "string"},
                                "quantity": {"type": "integer"},
                                "price": {"type": "number"}
                            },
                            "required": ["product", "quantity", "price"]
                        }
                    },
                    "total": {"type": "number"}
                },
                "required": ["customer_name", "items", "total"]
            }
        }
    }
]
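
A schema can require that total is a number, but not that it equals the sum of the line items. A cross-field check like the following catches that kind of error. This is a sketch; check_order and the one-cent tolerance are assumptions made for illustration, and it expects a dict that has already passed the schema above.

```python
import math

def check_order(order: dict, tolerance: float = 0.01) -> list[str]:
    """Return semantic problems that a JSON Schema check would not catch."""
    problems = []
    if not order["items"]:
        problems.append("order has no line items")
    for i, item in enumerate(order["items"]):
        if item["quantity"] <= 0:
            problems.append(f"items[{i}].quantity must be positive")
        if item["price"] < 0:
            problems.append(f"items[{i}].price must not be negative")
    # Recompute the total from the line items and compare with the stated total.
    expected = sum(item["quantity"] * item["price"] for item in order["items"])
    if not math.isclose(order["total"], expected, abs_tol=tolerance):
        problems.append(f"total {order['total']} != sum of line items {expected:.2f}")
    return problems
```

The same idea can live in a Pydantic model-level validator on Order; the plain function is shown here to keep the example dependency-free.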

Error Handling and Retry Strategies

Even with function calling, errors occur. The model might extract wrong values or fail to find required information. The pattern below feeds the validation error back to the model as context for the retry, so the next attempt can correct the specific field that failed:

import json

import openai
from pydantic import BaseModel, ValidationError

def extract_with_feedback(text: str, schema: dict, model_cls: type[BaseModel], max_attempts: int = 3):
    # schema is an OpenAI tool definition; model_cls is the Pydantic model that mirrors it
    conversation = [
        {"role": "system", "content": "Extract structured data"},
        {"role": "user", "content": text}
    ]

    for attempt in range(max_attempts):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=conversation,
            tools=[schema],
            tool_choice={"type": "function", "function": {"name": schema["function"]["name"]}}
        )

        message = response.choices[0].message
        tool_call = message.tool_calls[0]
        try:
            arguments = json.loads(tool_call.function.arguments)

            # Validate with Pydantic
            return model_cls(**arguments)

        except (json.JSONDecodeError, ValidationError) as e:
            # Replay the model's own tool call, then report the error as the
            # tool result so the conversation stays well-formed
            conversation.append(message)
            conversation.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": f"Validation failed: {e}. Please fix the error and try again."
            })

    raise RuntimeError("Failed after all retries")

Security Considerations

Never let user input modify your schema

Do not allow user input to change your function definitions or JSON schemas. An attacker could inject schema modifications that expose sensitive fields or bypass validation rules.

Always validate before acting on extracted data

LLMs can hallucinate values that pass syntax checks but are semantically wrong — a fabricated email address, an invented product code. Always validate extracted data against your real data sources before writing to databases or calling APIs.

SQL generation requires extra care

Constrained generation enforces syntax shape only — it does not prevent SQL injection. A schema-valid string like '; DROP TABLE users;-- passes all syntax constraints. Always use parameterized queries and dedicated input sanitization independently of structured output validation.
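
A short sketch of the parameterized-query point, using Python's built-in sqlite3 module with an in-memory database. The users table and the find_user helper are invented for this example.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, email: str) -> list[tuple]:
    # The "?" placeholder sends `email` as data; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

# A schema-valid string that a model might "extract" from hostile input.
extracted = "'; DROP TABLE users;--"
rows = find_user(conn, extracted)  # matches nothing and drops nothing
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Because the extracted value travels as a bound parameter, the table is still intact afterwards. Building the same query with string formatting would hand the payload to the SQL parser.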

Best Practices for Production

Use function calling for critical workflows that require reliability.
Always layer Pydantic validation on top — even with function calling, validate before using the data.
Implement retry logic with error feedback to the model.
Log all structured outputs for debugging and auditing.
Monitor extraction success rates and alert when they drop.
Test your schemas against edge cases: empty inputs, malformed text, missing information.
Version your schemas and track changes — schema changes can break downstream consumers silently.
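
For the monitoring item, a rolling success rate is often enough to start with. The class below is a minimal sketch; the name SuccessRateMonitor, the window size, and the threshold are all assumptions to adapt, and a production system would more likely emit these numbers to an existing metrics stack.

```python
from collections import deque

class SuccessRateMonitor:
    """Track the success rate over the last `window` extraction attempts."""

    def __init__(self, window: int = 100, threshold: float = 0.95):
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def should_alert(self) -> bool:
        # Only alert once the window is full, to avoid noise at startup.
        full = len(self.results) == self.results.maxlen
        return full and self.rate < self.threshold
```

Call record(True) after each validated extraction and record(False) after each final failure, then check should_alert() on whatever schedule your alerting uses.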

Conclusion

Structured outputs transform LLMs from text generators into reliable data extraction tools. For production applications, the right combination is: function calling (with strict schemas where the provider supports them) for API-level schema enforcement, Pydantic validation for business logic checks, and retry loops with error feedback for robustness.

JSON mode alone is sufficient for simple prototypes. Function calling is the gold standard for anything that matters. Grammar-constrained generation is worth the extra effort for self-hosted models or non-JSON formats.

Key Takeaways

JSON mode guarantees syntactically valid output but not schema compliance. For production reliability, use function calling, which steers the model toward your schema and, in strict mode, enforces required fields and types at the API level.
Always layer Pydantic validation on top of function calling: the model can still hallucinate correct-looking values that fail business logic checks.
Implement retry loops with error feedback: feeding the validation failure back to the model in the next turn lets it correct the specific field that failed instead of repeating the mistake.
For self-hosted models or non-JSON formats (SQL, custom DSLs), grammar-based constrained generation via llama.cpp or Outlines provides zero-error-rate syntax guarantees — but constrained syntax is not the same as safe input; always use parameterized queries separately.

References

OpenAI (2023). Function Calling and JSON Mode. platform.openai.com
Willard, B. T., & Louf, R. (2023). Efficient Guided Generation for Large Language Models. arXiv:2307.09702.
Pydantic Documentation — Data Validation for Python
Outlines — Structured Text Generation Library
Chase, H. (2023). LangChain Output Parsers. python.langchain.com
