Structured Outputs in LLMs: JSON Mode, Function Calling, and Schema Validation

How to extract reliable structured data from language models using JSON mode, grammar constraints, and schema enforcement

Posted by Perivitta on March 19, 2026 · 24 mins read


Introduction

Language models are designed to generate natural language text. They excel at conversations, creative writing, and explanations. However, production applications rarely need just free-form text. They need structured data.

Consider these common use cases:

  • Extracting customer information from support tickets into a database.
  • Parsing invoices and returning line items as JSON.
  • Generating API requests from natural language commands.
  • Classifying content into predefined categories.
  • Building multi-step agents that call functions and tools.

All of these require the LLM to produce output in a specific, machine-readable format. Free-form text is not good enough. You need valid JSON, properly formatted function calls, or data that matches a strict schema.

This is harder than it sounds. LLMs are probabilistic text generators, not compilers. They can produce malformed JSON, miss required fields, or hallucinate extra keys.

This post explains how to reliably extract structured outputs from LLMs using JSON mode, function calling, grammar constraints, and schema validation techniques.


Why Structured Outputs Are Hard

When you ask a language model to produce JSON, it generates text token by token. It does not have an internal representation of JSON structure. It is predicting the next token based on probability distributions.

This leads to several problems:

  • Syntax errors: Missing commas, unmatched brackets, trailing commas where not allowed.
  • Type mismatches: Returning a string when the schema expects an integer.
  • Missing required fields: The model omits keys that your application depends on.
  • Extra fields: The model adds fields not in your schema, breaking strict parsers.
  • Wrapper text: The model includes explanatory text before or after the JSON block.

A simple prompt like "Return the answer as JSON" often works, but it is not reliable at scale. In production, even a 2% failure rate can cause thousands of errors daily.
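Even before reaching for API-level features, a defensive parsing helper can absorb the most common failure modes, especially wrapper text. A minimal sketch (the fence-stripping heuristic is an assumption, not a standard recipe):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Best-effort extraction of a JSON object from raw LLM output.

    Strips markdown code fences, ignores wrapper prose, and parses the
    span between the first '{' and the last '}'.
    """
    # Remove markdown code fences such as ```json ... ```
    text = re.sub(r"```(?:json)?", "", text)
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in output")
    return json.loads(text[start:end + 1])

# Handles the wrapper text models often add
raw = 'Here is the JSON:\n```json\n{"name": "Sarah", "issue": "login"}\n```'
print(extract_json(raw))  # {'name': 'Sarah', 'issue': 'login'}
```

This is a stopgap rather than a guarantee; it cannot repair genuinely malformed JSON, which is why the approaches below exist.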


Approach 1: Prompt Engineering for Structured Output

The simplest method is careful prompt design. You explicitly instruct the model to return only valid JSON with no additional text.

Example Prompt

Extract the following information from the text and return it as JSON.
Do not include any explanatory text. Only return valid JSON.
 
Schema:
{
  "name": string,
  "email": string,
  "phone": string,
  "issue": string
}
 
Text: "Hi, I'm Sarah Chen. My email is sarah.chen@example.com and my phone is 555-0123. I can't log into my account."
 
JSON:

Strengths

  • Works with any LLM, including older models.
  • No special API features required.
  • Simple to implement.

Weaknesses

  • Still prone to errors with complex schemas.
  • No guarantee of valid JSON syntax.
  • Requires manual schema description in the prompt.
  • Wrapper text issues (model adds "Here is the JSON:" before output).

This approach works for simple cases but is not production-grade for critical workflows.


Approach 2: JSON Mode

Modern LLM APIs now offer "JSON mode" as a first-class feature. When enabled, the model is constrained to produce only valid JSON.

How JSON Mode Works

JSON mode works by constraining the model's token generation. At each step, the model can only select tokens that keep the output valid JSON.

This is implemented at the inference level, not just through prompting. The model's logits (probability scores) are filtered to exclude tokens that would break JSON syntax.

For example, after generating {"name": "Alice", the model cannot generate a token that would create invalid syntax like {"name": "Alice" "email".
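The filtering step can be illustrated with a toy sketch. Real inference engines derive the allowed-token set from a grammar state machine over the full tokenizer vocabulary; here the allowed set and the candidate token strings are simply assumed for illustration:

```python
import math

def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set disallowed tokens to -inf so softmax assigns them zero probability."""
    return {tok: (lp if tok in allowed else -math.inf) for tok, lp in logits.items()}

def softmax(logits: dict[str, float]) -> dict[str, float]:
    mx = max(v for v in logits.values() if v != -math.inf)
    exps = {t: (math.exp(v - mx) if v != -math.inf else 0.0)
            for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# After generating {"name": "Alice", suppose a grammar engine determines
# that only "," and "}" keep the JSON valid (token set is illustrative):
logits = {',': 1.2, '}': 0.8, ' "email"': 2.0, ']': -0.5}
allowed = {',', '}'}

probs = softmax(mask_logits(logits, allowed))
print(probs)  # invalid continuations get probability 0
```

Note that the invalid continuation `' "email"'` had the highest raw score; masking is what prevents the model from ever sampling it.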

OpenAI JSON Mode Example

import openai
 
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs valid JSON."
        },
        {
            "role": "user",
            "content": "Extract name, email, and phone from: 'Contact John Doe at john@example.com or 555-1234'"
        }
    ],
    response_format={"type": "json_object"}
)
 
print(response.choices[0].message.content)

Output:

{
  "name": "John Doe",
  "email": "john@example.com",
  "phone": "555-1234"
}

Anthropic Example (Assistant Prefill)

Anthropic's API has no dedicated JSON-mode flag. A common technique is to prefill the assistant turn with an opening brace, so the model is forced to continue a JSON object:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract person info as JSON with keys name, email, phone: 'Jane Smith, jane@test.com, 555-9999'"
        },
        {"role": "assistant", "content": "{"}  # prefill forces JSON output
    ]
)

# The response continues from the prefill, so re-attach the opening brace
print("{" + message.content[0].text)

Strengths of JSON Mode

  • Guarantees syntactically valid JSON.
  • No wrapper text or extra explanations.
  • Reduces parsing errors significantly.

Limitations of JSON Mode

  • Does not enforce schema compliance (field names, types, required fields).
  • You still need to validate the structure after receiving it.
  • The model can still hallucinate field values or miss required keys.

JSON mode solves syntax issues but not semantic correctness.
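Because JSON mode guarantees syntax but not structure, a post-parse check is still needed. A minimal stdlib sketch (production code would typically use a library such as Pydantic, covered in Approach 4; the field names here mirror the examples above):

```python
import json

SCHEMA = {"name": str, "email": str, "phone": str}
REQUIRED = {"name", "email"}

def check_structure(payload: str) -> dict:
    """Validate field presence and types after JSON-mode output is parsed."""
    data = json.loads(payload)  # syntax is guaranteed by JSON mode
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    for key, value in data.items():
        expected = SCHEMA.get(key)
        if expected is None:
            raise ValueError(f"unexpected field: {key}")
        if not isinstance(value, expected):
            raise ValueError(f"{key} should be {expected.__name__}")
    return data

print(check_structure('{"name": "John Doe", "email": "john@example.com"}'))
```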


Approach 3: Function Calling / Tool Use

Function calling (also called tool use) is the most robust way to get structured outputs that match a specific schema.

Instead of asking the model to produce JSON, you define a function schema upfront. The model is then constrained to produce output that matches that schema exactly.

How Function Calling Works

You define a function signature with typed parameters. The LLM API enforces that the model's output conforms to this schema.

Under the hood, function calling uses constrained decoding similar to JSON mode, but with schema-level guarantees.

OpenAI Function Calling Example

import json

import openai
 
# Define the function schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_contact_info",
            "description": "Extract contact information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Full name of the person"
                    },
                    "email": {
                        "type": "string",
                        "description": "Email address"
                    },
                    "phone": {
                        "type": "string",
                        "description": "Phone number"
                    }
                },
                "required": ["name", "email"]
            }
        }
    }
]
 
# Call the API
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Extract info: 'Bob Johnson, bob@company.com, 555-7890'"
        }
    ],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_contact_info"}}
)
 
# Extract the function call
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
print(arguments)

Output:

{
  "name": "Bob Johnson",
  "email": "bob@company.com",
  "phone": "555-7890"
}

Anthropic Tool Use Example

import anthropic
 
client = anthropic.Anthropic(api_key="your-api-key")
 
# Define the tool schema
tools = [
    {
        "name": "extract_contact",
        "description": "Extract contact information",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name", "email"]
        }
    }
]
 
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Extract: 'Alice Brown, alice@example.com, 555-1111'"
        }
    ]
)
 
# Extract tool use
tool_use = next(block for block in message.content if block.type == "tool_use")
print(tool_use.input)

Strengths of Function Calling

  • Enforces schema compliance (required fields, types).
  • Guarantees syntactically correct JSON.
  • Minimal post-processing; value-level validation is still recommended.
  • Works for multi-step tool use (agents).

When to Use Function Calling

  • Production systems where reliability is critical.
  • Applications that need to call APIs or databases.
  • Agentic workflows with multiple tool options.
  • Any scenario where schema validation is required.

Approach 4: Pydantic for Schema Validation

Even with JSON mode or function calling, you should validate the output against your expected schema. Pydantic is the standard Python library for this.

Defining a Pydantic Model

from pydantic import BaseModel, EmailStr, field_validator

class ContactInfo(BaseModel):
    name: str
    email: EmailStr  # requires the optional email-validator package
    phone: str

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v):
        # Simple length check; real code would use a phone-parsing library
        if not v or len(v) < 10:
            raise ValueError('Phone number must be at least 10 digits')
        return v

# Parse and validate LLM output
llm_output = '{"name": "Alice", "email": "alice@example.com", "phone": "555-1234567"}'
contact = ContactInfo.model_validate_json(llm_output)
print(contact)

If the LLM produces invalid data, Pydantic raises a ValidationError with details about what failed.

Integrating Pydantic with LLM Calls

import json

import openai
from pydantic import BaseModel, EmailStr

class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: str

def extract_contact(text: str) -> ContactInfo:
    schema = json.dumps(ContactInfo.model_json_schema())
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Extract contact info as JSON matching this schema: {schema}"
            },
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"}
    )

    json_output = response.choices[0].message.content
    return ContactInfo.model_validate_json(json_output)

# Usage
result = extract_contact("Contact: Sarah Lee, sarah.lee@test.com, 555-9999")
print(result.name)  # "Sarah Lee"
print(result.email)  # "sarah.lee@test.com"

Automatic Retry on Validation Failure

import openai
from pydantic import ValidationError

def extract_with_retry(text: str, max_retries=3) -> ContactInfo:
    for attempt in range(max_retries):
        try:
            response = openai.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Extract contact info as JSON with keys name, email, phone."},
                    {"role": "user", "content": text}
                ],
                response_format={"type": "json_object"}
            )

            json_output = response.choices[0].message.content
            return ContactInfo.model_validate_json(json_output)

        except ValidationError as e:
            if attempt == max_retries - 1:
                raise
            # Retry with the validation error appended as feedback
            text = f"{text}\n\nPrevious attempt failed: {e}. Please fix and try again."

    raise RuntimeError("Failed after all retries")

Approach 5: Grammar-Based Constrained Generation

For maximum control, you can use grammar-based constrained generation. This forces the model to follow a specific grammar (like JSON, SQL, or custom formats).

How It Works

Instead of filtering tokens after generation, the grammar is enforced during generation. At each step, only tokens that satisfy the grammar are allowed.

This is more powerful than JSON mode because you can define custom grammars beyond JSON.

llama.cpp Grammar Example

The llama.cpp library supports grammar-based generation using GBNF (GGML BNF) format.

# Define a JSON grammar for contact info
root ::= object
object ::= "{" ws members ws "}"
members ::= member ("," ws member)*
member ::= "\"name\"" ws ":" ws string | "\"email\"" ws ":" ws string | "\"phone\"" ws ":" ws string
string ::= "\"" ([^"\\] | "\\" .)* "\""
ws ::= [ \t\n]*

This grammar ensures the model can only generate JSON with the exact fields you specify.

Outlines Library (Python)

The Outlines library provides grammar-based generation for Hugging Face models.

import json

from outlines import models, generate

# Load model
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Define JSON schema (Outlines accepts a JSON Schema string or a Pydantic
# model; the exact API has varied across versions)
schema = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"}
    },
    "required": ["name", "email"]
})

# Generate with schema constraint
generator = generate.json(model, schema)
result = generator("Extract contact info: 'Tom White, tom@company.com, 555-4444'")
print(result)

When to Use Grammar Constraints

  • Self-hosted open-source models where you control inference.
  • Custom output formats beyond JSON (SQL, code, DSLs).
  • Maximum reliability requirements (0% syntax errors).

Comparison: Structured Output Methods

Method                 Syntax Guarantee   Schema Enforcement   Ease of Use   Best For
Prompt Engineering     No                 No                   Easy          Simple cases, prototyping
JSON Mode              Yes                No                   Easy          General JSON output
Function Calling       Yes                Yes                  Medium        Production apps, agents
Pydantic Validation    Post-check         Yes                  Medium        Validation layer
Grammar Constraints    Yes                Yes                  Hard          Custom formats, self-hosted

Handling Nested and Complex Schemas

Real-world applications often need nested objects and arrays.

Example: Order Extraction

from pydantic import BaseModel
from typing import List
 
class LineItem(BaseModel):
    product: str
    quantity: int
    price: float
 
class Order(BaseModel):
    customer_name: str
    items: List[LineItem]
    total: float
 
# Function calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_order",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_name": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product": {"type": "string"},
                                "quantity": {"type": "integer"},
                                "price": {"type": "number"}
                            },
                            "required": ["product", "quantity", "price"]
                        }
                    },
                    "total": {"type": "number"}
                },
                "required": ["customer_name", "items", "total"]
            }
        }
    }
]

The LLM will extract the order as a properly nested structure.
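Hand-writing nested schemas like this is error-prone. Pydantic can emit the JSON Schema for you via `model_json_schema()`; the stdlib-only sketch below shows the underlying idea by deriving a schema from dataclass type hints (the converter is illustrative and handles only primitives, lists, and nested dataclasses):

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_args, get_origin

@dataclass
class LineItem:
    product: str
    quantity: int
    price: float

@dataclass
class Order:
    customer_name: str
    items: List[LineItem]
    total: float

PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_json_schema(tp) -> dict:
    """Recursively convert a dataclass or typing annotation to JSON Schema."""
    if tp in PRIMITIVES:
        return {"type": PRIMITIVES[tp]}
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": to_json_schema(item_type)}
    if is_dataclass(tp):
        return {
            "type": "object",
            "properties": {f.name: to_json_schema(f.type) for f in fields(tp)},
            "required": [f.name for f in fields(tp)],
        }
    raise TypeError(f"unsupported annotation: {tp!r}")

schema = to_json_schema(Order)
print(schema["properties"]["items"]["items"]["properties"]["quantity"])
# {'type': 'integer'}
```

Generating the schema from one source of truth keeps the function-calling definition and your validation model from drifting apart.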


Error Handling and Retry Strategies

Even with function calling, errors can occur. The model might extract incorrect values or fail to find required information.

Retry with Feedback

import json

import openai
from pydantic import ValidationError

def extract_with_feedback(text: str, schema, model_cls, max_attempts=3):
    conversation = [
        {"role": "system", "content": "Extract structured data"},
        {"role": "user", "content": text}
    ]

    for attempt in range(max_attempts):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=conversation,
            tools=[schema],
            tool_choice={"type": "function", "function": {"name": schema["function"]["name"]}}
        )

        try:
            tool_call = response.choices[0].message.tool_calls[0]
            arguments = json.loads(tool_call.function.arguments)

            # Validate with the caller's Pydantic model
            return model_cls(**arguments)

        except (json.JSONDecodeError, ValidationError) as e:
            # Feed the error back so the model can correct itself
            conversation.append({
                "role": "user",
                "content": f"The previous extraction failed validation: {e}. Please fix the error and try again."
            })

    raise RuntimeError("Failed after all retries")

Performance Considerations

Latency

Function calling and constrained generation add only modest latency overhead, typically on the order of tens of milliseconds, though this varies by provider and schema size.

JSON mode has negligible overhead compared to standard text generation.

Token Usage

Function schemas consume input tokens. Keep schemas concise to minimize costs.

Instead of verbose descriptions, use short, clear names and descriptions.

Caching

For repeated schemas, use prompt caching (if supported by your provider) to reduce costs.


Security Considerations

Schema Injection

Never allow user input to directly modify function schemas. An attacker could inject malicious schema definitions.

Validation is Critical

Always validate extracted data before using it in databases or APIs.

LLMs can hallucinate values that are syntactically correct but semantically wrong (e.g., extracting a fake email address).
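One cheap semantic check is to verify that each extracted value literally appears in the source text; this catches many hallucinated fields, though not all. A minimal sketch (the case-folding normalization is deliberately naive):

```python
def grounded(extracted: dict, source: str) -> dict:
    """Report, per field, whether the extracted value literally occurs in
    the source text (case-insensitive). A False flag is a signal to reject
    or re-check the extraction, not proof of an error."""
    lowered = source.lower()
    return {k: str(v).lower() in lowered for k, v in extracted.items()}

source = "Contact John Doe at john@example.com or 555-1234"
result = grounded(
    {"name": "John Doe", "email": "john@example.com", "phone": "555-9999"},
    source,
)
print(result)  # {'name': True, 'email': True, 'phone': False}
```

Fields that are legitimately inferred or reformatted (e.g. normalized dates) will fail this check, so treat it as a heuristic filter rather than a hard gate.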


Best Practices for Production

  • Use function calling for critical workflows that need reliability.
  • Always validate outputs with Pydantic or similar libraries.
  • Implement retry logic with exponential backoff.
  • Log all structured outputs for debugging and auditing.
  • Monitor extraction success rates and alert on anomalies.
  • Test your schemas against edge cases (empty inputs, malformed text).
  • Version your schemas and track changes over time.
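The retry-with-backoff practice above can be sketched as a small wrapper; `flaky_extract` and the delay values are illustrative stand-ins for a real extraction call:

```python
import random
import time

def with_backoff(fn, *, max_attempts=4, base_delay=0.5):
    """Call fn(); on failure, sleep base_delay * 2**attempt plus jitter
    and retry, raising after max_attempts total calls."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Illustrative flaky extraction stub: fails twice, then succeeds
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return {"name": "Alice"}

print(with_backoff(flaky_extract, base_delay=0.01))  # {'name': 'Alice'}
```

In practice you would retry only on transient errors (rate limits, timeouts, validation failures) and fail fast on authentication or schema bugs.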

Real-World Use Cases

1. Invoice Processing

Extract line items, totals, and customer info from scanned invoices. Function calling ensures all required fields are present.

2. API Request Generation

Convert natural language commands into API calls. Grammar constraints ensure valid API syntax.

3. Database Queries

Generate SQL queries from user questions. Constrained generation keeps the output syntactically valid SQL; you still need parameterized queries and permission checks to guard against injection and misuse.

4. Content Classification

Classify content into predefined categories with confidence scores. Function calling returns structured predictions.

5. Form Filling

Extract information from emails or documents to auto-fill web forms. Schema validation ensures all required fields are captured.


Future Directions

Structured output capabilities are rapidly improving:

  • Native schema support: Models trained with schema awareness during pre-training.
  • Multi-modal extraction: Extracting structured data from images and documents.
  • Learned constraints: Models that learn custom constraints from examples.
  • Streaming structured outputs: Receiving partial validated structures during generation.

Conclusion

Structured outputs transform LLMs from text generators into reliable data extraction and processing tools.

For production applications, the combination of function calling, Pydantic validation, and retry logic provides the reliability you need.

JSON mode is sufficient for simple use cases, but function calling is the gold standard for critical workflows.

As LLMs continue to improve, structured output capabilities will become more robust, but the principles of schema design, validation, and error handling remain essential.


Key Takeaways

  • LLMs are probabilistic and prone to syntax errors without constraints.
  • JSON mode guarantees valid JSON syntax but not schema compliance.
  • Function calling enforces schema-level constraints for production reliability.
  • Always validate outputs with Pydantic or similar libraries.
  • Implement retry logic with error feedback for robustness.
  • Grammar-based generation provides maximum control for custom formats.
  • Monitor extraction success rates and log all outputs for debugging.
  • Never trust LLM outputs without validation, even with function calling.
