Structured Outputs in LLMs: JSON Mode, Function Calling, and Schema Validation
Introduction
Language models are designed to generate natural language text. They excel at conversations, creative writing, and explanations. However, production applications rarely need just free-form text. They need structured data.
Consider these common use cases:
- Extracting customer information from support tickets into a database.
- Parsing invoices and returning line items as JSON.
- Generating API requests from natural language commands.
- Classifying content into predefined categories.
- Building multi-step agents that call functions and tools.
All of these require the LLM to produce output in a specific, machine-readable format. Free-form text is not good enough. You need valid JSON, properly formatted function calls, or data that matches a strict schema.
This is harder than it sounds. LLMs are probabilistic text generators, not compilers. They can produce malformed JSON, miss required fields, or hallucinate extra keys.
This post explains how to reliably extract structured outputs from LLMs using JSON mode, function calling, grammar constraints, and schema validation techniques.
Why Structured Outputs Are Hard
When you ask a language model to produce JSON, it generates text token by token. It does not have an internal representation of JSON structure. It is predicting the next token based on probability distributions.
This leads to several problems:
- Syntax errors: Missing commas, unmatched brackets, trailing commas where not allowed.
- Type mismatches: Returning a string when the schema expects an integer.
- Missing required fields: The model omits keys that your application depends on.
- Extra fields: The model adds fields not in your schema, breaking strict parsers.
- Wrapper text: The model includes explanatory text before or after the JSON block.
A simple prompt like "Return the answer as JSON" often works, but it is not reliable at scale. In production, even a 2% failure rate can cause thousands of errors daily.
Approach 1: Prompt Engineering for Structured Output
The simplest method is careful prompt design. You explicitly instruct the model to return only valid JSON with no additional text.
Example Prompt
Extract the following information from the text and return it as JSON.
Do not include any explanatory text. Only return valid JSON.
Schema:
{
  "name": string,
  "email": string,
  "phone": string
}
Text: "Hi, I'm Sarah Chen. My email is sarah.chen@example.com and my phone is 555-0123. I can't log into my account."
JSON:
Strengths
- Works with any LLM, including older models.
- No special API features required.
- Simple to implement.
Weaknesses
- Still prone to errors with complex schemas.
- No guarantee of valid JSON syntax.
- Requires manual schema description in the prompt.
- Wrapper text issues (the model adds "Here is the JSON:" before the output).
This approach works for simple cases but is not production-grade for critical workflows.
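If you do rely on prompting alone, parse defensively. The sketch below (function name hypothetical) recovers the first top-level JSON object from a reply even when the model wraps it in explanatory text:

```python
import json

def extract_json(raw: str) -> dict:
    """Recover the first top-level JSON object from a model reply that
    may wrap it in explanatory text or code fences."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    depth = 0
    in_string = False
    escaped = False
    # Walk forward tracking brace depth, so nested objects don't end the scan early
    for i, ch in enumerate(raw[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(raw[start:i + 1])
    raise ValueError("unbalanced braces in model output")

reply = 'Here is the JSON:\n{"name": "Sarah Chen", "issue": "login"}\nLet me know if you need more!'
print(extract_json(reply))  # {'name': 'Sarah Chen', 'issue': 'login'}
```

Brace counting is more robust than a regex here because extracted objects are often nested.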
Approach 2: JSON Mode
Modern LLM APIs now offer "JSON mode" as a first-class feature. When enabled, the model is constrained to produce only valid JSON.
How JSON Mode Works
JSON mode works by constraining the model's token generation. At each step, the model can only select tokens that keep the output valid JSON.
This is implemented at the inference level, not just through prompting. The model's logits (probability scores) are filtered to exclude tokens that would break JSON syntax.
For example, after generating {"name": "Alice", the model cannot generate a token that would create invalid syntax such as {"name": "Alice" "email".
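The masking step can be illustrated with a toy example. Everything here (the vocabulary, the scores, the grammar mask) is invented for illustration; real implementations mask tens of thousands of logits per step:

```python
import math

# Toy vocabulary and the model's raw scores (logits) for the next token,
# after it has already generated: {"name": "Alice"
vocab = [',', '}', '"email"', 'hello']
logits = [1.2, 0.8, 2.5, 3.0]  # unconstrained, the model would pick 'hello'

# A JSON grammar allows only ',' or '}' at this position
allowed = [True, True, False, False]

# Constrained decoding: set disallowed logits to -inf before sampling
masked = [l if ok else float("-inf") for l, ok in zip(logits, allowed)]

# Softmax over the masked logits; disallowed tokens get probability 0
exps = [math.exp(l) for l in masked]
total = sum(exps)
probs = [e / total for e in exps]

next_token = vocab[probs.index(max(probs))]
print(next_token)  # prints "," -- the best token that keeps the JSON valid
```

The model's preferences still matter among the allowed tokens; the mask only removes options that would break the grammar.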
OpenAI JSON Mode Example
import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # JSON mode requires a model that supports response_format
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs valid JSON."
        },
        {
            "role": "user",
            "content": "Extract name, email, and phone from: 'Contact John Doe at john@example.com or 555-1234'"
        }
    ],
    response_format={"type": "json_object"}
)

print(response.choices[0].message.content)
Output:
{
  "name": "John Doe",
  "email": "john@example.com",
  "phone": "555-1234"
}
Anthropic JSON Output Example
The Anthropic API does not expose a dedicated JSON response format. Schema-enforced output is done with tool use (Approach 3, below); for plain JSON, a documented technique is prefilling the assistant turn so the reply begins inside a JSON object:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract person info as JSON: 'Jane Smith, jane@test.com, 555-9999'"
        },
        # Prefill: the model continues from the opening brace,
        # which suppresses wrapper text like "Here is the JSON:"
        {"role": "assistant", "content": "{"}
    ]
)

# Re-attach the prefilled brace before parsing
print("{" + message.content[0].text)
Strengths of JSON Mode
- Guarantees syntactically valid JSON.
- No wrapper text or extra explanations.
- Reduces parsing errors significantly.
Limitations of JSON Mode
- Does not enforce schema compliance (field names, types, required fields).
- You still need to validate the structure after receiving it.
- The model can still hallucinate field values or miss required keys.
JSON mode solves syntax issues but not semantic correctness.
Approach 3: Function Calling / Tool Use
Function calling (also called tool use) is the most robust way to get structured outputs that match a specific schema.
Instead of asking the model to produce JSON, you define a function schema upfront. The model is then constrained to produce output that matches that schema exactly.
How Function Calling Works
You define a function signature with typed parameters. The LLM API enforces that the model's output conforms to this schema.
Under the hood, function calling uses constrained decoding similar to JSON mode, but with schema-level guarantees.
OpenAI Function Calling Example
import json
import openai

# Define the function schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_contact_info",
            "description": "Extract contact information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Full name of the person"
                    },
                    "email": {
                        "type": "string",
                        "description": "Email address"
                    },
                    "phone": {
                        "type": "string",
                        "description": "Phone number"
                    }
                },
                "required": ["name", "email"]
            }
        }
    }
]

# Call the API, forcing the model to use our function
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Extract info: 'Bob Johnson, bob@company.com, 555-7890'"
        }
    ],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_contact_info"}}
)

# Extract the function call arguments
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
print(arguments)
Output:
{
  "name": "Bob Johnson",
  "email": "bob@company.com",
  "phone": "555-7890"
}
Anthropic Tool Use Example
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

# Define the tool schema
tools = [
    {
        "name": "extract_contact",
        "description": "Extract contact information",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name", "email"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Extract: 'Alice Brown, alice@example.com, 555-1111'"
        }
    ]
)

# Extract the tool use block
tool_use = next(block for block in message.content if block.type == "tool_use")
print(tool_use.input)
Strengths of Function Calling
- Enforces schema compliance (required fields, types).
- Guarantees syntactically correct JSON.
- Arguments arrive as parsed, schema-shaped JSON, so little post-processing is needed (value-level validation is still recommended).
- Works for multi-step tool use (agents).
When to Use Function Calling
- Production systems where reliability is critical.
- Applications that need to call APIs or databases.
- Agentic workflows with multiple tool options.
- Any scenario where schema validation is required.
Approach 4: Pydantic for Schema Validation
Even with JSON mode or function calling, you should validate the output against your expected schema. Pydantic is the standard Python library for this.
Defining a Pydantic Model
from pydantic import BaseModel, EmailStr, field_validator

class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: str

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v):
        # Simple phone validation
        if not v or len(v) < 10:
            raise ValueError('Phone number must be at least 10 characters')
        return v

# Parse and validate LLM output
llm_output = '{"name": "Alice", "email": "alice@example.com", "phone": "555-1234567"}'
contact = ContactInfo.model_validate_json(llm_output)
print(contact)
If the LLM produces invalid data, Pydantic raises a ValidationError with details about what failed.
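For instance, output with a wrong type and a missing field surfaces both problems in one ValidationError (Pydantic v2 API; the ContactInfo model is re-declared with plain fields so the snippet stands alone):

```python
from pydantic import BaseModel, ValidationError

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str

# Simulated LLM output: wrong type for name, phone missing entirely
bad_output = '{"name": 42, "email": "alice@example.com"}'

errors = []
try:
    ContactInfo.model_validate_json(bad_output)
except ValidationError as exc:
    # Each entry names the offending field and the kind of failure
    errors = [(err["loc"][0], err["type"]) for err in exc.errors()]
errors.sort()

print(errors)  # [('name', 'string_type'), ('phone', 'missing')]
```

These structured error entries are what makes retry-with-feedback loops practical: you can tell the model exactly which fields to fix.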
Integrating Pydantic with LLM Calls
import json
import openai
from pydantic import BaseModel, EmailStr

class ContactInfo(BaseModel):
    name: str
    email: EmailStr
    phone: str

def extract_contact(text: str) -> ContactInfo:
    schema = json.dumps(ContactInfo.model_json_schema())
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Extract contact info as JSON matching this schema: {schema}"
            },
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"}
    )
    json_output = response.choices[0].message.content
    return ContactInfo.model_validate_json(json_output)

# Usage
result = extract_contact("Contact: Sarah Lee, sarah.lee@test.com, 555-9999")
print(result.name)   # "Sarah Lee"
print(result.email)  # "sarah.lee@test.com"
Automatic Retry on Validation Failure
import openai
from pydantic import ValidationError

def extract_with_retry(text: str, max_retries=3) -> ContactInfo:
    for attempt in range(max_retries):
        try:
            response = openai.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Extract contact info as JSON"},
                    {"role": "user", "content": text}
                ],
                response_format={"type": "json_object"}
            )
            json_output = response.choices[0].message.content
            return ContactInfo.model_validate_json(json_output)
        except ValidationError as e:
            if attempt == max_retries - 1:
                raise
            # Retry with the validation error appended as feedback
            text = f"{text}\n\nPrevious attempt failed validation: {e}. Please fix and try again."
    raise RuntimeError("Failed after all retries")
Approach 5: Grammar-Based Constrained Generation
For maximum control, you can use grammar-based constrained generation. This forces the model to follow a specific grammar (like JSON, SQL, or custom formats).
How It Works
Instead of filtering tokens after generation, the grammar is enforced during generation. At each step, only tokens that satisfy the grammar are allowed.
This is more powerful than JSON mode because you can define custom grammars beyond JSON.
llama.cpp Grammar Example
The llama.cpp library supports grammar-based generation using GBNF (GGML BNF) format.
# Define a JSON grammar for contact info
root ::= object
object ::= "{" ws members ws "}"
members ::= member ("," ws member)*
member ::= "\"name\"" ws ":" ws string | "\"email\"" ws ":" ws string | "\"phone\"" ws ":" ws string
string ::= "\"" ([^"\\] | "\\" .)* "\""
ws ::= [ \t\n]*
This grammar ensures the model can only generate JSON with the exact fields you specify.
Outlines Library (Python)
The Outlines library provides grammar-based generation for Hugging Face models.
import json
from outlines import models, generate

# Load model
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Define the JSON schema (Outlines accepts a JSON Schema string or a Pydantic model)
schema = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"}
    },
    "required": ["name", "email"]
})

# Generate with the schema constraint
generator = generate.json(model, schema)
result = generator("Extract contact info: 'Tom White, tom@company.com, 555-4444'")
print(result)
When to Use Grammar Constraints
- Self-hosted open-source models where you control inference.
- Custom output formats beyond JSON (SQL, code, DSLs).
- Maximum reliability requirements (0% syntax errors).
Comparison: Structured Output Methods
| Method | Syntax Guarantee | Schema Enforcement | Ease of Use | Best For |
|---|---|---|---|---|
| Prompt Engineering | No | No | Easy | Simple cases, prototyping |
| JSON Mode | Yes | No | Easy | General JSON output |
| Function Calling | Yes | Yes | Medium | Production apps, agents |
| Pydantic Validation | Post-check | Yes | Medium | Validation layer |
| Grammar Constraints | Yes | Yes | Hard | Custom formats, self-hosted |
Handling Nested and Complex Schemas
Real-world applications often need nested objects and arrays.
Example: Order Extraction
from typing import List
from pydantic import BaseModel

class LineItem(BaseModel):
    product: str
    quantity: int
    price: float

class Order(BaseModel):
    customer_name: str
    items: List[LineItem]
    total: float

# Function calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_order",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_name": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "product": {"type": "string"},
                                "quantity": {"type": "integer"},
                                "price": {"type": "number"}
                            },
                            "required": ["product", "quantity", "price"]
                        }
                    },
                    "total": {"type": "number"}
                },
                "required": ["customer_name", "items", "total"]
            }
        }
    }
]
The LLM will extract the order as a properly nested structure.
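The returned arguments string can then be validated against the nested Pydantic models in one step. A sketch with a simulated API response (the sample order is invented; models re-declared so the snippet stands alone):

```python
import json
from typing import List
from pydantic import BaseModel

class LineItem(BaseModel):
    product: str
    quantity: int
    price: float

class Order(BaseModel):
    customer_name: str
    items: List[LineItem]
    total: float

# Simulated `tool_call.function.arguments` string from the API
raw_arguments = json.dumps({
    "customer_name": "Dana Reyes",
    "items": [
        {"product": "Widget", "quantity": 3, "price": 9.99},
        {"product": "Gadget", "quantity": 1, "price": 24.50},
    ],
    "total": 54.47,
})

# One call validates the whole tree, nested line items included
order = Order.model_validate_json(raw_arguments)
print(order.items[0].product, order.total)  # Widget 54.47
```

Pydantic recursively validates each list element against LineItem, so a single bad quantity deep in the array fails loudly instead of slipping into your database.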
Error Handling and Retry Strategies
Even with function calling, errors can occur. The model might extract incorrect values or fail to find required information.
Retry with Feedback
import json
import openai
from pydantic import ValidationError

def extract_with_feedback(text: str, schema, max_attempts=3):
    conversation = [
        {"role": "system", "content": "Extract structured data"},
        {"role": "user", "content": text}
    ]
    for attempt in range(max_attempts):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=conversation,
            tools=[schema],
            tool_choice={"type": "function", "function": {"name": schema["function"]["name"]}}
        )
        try:
            tool_call = response.choices[0].message.tool_calls[0]
            arguments = json.loads(tool_call.function.arguments)
            # Validate with Pydantic (YourPydanticModel mirrors the schema)
            validated = YourPydanticModel(**arguments)
            return validated
        except (json.JSONDecodeError, ValidationError) as e:
            # Feed the failed attempt and the error back to the model
            conversation.append({
                "role": "assistant",
                "content": tool_call.function.arguments
            })
            conversation.append({
                "role": "user",
                "content": f"That output failed validation: {e}. Please fix the error and try again."
            })
    raise RuntimeError("Failed after all retries")
Performance Considerations
Latency
Constrained decoding adds little latency in practice: masking logits is cheap relative to the model's forward pass itself.
JSON mode and function calling therefore have negligible overhead compared to standard text generation.
Token Usage
Function schemas consume input tokens. Keep schemas concise to minimize costs.
Instead of verbose descriptions, use short, clear names and descriptions.
Caching
For repeated schemas, use prompt caching (if supported by your provider) to reduce costs.
Security Considerations
Schema Injection
Never allow user input to directly modify function schemas. An attacker could inject malicious schema definitions.
Validation is Critical
Always validate extracted data before using it in databases or APIs.
LLMs can hallucinate values that are syntactically correct but semantically wrong (e.g., extracting a fake email address).
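One cheap sanity check for extraction tasks is to confirm that verbatim fields actually appear in the source text. A minimal sketch (helper name hypothetical; only suitable for fields copied verbatim, not inferred ones):

```python
def grounded(extracted: dict, source: str) -> list:
    """Return the keys whose extracted values do not literally appear in
    the source text -- a cheap hallucination check for verbatim fields
    like names, emails, and phone numbers."""
    return [k for k, v in extracted.items() if str(v) not in source]

source = "Contact John Doe at john@example.com"
suspect = grounded(
    {"name": "John Doe", "email": "john@example.com", "phone": "555-1234"},
    source,
)
print(suspect)  # ['phone'] -- the model invented a number not in the text
```

Flagged fields can be dropped, sent for human review, or fed back into a retry prompt.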
Best Practices for Production
- Use function calling for critical workflows that need reliability.
- Always validate outputs with Pydantic or similar libraries.
- Implement retry logic with exponential backoff.
- Log all structured outputs for debugging and auditing.
- Monitor extraction success rates and alert on anomalies.
- Test your schemas against edge cases (empty inputs, malformed text).
- Version your schemas and track changes over time.
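The retry-with-backoff recommendation above can be sketched as a small wrapper (names hypothetical; the flaky function stands in for an LLM call that fails transiently):

```python
import random
import time

def with_backoff(fn, max_attempts=4, base_delay=0.5):
    """Retry `fn` on failure with exponential backoff plus jitter.
    Sketch only: in production, catch specific API and validation errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Demo: a function that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```

Jitter matters at scale: without it, a fleet of workers hitting the same rate limit retries in lockstep and hits it again.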
Real-World Use Cases
1. Invoice Processing
Extract line items, totals, and customer info from scanned invoices. Function calling ensures all required fields are present.
2. API Request Generation
Convert natural language commands into API calls. Grammar constraints ensure valid API syntax.
3. Database Queries
Generate SQL queries from user questions. Constrained generation guarantees syntactically valid SQL; preventing injection still requires parameterized execution and restricted database permissions.
4. Content Classification
Classify content into predefined categories with confidence scores. Function calling returns structured predictions.
5. Form Filling
Extract information from emails or documents to auto-fill web forms. Schema validation ensures all required fields are captured.
Future Directions
Structured output capabilities are rapidly improving:
- Native schema support: Models trained with schema awareness during pre-training.
- Multi-modal extraction: Extracting structured data from images and documents.
- Learned constraints: Models that learn custom constraints from examples.
- Streaming structured outputs: Receiving partial validated structures during generation.
Conclusion
Structured outputs transform LLMs from text generators into reliable data extraction and processing tools.
For production applications, the combination of function calling, Pydantic validation, and retry logic provides the reliability you need.
JSON mode is sufficient for simple use cases, but function calling is the gold standard for critical workflows.
As LLMs continue to improve, structured output capabilities will become more robust, but the principles of schema design, validation, and error handling remain essential.
Key Takeaways
- LLMs are probabilistic and prone to syntax errors without constraints.
- JSON mode guarantees valid JSON syntax but not schema compliance.
- Function calling enforces schema-level constraints for production reliability.
- Always validate outputs with Pydantic or similar libraries.
- Implement retry logic with error feedback for robustness.
- Grammar-based generation provides maximum control for custom formats.
- Monitor extraction success rates and log all outputs for debugging.
- Never trust LLM outputs without validation, even with function calling.