Structured Output
Why Structured Output Matters
When you build with LLMs, you quickly run into a recurring issue: LLMs are great at text, but most applications don’t consume text — they consume structured data.
- Your frontend wants a typed object.
- Your database wants a row with strict columns.
- Your agent wants to call a tool with well-defined parameters.
That’s the structured output problem: forcing free-form language into a schema your code can trust.
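Concretely, "structured" means something your code can type-check. A hypothetical target shape (the same movie example used throughout this post), written as a plain TypedDict:

from typing import TypedDict

# The shape the rest of the application wants to receive,
# regardless of how the model phrases its answer.
class Movie(TypedDict):
    title: str
    director: str
    year: int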
The Naïve Approach: “Please Return JSON”
The first attempt almost everyone makes is to ask politely:
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return ONLY a JSON object with title, director, year."},
        {"role": "user", "content": "The Godfather"}
    ]
)

print(resp.choices[0].message.content)
And this works great:
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}
Until it doesn’t:
- Extra prose: "Here's the JSON you requested:" before the object.
- Extra fields: cast, genre, and so on.
- Wrong types: "year": "1972" as a string instead of an integer, or even "nineteen seventy two".
- The whole thing wrapped in Markdown code fences.

If you've ever piped this straight into json.loads, you've seen how brittle it is.
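In practice you end up writing a defensive parser around the raw text. A minimal sketch (the helper name and the fence-stripping heuristic are mine, not any official API):

import json

def parse_movie(raw: str) -> dict:
    # Strip whitespace and any stray Markdown fence characters the model added.
    text = raw.strip().strip("`").strip()
    if text.lower().startswith("json"):
        # Drop the "json" language tag left over from a code fence.
        text = text[4:].lstrip()
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Prose, truncated JSON, or something else entirely.
        raise ValueError(f"Model did not return valid JSON: {raw!r}") from exc

And even this only papers over formatting problems; it does nothing about extra fields or wrong types.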
Function Calling: Schema-Aware JSON
To fix this, OpenAI introduced function calling. Instead of begging the model, you define a JSON Schema up front.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"]
}
import os

output = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
    messages=[
        {"role": "system", "content": "Extract a movie object."},
        {"role": "user", "content": "The Godfather"}
    ],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)

print(output.choices[0].message.function_call.arguments)
Output:
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}
No prose, no fences, matches the schema. This feels a lot safer.
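And since the arguments come back as a JSON string, consuming them is one json.loads away (continuing the snippet above; in this simple case the types line up):

import json

movie = json.loads(output.choices[0].message.function_call.arguments)
assert isinstance(movie["year"], int)  # year arrives as an integer here
print(movie["title"], movie["year"])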
Where it might fail
Up until now, we've used a relatively simple schema, and this approach works great for title, director, year. But as soon as you add enums, nested objects, regex patterns, or scale to 100+ calls, cracks appear.
I tried the following schema and asked for information about Titanic:
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "director": {"type": "string", "minLength": 3},
        "year": {"type": "integer", "minimum": 1888, "maximum": 2100},
        "rating": {"type": "string", "enum": ["G", "PG", "PG-13", "R", "NC-17"]},
        "release": {
            "type": "object",
            "properties": {
                "country_code": {
                    "type": "string",
                    "pattern": "^[A-Z]{2}$"  # exactly two-letter ISO code
                },
                "date": {
                    "type": "string",
                    "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"  # strict YYYY-MM-DD
                }
            },
            "required": ["country_code", "date"],
            "additionalProperties": False
        },
        "characters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "actor": {"type": "string"}
                },
                "required": ["name", "actor"],
                "additionalProperties": False
            },
            "minItems": 5,  # must list at least 5 characters
            "maxItems": 10
        }
    },
    "required": ["title", "director", "year", "rating", "release", "characters"],
    "additionalProperties": False
}
import json

from jsonschema import Draft202012Validator

output = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract a movie object."},
        {"role": "user", "content": "Return a movie JSON for 'Titanic' with rating and a couple of main characters"}
    ],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)

response = json.loads(output.choices[0].message.function_call.arguments)
Draft202012Validator(schema).validate(response)
Looks fine at a glance, but you’ll often see:
jsonschema.exceptions.ValidationError: [{'name': 'Jack Dawson', 'actor': 'Leonardo DiCaprio'}, {'name': 'Rose DeWitt Bukater', 'actor': 'Kate Winslet'}] is too short
To name some of the common cracks once you move beyond the toy schema:
- Ambiguous asks: "movies by Christopher Nolan" returns an array; your schema expects one object.
- Strict enums/regex: the rating must be exactly PG-13; you may get PG 13 or Rated PG-13.
- Nested/array schemas: missing subfields, extra properties, or too few/many items (e.g., require 5 characters, get 2).
- Batch/scale effects: out of 100 prompts, a few stragglers still violate the schema; you need retries + validation.
The point isn’t that function calling is bad. It’s that production needs guardrails: validation, retries, and sane defaults when the model drifts.
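Concretely, the guardrails can be as simple as a validate-and-retry loop. A minimal sketch, reusing the client and schema from above; the helper name, retry budget, and feedback message are my own choices:

import json

from jsonschema import Draft202012Validator, ValidationError

def extract_movie(messages, max_attempts=3):
    validator = Draft202012Validator(schema)
    for _ in range(max_attempts):
        output = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
            function_call={"name": "return_movie"},
        )
        try:
            candidate = json.loads(output.choices[0].message.function_call.arguments)
            validator.validate(candidate)
            return candidate
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the failure back so the next attempt can self-correct.
            messages = messages + [
                {"role": "user", "content": f"Your previous answer was invalid: {err}. Try again."}
            ]
    raise RuntimeError(f"No schema-valid response after {max_attempts} attempts")

This is exactly the kind of plumbing that gets tedious to maintain by hand, which is where the frameworks below come in.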
Takeaway
- Don’t trust "just JSON".
- Function calling is good until schemas get real.
- Production workloads = validate every response and retry on failure.
- If you want fewer papercuts, use a framework (BAML, Instructor, PydanticAI) that bakes in schemas, retries, and typed clients.
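For a taste of what those frameworks buy you, here's roughly what the same extraction looks like with Instructor plus Pydantic. This is a sketch of Instructor's response_model pattern; check its docs for the exact, current API:

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    director: str
    year: int

# Instructor patches the OpenAI client so responses are parsed and validated.
client = instructor.from_openai(OpenAI())

movie = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Movie,  # schema, validation, and retries handled for you
    messages=[{"role": "user", "content": "The Godfather"}],
)
print(movie.year)  # a real int on a typed object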
What's next
In the next post, I'll show how to use BAML to enforce a typed schema, and why agentic flows need structured outputs to be useful.
👉 Full code, schemas, and validator harness are in my GitHub repo