Structured Output
Why Structured Output Matters
When you build with LLMs, you quickly run into a recurring issue: LLMs are great at text, but most applications don’t consume text — they consume structured data.
- Your frontend wants a typed object.
- Your database wants a row with strict columns.
- Your agent wants to call a tool with well-defined parameters.
 
That’s the structured output problem: forcing free-form language into a schema your code can trust.
The Naïve Approach: “Please Return JSON”
The first attempt almost everyone makes is to ask politely:
from openai import OpenAI
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return ONLY a JSON object with title, director, year."},
        {"role": "user", "content": "The Godfather"}
    ]
)
print(resp.choices[0].message.content)
And this works great:
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}
Until it doesn’t:
- Extra prose: "Here's the JSON you requested:" before the object.
- Extra fields: cast, genre, etc.
- Wrong types: "year": "1972" instead of an integer. Maybe even "nineteen seventy two".
- Wrapped in code fences.
 
If you've ever piped this into json.loads, you've seen how brittle it is.
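A common defensive move is to strip the obvious wrappers before parsing. A minimal sketch of that cleanup (the clean_and_parse helper is my own illustration, not a standard recipe):
import json
import re

def clean_and_parse(raw: str) -> dict:
    """Best-effort cleanup before json.loads."""
    # Strip markdown code fences if the model added them.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Grab the first {...} block in case the model wrapped it in prose.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in: {raw!r}")
    return json.loads(match.group(0))

movie = clean_and_parse(resp.choices[0].message.content)
print(movie["title"])  # still no guarantee that "year" came back as an int
This works until the model invents a wrapper you didn't anticipate, which is exactly the brittleness we want to engineer away.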
Function Calling: Schema-Aware JSON
To fix this, OpenAI introduced function calling. Instead of begging the model, you define a JSON Schema up front.
schema = {
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "director": {"type": "string"},
    "year": {"type": "integer"}
  },
  "required": ["title","director","year"]
}
import os  # for reading the model name from the environment

output = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
    messages=[
        {"role": "system", "content": "Extract a movie object."},
        {"role": "user", "content": "The Godfather"},
    ],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)
print(output.choices[0].message.function_call.arguments)
Output:
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}
No prose, no fences, matches the schema. This feels a lot safer.
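If you want a hard guarantee rather than a good feeling, parse the arguments and check them against the same schema you sent. A small sketch using the jsonschema package:
import json
from jsonschema import Draft202012Validator

movie = json.loads(output.choices[0].message.function_call.arguments)
Draft202012Validator(schema).validate(movie)  # raises ValidationError if the model drifted
print(movie["year"] + 1)  # safe to treat year as an int now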
Where it might fail
Up until now, we've used a relatively simple schema, and this approach works great for title, director, year. But as soon as you add enums, nested objects, regexes, or scale to 100+ calls, cracks appear.
I tried the following schema and asked for information about Titanic:
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "director": {"type": "string", "minLength": 3},
        "year": {"type": "integer", "minimum": 1888, "maximum": 2100},
        "rating": {"type": "string", "enum": ["G", "PG", "PG-13", "R", "NC-17"]},
        "release": {
            "type": "object",
            "properties": {
                "country_code": {
                    "type": "string",
                    "pattern": "^[A-Z]{2}$"  # exactly two-letter ISO code
                },
                "date": {
                    "type": "string",
                    "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"  # strict YYYY-MM-DD
                }
            },
            "required": ["country_code", "date"],
            "additionalProperties": False
        },
        "characters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "actor": {"type": "string"}
                },
                "required": ["name", "actor"],
                "additionalProperties": False
            },
            "minItems": 5,  # must list at least 5 characters
            "maxItems": 10
        }
    },
    "required": ["title", "director", "year", "rating", "release", "characters"],
    "additionalProperties": False
}
import json
from jsonschema import Draft202012Validator

output = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract a movie object."},
        {"role": "user", "content": "Return a movie JSON for 'Titanic' with rating and a couple of main characters"},
    ],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)
response = json.loads(output.choices[0].message.function_call.arguments)
Draft202012Validator(schema).validate(response)
Looks fine at a glance, but you’ll often see:
jsonschema.exceptions.ValidationError: [{'name': 'Jack Dawson', 'actor': 'Leonardo DiCaprio'}, {'name': 'Rose DeWitt Bukater', 'actor': 'Kate Winslet'}] is too short
If I had to enumerate the common cracks once you move beyond the toy example:
- Ambiguous asks: "movies by Christopher Nolan" returns an array; your schema expects one object.
- Strict enums/regex: rating must be exactly PG-13; you may get PG 13 or Rated PG-13.
- Nested/array schemas: missing subfields, extra properties, or too few/many items (e.g., require 5 characters, get 2).
- Batch/scale effects: out of 100 prompts, a few stragglers still violate the schema; you need retries + validation.
The point isn’t that function calling is bad. It’s that production needs guardrails: validation, retries, and sane defaults when the model drifts.
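In practice that means wrapping every call in a validate-and-retry loop. A rough sketch, reusing the client and schema from above (the retry count and the error-feedback strategy are my choices, not a standard recipe):
import json
from jsonschema import Draft202012Validator, ValidationError

def extract_movie(prompt: str, max_retries: int = 3) -> dict:
    messages = [
        {"role": "system", "content": "Extract a movie object."},
        {"role": "user", "content": prompt},
    ]
    for _ in range(max_retries):
        output = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
            function_call={"name": "return_movie"},
        )
        raw = output.choices[0].message.function_call.arguments
        try:
            candidate = json.loads(raw)
            Draft202012Validator(schema).validate(candidate)
            return candidate
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the failure back so the next attempt can self-correct.
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"That output was invalid: {err}. Return corrected JSON matching the schema."})
    raise RuntimeError(f"No schema-valid output after {max_retries} attempts")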
Takeaway
- Don’t trust "just JSON".
- Function calling is good until schemas get real.
- Production workloads = validate every response and retry on failure.
- If you want fewer papercuts, use a framework (BAML, Instructor, PydanticAI) that bakes in schemas, retries, and typed clients (see the Instructor sketch below).
 
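For a taste of what that buys you, here's roughly what the same extraction looks like with Instructor, which pairs a Pydantic model with automatic retries (a sketch assuming a recent instructor release):
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str
    director: str
    year: int = Field(ge=1888, le=2100)

# Patch the OpenAI client so responses are validated against the model.
client = instructor.from_openai(OpenAI())

movie = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Movie,  # schema, validation, and typing in one place
    max_retries=2,         # automatic retries on validation failure
    messages=[{"role": "user", "content": "The Godfather"}],
)
print(movie.year)  # a real int, guaranteed by the model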
What's next
In the next post, I'll show how to use BAML to enforce a typed schema and why agentic flows need structured outputs to be useful.
👉 Full code, schemas, and validator harness are in my GitHub repo