Structured Output

Why Structured Output Matters

When you build with LLMs, you quickly run into a recurring issue: LLMs are great at text, but most applications don’t consume text — they consume structured data.

  • Your frontend wants a typed object.
  • Your database wants a row with strict columns.
  • Your agent wants to call a tool with well-defined parameters.

That’s the structured output problem: forcing free-form language into a schema your code can trust.
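
As a concrete (if hypothetical) target, this is the kind of typed object the rest of the post tries to produce; a Pydantic model is one common way to write it down:

from pydantic import BaseModel

# The shape the rest of the pipeline can trust
class Movie(BaseModel):
    title: str
    director: str
    year: int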

The Naïve Approach: “Please Return JSON”

The first attempt almost everyone makes is to ask politely:

from openai import OpenAI
client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return ONLY a JSON object with title, director, year."},
        {"role": "user", "content": "The Godfather"}
    ]
)

print(resp.choices[0].message.content)

And this works great:

{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}

Until it doesn’t:

  • Extra prose: Here's the JSON you requested:
  • Extra fields: cast, genre, etc.
  • Wrong types: "year": "1972" (a string) instead of an integer. Maybe even nineteen seventy two.
  • Wrapped in code fences.

If you've ever piped this into json.loads, you've seen how brittle it is.
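
Here's a minimal sketch of the defensive parsing you end up writing around it (the fence-stripping regex is my own workaround, not anything the API guarantees):

import json
import re

raw = resp.choices[0].message.content

# Strip markdown code fences the model sometimes wraps around the JSON
raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())

try:
    movie = json.loads(raw)
except json.JSONDecodeError:
    # Prose like "Here's the JSON you requested:" still breaks this
    movie = None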

Function Calling: Schema-Aware JSON

To fix this, OpenAI introduced function calling. Instead of begging the model, you define a JSON Schema up front.


schema = {
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "director": {"type": "string"},
    "year": {"type": "integer"}
  },
  "required": ["title","director","year"]
}

import os

output = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
    messages=[{"role": "system", "content": "Extract a movie object."},
              {"role": "user", "content": "The Godfather"}],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)
print(output.choices[0].message.function_call.arguments)

Output:

{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}

No prose, no fences, matches the schema. This feels a lot safer.

Where it might fail

So far we've used a relatively simple schema, and this approach works great for title, director, year. But as soon as you add enums, nested objects, regex patterns, or scale to 100+ calls, cracks appear.

I tried to use the following schema and ask for information about Titanic:

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "director": {"type": "string", "minLength": 3},
        "year": {"type": "integer", "minimum": 1888, "maximum": 2100},
        "rating": {"type": "string", "enum": ["G", "PG", "PG-13", "R", "NC-17"]},
        "release": {
            "type": "object",
            "properties": {
                "country_code": {
                    "type": "string",
                    "pattern": "^[A-Z]{2}$"  # exactly two-letter ISO code
                },
                "date": {
                    "type": "string",
                    "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"  # strict YYYY-MM-DD
                }
            },
            "required": ["country_code", "date"],
            "additionalProperties": False
        },
        "characters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "actor": {"type": "string"}
                },
                "required": ["name", "actor"],
                "additionalProperties": False
            },
            "minItems": 5,  # must list at least 5 characters
            "maxItems": 10
        }
    },
    "required": ["title", "director", "year", "rating", "release", "characters"],
    "additionalProperties": False
}

import json
from jsonschema import Draft202012Validator

output = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": "Extract a movie object."},
              {"role": "user", "content": "Return a movie JSON for 'Titanic' with rating and a couple of main characters"}],
    functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
    function_call={"name": "return_movie"},
)

response = json.loads(output.choices[0].message.function_call.arguments)
Draft202012Validator(schema).validate(response)

Looks fine at a glance, but you’ll often see:

jsonschema.exceptions.ValidationError: [{'name': 'Jack Dawson', 'actor': 'Leonardo DiCaprio'}, {'name': 'Rose DeWitt Bukater', 'actor': 'Kate Winslet'}] is too short

If I had to enumerate the common cracks once you move beyond the toy example:

  • Ambiguous asks: movies by Christopher Nolan returns an array; your schema expects one object.
  • Strict enums/regex: rating must be exactly PG-13; you may get PG 13 / Rated PG-13.
  • Nested/array schemas: missing subfields, extra properties, or too few/many items (e.g., require 5 characters, get 2).
  • Batch/scale effects: out of 100 prompts, a few stragglers still violate the schema; you need retries + validation.

The point isn’t that function calling is bad. It’s that production needs guardrails: validation, retries, and sane defaults when the model drifts.
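
Here's a minimal sketch of the guardrail loop I mean, reusing the client, schema, and return_movie function call from above (the retry count and the None fallback are arbitrary choices, not a recommendation):

import json
from jsonschema import Draft202012Validator, ValidationError

def extract_movie(prompt, max_attempts=3):
    validator = Draft202012Validator(schema)
    for attempt in range(max_attempts):
        output = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": "Extract a movie object."},
                      {"role": "user", "content": prompt}],
            functions=[{"name": "return_movie", "description": "Movie object", "parameters": schema}],
            function_call={"name": "return_movie"},
        )
        try:
            candidate = json.loads(output.choices[0].message.function_call.arguments)
            validator.validate(candidate)
            return candidate
        except (json.JSONDecodeError, ValidationError):
            continue  # the model drifted; retry with the same prompt
    return None  # sane default: let the caller decide how to degrade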

Takeaway

  • Don’t trust "just JSON".
  • Function calling is good until schemas get real.
  • Production workloads = validate every response and retry on failure.
  • If you want fewer papercuts, use a framework (BAML, Instructor, PydanticAI) that bakes in schemas, retries, and typed clients (rough sketch below).
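
For a taste of what that looks like, here is a rough sketch using Instructor; the from_openai wrapper, response_model, and max_retries parameters reflect Instructor's documented API at the time of writing, so double-check against the version you install:

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    director: str
    year: int

client = instructor.from_openai(OpenAI())

movie = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Movie,   # Instructor validates against this and retries on failure
    max_retries=3,
    messages=[{"role": "user", "content": "The Godfather"}],
)
print(movie.year)  # a real int, not "1972"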

What's next

In the next post, I'll show how to use BAML to enforce a typed schema, and why agentic flows need structured outputs to be useful.

👉 Full code, schemas, and validator harness are in my GitHub repo