Structured Outputs in Practice: Instructor vs PydanticAI vs BAML
Preface
In part one, I wrote about why structured outputs matter and why just asking an LLM to “return JSON” doesn’t cut it. Instead of yapping on abstractly, this time we’re going hands-on: I put three frameworks (Instructor, PydanticAI, and BAML) through the same task and compared how they handle schemas, retries, and reliability.
The Task
We want to turn messy natural language into a structured Task object:
from datetime import date
from typing import List, Literal, Optional

from pydantic import BaseModel

class Task(BaseModel):
    description: str
    priority: Literal["low", "medium", "high", "urgent"]
    owner: Optional[str]
    tags: List[str]
    deadline: Optional[date]
    confidence: float
Input (Slack-style DM):
hey can you fix the login? users get 500 after oauth callback. pretty urgent.
tag: auth, bug. maybe @sara can take it. need it done by Friday.
Expected output: a typed Task with
- description="fix the login after oauth callback error"
- priority="urgent"
- owner="sara"
- tags=["auth", "bug"]
- deadline=<Friday's date>
- confidence=0.9
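Written out as a hand-built object, that expectation looks roughly like this (next_friday is a throwaway helper standing in for "the Friday after the message", not part of any framework):

from datetime import date, timedelta

from schema import Task

def next_friday(today: date) -> date:
    # weekday() == 4 is Friday; 0 days means today is already Friday
    return today + timedelta(days=(4 - today.weekday()) % 7)

expected = Task(
    description="fix the login after oauth callback error",
    priority="urgent",
    owner="sara",
    tags=["auth", "bug"],
    deadline=next_friday(date.today()),
    confidence=0.9,
)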
Instructor
What it is
- Created by Jason Liu (jxnl).
- Wraps a provider-specific client so calls return Pydantic models directly.
- Under the hood, uses provider-specific function calling + Pydantic validation.
import instructor
from openai import OpenAI

from schema import Task

# Patch the OpenAI client so create() returns a validated Task
client = instructor.from_openai(OpenAI())

task = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Task,
    messages=[{"role": "user", "content": "extract task from..."}],
)
✅ Pros
- Easiest path to “typed results”.
- Minimal code changes if you’re already using Pydantic.
- Active community examples.
⚠️ Cons
- Validation failures trigger automatic model retries, which adds latency (see the sketch below).
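If retry latency is a concern, you can bound the re-ask loop with max_retries; a minimal sketch, with the message abbreviated:

import instructor
from openai import OpenAI

from schema import Task

client = instructor.from_openai(OpenAI())

# max_retries caps how many times Instructor re-prompts the model
# after a Pydantic validation failure before raising an error.
task = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Task,
    max_retries=2,
    messages=[{"role": "user", "content": "extract task from..."}],
)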
PydanticAI
What it is
- Official library from the Pydantic team.
- Not just structured outputs — designed as a typed agent framework.
- Validates output into your schema (result_type) and can grow into multi-step workflows.
from pydantic_ai import Agent

from schema import Task

# The agent validates the model's output against Task before returning it
agent = Agent(model="openai:gpt-4o-mini", result_type=Task)

result = agent.run_sync("extract task from ...")
task = result.data
✅ Pros
- First-class Pydantic integration.
- Async-friendly.
- Extensible into agent patterns (tools, memory).
- More comprehensive retry configuration (sketched after the cons below).
⚠️ Cons
- Early-stage; docs and ecosystem are smaller than Instructor's.
- Validation failures cause model retries, increasing latency.
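Retry behaviour is configurable on the agent itself; a minimal sketch, assuming the same result_type/run_sync API as above:

from pydantic_ai import Agent

from schema import Task

# retries bounds how many times the agent re-prompts the model when
# a response fails validation against Task.
agent = Agent(model="openai:gpt-4o-mini", result_type=Task, retries=2)

result = agent.run_sync("extract task from ...")
task = result.data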
BAML
What it is
- Built by BoundaryML.
- You declare schemas and prompts in a DSL (.baml files).
- Compiles into a generated Python client.
- Uses Schema-Aligned Parsing (SAP) for robust decoding (handles malformed JSON gracefully).
DSL Example
class Task {
  description string
  priority "low" | "medium" | "high" | "urgent"
  owner string?
  tags string[]
  deadline string? // ISO date string; BAML has no native date type
  confidence float
}

// Minimal function sketch; the client name is illustrative
function ExtractTask(text: string) -> Task {
  client "openai/gpt-4o-mini"
  prompt #"Extract the task from: {{ text }}  {{ ctx.output_format }}"#
}
Python code
from baml_client import b  # client generated from the .baml files

task = b.ExtractTask(text="draft the demo slides; urgent; assign to @alex; tag: presentation")
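With the Slack message from the shared task it looks like this (under the schema sketch above, deadline comes back as a string):

from baml_client import b

msg = (
    "hey can you fix the login? users get 500 after oauth callback. pretty urgent. "
    "tag: auth, bug. maybe @sara can take it. need it done by Friday."
)

task = b.ExtractTask(text=msg)
print(task.priority, task.owner, task.tags)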
✅ Pros
- Separation of schema/prompt from code.
- SAP parsing = state-of-the-art results on the Berkeley Function Calling benchmark.
- Faster in practice: there's no retry loop, because malformed output is repaired by the parser instead of re-asking the model.
- Comprehensive prompt templates and types.
⚠️ Cons
- Extra build step and DSL to learn.
Takeaways
- Instructor = best drop-in safety net.
- PydanticAI = most agent-ready.
- BAML = strictest and most robust, with SAP parsing delivering state-of-the-art results.
What's next
In the next post, I'll show how to use one or two of these libraries to enforce a typed schema, and why agentic flows need structured outputs to be useful.
👉 Full code, schemas, and validator harness are in my GitHub repo