Structured Outputs in Practice: Instructor vs PydanticAI vs BAML
Preface
In part one, I wrote about why structured outputs matter and why just asking an LLM to “return JSON” doesn’t cut it. Instead of yapping on abstractly, this time we’re going hands-on: I put three frameworks — Instructor, PydanticAI, and BAML — through the same task and compared how they handle schemas, retries, and reliability.
The Task
We want to turn messy natural language into a structured Task object:
from datetime import date
from typing import List, Literal, Optional

from pydantic import BaseModel

class Task(BaseModel):
    description: str
    priority: Literal["low", "medium", "high", "urgent"]
    owner: Optional[str] = None
    tags: List[str]
    deadline: Optional[date] = None
    confidence: float
Input (Slack-style DM):
hey can you fix the login? users get 500 after oauth callback. pretty urgent. 
tag: auth, bug. maybe @sara can take it. need it done by Friday.
Expected output: a typed Task (spelled out as code below) with
- description="fix the login after oauth callback error"
- priority="urgent"
- tags=["auth", "bug"]
- owner="sara"
- deadline=<Friday's date>
- confidence=0.9
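For reference, here is that expected result as a concrete object. This is just a sketch: the deadline value is a placeholder for whatever date "Friday" resolves to.

from datetime import date
from schema import Task

expected = Task(
    description="fix the login after oauth callback error",
    priority="urgent",
    owner="sara",
    tags=["auth", "bug"],
    deadline=date(2025, 1, 17),  # placeholder for "Friday"
    confidence=0.9,
)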
Instructor
What it is
- Created by Jason Liu (jxnl).
- Wraps a provider-specific client so it returns Pydantic models directly.
- Under the hood, it uses provider-specific function calling plus Pydantic validation.
 
import instructor
from openai import OpenAI

from schema import Task

# Patch the OpenAI client so completions are parsed into Pydantic models
client = instructor.from_openai(OpenAI())

task = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Task,
    messages=[{"role": "user", "content": "extract task from..."}],
)
✅ Pros
- Easiest path to “typed results”.
- Minimal code changes if you’re already using Pydantic.
- Active community and plenty of examples.
 
⚠️ Cons
- Validation failures trigger model retries, which adds latency (see the retry-cap sketch below).
 
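The retry loop is at least configurable. A minimal sketch, reusing the client above, that caps validation-driven retries so a persistently bad response fails fast instead of looping:

# Cap retries: at most 2 re-attempts after a failed Task validation
task = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Task,
    messages=[{"role": "user", "content": "extract task from..."}],
    max_retries=2,
)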
PydanticAI
What it is
- Official library from the Pydantic team.
- Not just structured outputs — designed as a typed agent framework.
- Validates output into your schema (result_type) and can grow into multi-step workflows.
 
from pydantic_ai import Agent

from schema import Task

# Typed agent: results are validated against Task before being returned
agent = Agent(model="openai:gpt-4o-mini", result_type=Task)

result = agent.run_sync("extract task from ...")  # agent.run() is the async variant
task = result.data
✅ Pros
- First-class Pydantic integration.
- Async-friendly.
- Extensible into agent patterns (tools, memory).
- More comprehensive retry mechanism, configurable per agent (see the sketch below).
 
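A minimal sketch of that retry knob, using the same schema as above; retries is the number of times the agent re-prompts the model when its output fails validation (the exact default may vary by version):

# Allow up to 3 re-prompts when the model's output fails Task validation
agent = Agent(
    model="openai:gpt-4o-mini",
    result_type=Task,
    retries=3,
)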
⚠️ Cons
- Early-stage; docs and ecosystem are smaller than Instructor's.
- Validation failures cause model retries, increasing latency.
 
BAML
What it is
- Built by BoundaryML.
- You declare schemas and prompts in a DSL (.baml files).
- It compiles into a generated Python client.
- Uses Schema-Aligned Parsing (SAP) for robust decoding; it handles malformed JSON gracefully (see the example after this list).
 
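To make the SAP point concrete, here is a hypothetical raw completion (illustrative only, not actual BAML output) that strict JSON parsing would reject but that schema-aligned parsing can still coerce into a Task:

# Illustrative only: a raw completion that json.loads would reject outright
raw = """Sure! Here's the task:
{
  "description": "fix the login after oauth callback error",
  "priority": "urgent",
  "tags": ["auth", "bug"],
  "owner": "sara",   // a comment and a trailing comma break strict JSON
}"""
# Strict parsing chokes on the prose prefix, the comment, and the trailing
# comma; SAP instead aligns the text against the Task schema and recovers
# a valid object without a retry round-trip.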
DSL Example
class Task {
  description string
  priority "low" | "medium" | "high" | "urgent"
  owner string?
  tags string[]
  deadline string? // ISO date string; BAML has no date primitive
  confidence float
}
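For b.ExtractTask below to exist, the .baml files also need a function declaration. A minimal sketch; the client shorthand and prompt wording here are my assumptions, not the repo's actual definition:

function ExtractTask(text: string) -> Task {
  client "openai/gpt-4o-mini"
  prompt #"
    Extract a task from the message below.

    {{ ctx.output_format }}

    Message: {{ text }}
  "#
}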
Python code
from baml_client import b  # client generated from the .baml files

task = b.ExtractTask(text="draft the demo slides; urgent; assign to @alex; tag: presentation")
✅ Pros
- Separation of schema/prompt from code.
- SAP parsing delivers state-of-the-art results on the Berkeley Function Calling benchmark.
- Faster in practice, since there's no retry loop on parse failures.
- Comprehensive prompt templates and types.
 
⚠️ Cons
- Extra build step and DSL to learn.
 
Takeaways
- Instructor = best drop-in safety net.
- PydanticAI = most agent-ready.
- BAML = strictest and most robust, with SAP parsing delivering state-of-the-art results.
 
What's next
In the next post, I'll show how to use one or two of these libraries to enforce a typed schema, and why agentic flows need structured outputs to be useful.
👉 Full code, schemas, and the validator harness are in my GitHub repo.