Structured Outputs in Practice: Instructor vs PydanticAI vs BAML

Preface

In part one, I wrote about why structured outputs matter and why just asking an LLM to "return JSON" doesn't cut it. This time, instead of yapping abstractly, we're going hands-on: I put three frameworks (Instructor, PydanticAI, and BAML) through the same task and looked at how they handle schemas, retries, and reliability.

The Task

We want to turn messy natural language into a structured Task object:

from datetime import date
from typing import List, Literal, Optional

from pydantic import BaseModel

class Task(BaseModel):
    description: str
    priority: Literal["low", "medium", "high", "urgent"]
    owner: Optional[str] = None
    tags: List[str]
    deadline: Optional[date] = None
    confidence: float
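
A quick sanity check of what the schema buys us: a value outside the Literal is rejected at validation time, and that validation signal is exactly what Instructor and PydanticAI (below) use to detect a failed extraction.

from pydantic import ValidationError

try:
    Task(
        description="fix the login",
        priority="critical",  # not one of low/medium/high/urgent
        owner=None,
        tags=["auth"],
        deadline=None,
        confidence=0.9,
    )
except ValidationError as err:
    # Pydantic reports the invalid priority instead of silently accepting it.
    print(err)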

Input (Slack-style DM):

hey can you fix the login? users get 500 after oauth callback. pretty urgent. 
tag: auth, bug. maybe @sara can take it. need it done by Friday.

Expected output: a typed Task with
- description="fix the login after oauth callback error"
- priority="urgent"
- tags=["auth", "bug"]
- owner="sara"
- deadline=<Friday's date>
- confidence=0.9

Instructor

What is it?

  • Created by Jason Liu (jxnl).
  • Wraps a provider-specific client (like the OpenAI SDK) so calls return Pydantic models directly.
  • Under the hood, it uses provider function/tool calling plus Pydantic validation.

import instructor
from openai import OpenAI

from schema import Task  # the Task model defined above

# Patch the OpenAI client so create() can return validated Pydantic models
client = instructor.from_openai(OpenAI())

task = client.chat.completions.create(
    model="gpt-4o-mini",   # required by the OpenAI API
    response_model=Task,   # Instructor validates the response into Task
    messages=[{"role": "user", "content": "extract task from ..."}],
)

✅ Pros

  • Easiest path to “typed results”.
  • Minimal code changes if you’re already using Pydantic.
  • Active community examples.

⚠️ Cons

  • Validation failures cause model retries, increasing latency.
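
If you do hit that, a minimal mitigation is to cap the re-asks explicitly. A sketch, assuming Instructor's max_retries keyword (present in current releases, but check yours):

import instructor
from openai import OpenAI

from schema import Task

client = instructor.from_openai(OpenAI())

task = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Task,
    max_retries=2,  # cap validation-driven re-asks so a bad extraction fails fast
    messages=[{"role": "user", "content": "extract task from ..."}],
)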

PydanticAI

What is it?

  • Official library from the Pydantic team.
  • Not just structured outputs — designed as a typed agent framework.
  • Validates output into your schema (result_type) and can grow into multi-step workflows.

from pydantic_ai import Agent

from schema import Task

# result_type tells the agent to validate the model's output into Task
agent = Agent(model="openai:gpt-4o-mini", result_type=Task)

result = agent.run_sync("extract task from ...")  # run() is the async variant
task = result.data

✅ Pros

  • First-class Pydantic integration.
  • Async-friendly.
  • Extensible into agent patterns (tools, memory).
  • More comprehensive retry mechanism; see the sketch below the cons.

⚠️ Cons

  • Early-stage; docs and ecosystem smaller than Instructor.
  • Validation failures cause model retries, increasing latency.
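
To make "more comprehensive retry mechanism" concrete, here is a sketch in the same result_type-era API used above: a per-agent retries budget plus a result validator that raises ModelRetry to send the model back with an error message. Newer releases rename these pieces (output_type, output_validator, result.output), so treat the names as assumptions and check the docs for your version.

from pydantic_ai import Agent, ModelRetry, RunContext

from schema import Task

# retries= is the budget for bouncing validation failures back to the model
agent = Agent(model="openai:gpt-4o-mini", result_type=Task, retries=2)

@agent.result_validator
def sanity_check(ctx: RunContext[None], task: Task) -> Task:
    # A business rule on top of schema validation: low-confidence extractions
    # are sent back to the model with this message instead of failing outright.
    if task.confidence < 0.5:
        raise ModelRetry("Confidence is too low; re-read the message and try again.")
    return task

result = agent.run_sync("extract task from ...")
task = result.data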

BAML

What is it?

  • Built by BoundaryML.
  • You declare schemas and prompts in a DSL (.baml files).
  • Compiles into a generated Python client.
  • Uses Schema-Aligned Parsing (SAP) for robust decoding (handles malformed JSON gracefully).
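
To see what "handles malformed JSON gracefully" means in practice, here is a toy illustration (plain Python, not BAML's actual parser): typical LLM output that strict JSON parsing rejects, but that a schema-aligned parser can still map onto the Task type.

import json

raw = """Sure! Here's the task:
{
  "description": "fix the login after oauth callback error",
  "priority": "urgent",
  "tags": ["auth", "bug"],
  "confidence": 0.9,
}"""

# Strict parsing chokes on the chatty preamble and the trailing comma...
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("json.loads failed:", err)

# ...whereas SAP uses the declared schema to locate and coerce the fields,
# so there is no retry round-trip.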

DSL Example

class Task {
  description string
  priority "low" | "medium" | "high" | "urgent"
  owner string?
  tags string[]
  deadline string?   // ISO date string, e.g. "2025-01-17"
  confidence float
}

Python code

from baml_client import b  # generated Python client (produced by the BAML codegen step)

# ExtractTask is a BAML function declared in the .baml files next to Task;
# the generated client exposes it as a typed Python call returning Task.
task = b.ExtractTask(text="draft the demo slides; urgent; assign to @alex; tag: presentation")

✅ Pros

  • Separation of schema/prompt from code.
  • SAP parsing = state-of-the-art results on the Berkeley Function Calling Leaderboard (BFCL).
  • Faster in practice: no retry loop, because malformed output is repaired at parse time instead of being re-asked.
  • Comprehensive prompt templates and type system.

⚠️ Cons

  • Extra build step and DSL to learn.

Takeaways

  • Instructor = best drop-in safety net.
  • PydanticAI = most agent-ready.
  • BAML = strictest and most robust, with SAP parsing delivering state-of-the-art results.

What's next

In the next post, I'll show how to use one or two of these libraries to enforce a typed schema end-to-end, and why agentic flows need structured outputs to be useful.

👉 Full code, schemas, and validator harness are in my GitHub repo