Instructor: Getting Structured Output from LLMs with Python

One of the biggest challenges when working with Large Language Models (LLMs) is getting structured and consistent output. LLMs by default produce free-form text, which is difficult to parse and integrate into applications. The Instructor library solves this problem by leveraging Pydantic for validation and structured data extraction from LLMs.

In this tutorial, we will learn how to use Instructor to get reliable JSON/Pydantic outputs from various LLM providers such as OpenAI, Anthropic, and others.

What Is Instructor?

Instructor is a Python library that patches LLM clients (like OpenAI) to return validated Pydantic objects instead of plain strings. It uses the model's function-calling (tool) mode or JSON mode to request output that follows a schema, then validates the result with Pydantic.
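To make the mechanism concrete, here is a simplified sketch of the function-calling round trip using plain Pydantic v2 and no network calls. The helpers to_tool_definition and parse_tool_arguments are hypothetical, the dictionary follows the general shape of OpenAI's tool format, and Instructor's real internals differ in detail.

```python
import json

from pydantic import BaseModel


class UserInfo(BaseModel):
    """Basic contact details about a person."""
    name: str
    age: int
    email: str


def to_tool_definition(model: type) -> dict:
    # Hypothetical helper: expose the Pydantic model's JSON schema as a
    # function/tool definition that the LLM can be asked to "call".
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": model.__doc__ or "",
            "parameters": model.model_json_schema(),
        },
    }


def parse_tool_arguments(model: type, arguments: str) -> BaseModel:
    # Hypothetical helper: the LLM replies with a JSON string of arguments,
    # which Pydantic parses and validates into a typed object.
    return model.model_validate_json(arguments)


tool = to_tool_definition(UserInfo)
print(json.dumps(tool["function"]["parameters"]["required"]))  # ["name", "age", "email"]

# Pretend this string came back from the LLM's tool call.
raw_arguments = '{"name": "John Smith", "age": 28, "email": "john.smith@email.com"}'
user = parse_tool_arguments(UserInfo, raw_arguments)
print(user.age + 1)  # 29
```

The schema is the contract in both directions: it tells the model what to produce, and the same model class checks what came back.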

Key advantages of Instructor:

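Type-safe: Returned objects have been validated against the defined Pydantic schema; if the output still fails validation, an error is raised instead of bad data being returned
Automatic retry: If validation fails, Instructor can re-ask the model with the validation errors as feedback (controlled by the max_retries parameter)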
Streaming support: Supports partial streaming for complex objects
Multi-provider: Supports OpenAI, Anthropic, Google, Mistral, and more
Custom validation: You can add Pydantic validators for business logic
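Automatic retries and custom validation work together: a Pydantic validator encodes a business rule, and when the LLM's output breaks it, the validation error can be sent back so the model can correct itself. The sketch below imitates that re-ask loop offline so it runs without an API key. fake_llm and extract_with_retries are hypothetical stand-ins, not Instructor's internals; with Instructor you would pass the model as response_model and set max_retries rather than writing the loop yourself.

```python
from typing import Dict, List

from pydantic import BaseModel, ValidationError, field_validator


class UserInfo(BaseModel):
    name: str
    age: int

    # Custom business rule expressed as a Pydantic validator.
    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        if not 0 <= value <= 130:
            raise ValueError("age must be between 0 and 130")
        return value


def fake_llm(messages: List[Dict[str, str]]) -> str:
    # Hypothetical stand-in for a real LLM call: it returns an invalid age
    # until the conversation contains the validator's error message.
    if any("age must be between" in m["content"] for m in messages):
        return '{"name": "John Smith", "age": 28}'
    return '{"name": "John Smith", "age": 280}'


def extract_with_retries(messages: List[Dict[str, str]], retries: int = 2) -> UserInfo:
    # Re-ask loop: validate the reply; on failure, append the error as
    # feedback and try again, up to `retries` extra attempts.
    if retries < 0:
        raise ValueError("retries must be non-negative")
    history = list(messages)
    for attempt in range(retries + 1):
        raw = fake_llm(history)
        try:
            return UserInfo.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == retries:
                raise  # out of attempts: surface the last validation error
            history.append({"role": "assistant", "content": raw})
            history.append({"role": "user", "content": f"Please fix these errors:\n{exc}"})


user = extract_with_retries([{"role": "user", "content": "John Smith is 28 years old."}])
print(user)  # name='John Smith' age=28
```

The key design point is that the error text, not just a generic "try again", goes back to the model, which gives it something specific to fix.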

Installation

First, install Instructor along with the required dependencies:

pip install instructor openai pydantic

For other providers, install additional dependencies:

# For Anthropic
pip install instructor anthropic

# For Google Gemini
pip install instructor google-generativeai

# For Mistral
pip install instructor mistralai
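To confirm which of these packages actually ended up in your environment, a quick check with the standard library's importlib.metadata is enough. The installed_versions helper below is hypothetical; the distribution names are the ones from the pip commands in this section.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names taken from the pip commands in this section.
PROVIDER_PACKAGES = ["instructor", "openai", "anthropic", "google-generativeai", "mistralai"]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for name, ver in installed_versions(PROVIDER_PACKAGES).items():
    print(f"{name:<20} {ver or 'not installed'}")
```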

Make sure you have an API key from the provider you will be using:

export OPENAI_API_KEY="sk-your-api-key-here"
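The official OpenAI Python client reads OPENAI_API_KEY from the environment when OpenAI() is created without an api_key argument, so the examples need no extra code for this. If you prefer to fail early with a clear message, a small check like the one below can run first; missing_env_vars is a hypothetical helper, not part of Instructor or the OpenAI SDK.

```python
import os
from typing import List


def missing_env_vars(names: List[str]) -> List[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name, "").strip()]


# Check before creating any client, so a missing key is reported up front.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("API key found.")
```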

Basic Usage with OpenAI

Let's start with a simple example: extracting user information from text.

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())


# Define the output schema
class UserInfo(BaseModel):
    name: str
    age: int
    email: str

# Extract structured data from text
user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[
        {
            "role": "user",
            "content": "My name is John Smith, I'm 28 years old. "
                       "My email is john.smith@email.com"
        }
    ],
)

print(user)
# name='John Smith' age=28 email='john.smith@email.com'
print(user.name)   # John Smith
print(user.age)    # 28
print(user.email)  # john.smith@email.com

Notice that response_model=UserInfo is the key parameter: it tells Instructor which schema to expect. The result is not a dictionary or a string but a validated Pydantic object.
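To see what a validated Pydantic object buys you, the sketch below runs Pydantic v2 validation directly on JSON strings that stand in for an LLM reply; no API call is made and the inputs are made up. It shows lax type coercion, converting back to a plain dict with model_dump(), and the structured errors produced when output does not fit the schema, which is the kind of feedback Instructor can use when re-asking.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# A numeric string is coerced to int in Pydantic's default (lax) mode.
user = UserInfo.model_validate_json(
    '{"name": "John Smith", "age": "28", "email": "john.smith@email.com"}'
)
print(type(user.age).__name__, user.age)  # int 28

# model_dump() gives a plain dict when you need one (e.g. to serialize).
print(user.model_dump())
# {'name': 'John Smith', 'age': 28, 'email': 'john.smith@email.com'}

# Output that does not fit the schema is rejected with structured errors.
try:
    UserInfo.model_validate_json('{"name": "John Smith", "age": "twenty-eight"}')
except ValidationError as exc:
    for error in exc.errors():
        print(error["loc"], error["type"])
# ('age',) int_parsing
# ('email',) missing
```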

Complex Pydantic Models

Instructor supports complex Pydantic models including nested models, optional fields, enums, and lists.
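A reduced, offline version of these ideas: the models below (Level, Skill, Profile) are simplified, hypothetical stand-ins rather than the tutorial's full schema, and the input dictionaries play the role of LLM output. The sketch shows an enum, an optional field, a list of nested models, and how Field descriptions and constraints appear in the JSON schema and are enforced during validation.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Level(str, Enum):
    JUNIOR = "junior"
    SENIOR = "senior"


class Skill(BaseModel):
    name: str
    years_experience: int = Field(description="Years of experience", ge=0, le=50)


class Profile(BaseModel):
    name: str
    level: Level                   # enum: only the listed values are accepted
    skills: List[Skill]            # list of nested models
    github: Optional[str] = None   # optional: may be omitted entirely


profile = Profile.model_validate(
    {"name": "Ana", "level": "senior", "skills": [{"name": "SQL", "years_experience": 6}]}
)
print(profile.level is Level.SENIOR, profile.github)  # True None

# Field metadata ends up in the JSON schema the LLM sees.
skill_schema = Skill.model_json_schema()["properties"]["years_experience"]
print(skill_schema["description"], skill_schema["minimum"], skill_schema["maximum"])
# Years of experience 0 50

# Constraints are enforced on nested items too.
try:
    Profile.model_validate(
        {"name": "Ana", "level": "lead", "skills": [{"name": "SQL", "years_experience": 60}]}
    )
except ValidationError as exc:
    print(sorted(e["type"] for e in exc.errors()))
# ['enum', 'less_than_equal']
```

Descriptions are worth writing carefully: they are part of what the model reads when deciding how to fill each field.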

from pydantic import BaseModel, Field
from typing import Optional, List
from enum import Enum

class JobLevel(str, Enum):
    JUNIOR = "junior"
    MID = "mid"
    SENIOR = "senior"
    LEAD = "lead"

class Skill(BaseModel):
    name: str = Field(description="Name of the skill or technology")
    years_experience: int = Field(
        description="Years of experience", ge=0, le=50
    )
    proficiency: str = Field(
        description="Proficiency level: beginner, intermediate, advanced"
    )

class WorkExperience(BaseModel):
    company: str
    role: str
    duration_months: int = Field(ge=1)
    description: str

class CandidateProfile(BaseModel):
    name: str
    current_role: str
    level: JobLevel
    total_years_experience: int = Field(ge=0)
    skills: List[Skill]
    work_history: List[WorkExperience]
    education: str
    summary: str = Field(
        description="Profile summary of the candidate in 2-3 sentences"
    )

resume_text = """
I'm Sarah Chen, currently working as a Senior Data Engineer at Spotify
