Instructor: Getting Structured Output from LLMs with Python


By Ruby Abdullah · · tutorial
Tags: Instructor, LLM, Pydantic, Structured Output, Python


One of the biggest challenges when working with Large Language Models (LLMs) is getting structured and consistent output. LLMs by default produce free-form text, which is difficult to parse and integrate into applications. The Instructor library solves this problem by leveraging Pydantic for validation and structured data extraction from LLMs.

In this tutorial, we will learn how to use Instructor to get reliable JSON/Pydantic outputs from various LLM providers such as OpenAI, Anthropic, and others.

What Is Instructor?

Instructor is a Python library that patches LLM clients (like OpenAI) to return validated Pydantic objects instead of plain strings. It works by using the LLM's function calling or JSON mode to obtain structured data, then validating the result with Pydantic.
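The validation half of that pipeline can be tried without any LLM call. The following is a minimal sketch, assuming Pydantic v2; the two reply strings are hypothetical stand-ins for what a model might return in JSON mode, and this is not Instructor's internal code.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int


# Hypothetical raw replies, standing in for JSON-mode output from an LLM.
good_reply = '{"name": "John Smith", "age": 28}'
bad_reply = '{"name": "John Smith", "age": "twenty-eight"}'

# A well-formed reply becomes a typed object.
user = UserInfo.model_validate_json(good_reply)
print(user.age + 1)  # age is a real int, so arithmetic works

# A malformed reply raises ValidationError with per-field details.
try:
    UserInfo.model_validate_json(bad_reply)
    errors = []
except ValidationError as exc:
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["msg"])
```

The error details are what make a retry with feedback possible: they say which field failed and why.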

Key advantages of Instructor:

  • Type-safe: Output is validated against the defined Pydantic schema before it is returned
  • Automatic retry: If validation fails, Instructor automatically retries with error feedback
  • Streaming support: Supports partial streaming for complex objects
  • Multi-provider: Supports OpenAI, Anthropic, Google, Mistral, and more
  • Custom validation: You can add Pydantic validators for business logic
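Two of these points, custom validation and automatic retry, work together. The sketch below illustrates the idea only: `fake_llm` is a hypothetical stand-in for a real model call, `extract_with_retries` is an illustrative helper rather than part of Instructor, and the age rule is an invented business rule. In Instructor itself, retries are requested with the `max_retries` argument of `create`.

```python
from pydantic import BaseModel, ValidationError, field_validator


class UserInfo(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        # Hypothetical business rule: reject ages outside 0-130.
        if not 0 <= value <= 130:
            raise ValueError("age must be between 0 and 130")
        return value


def fake_llm(messages: list[dict]) -> str:
    """Stand-in for a model call: wrong at first, correct after error feedback."""
    saw_feedback = any("Validation failed" in m["content"] for m in messages)
    if saw_feedback:
        return '{"name": "John Smith", "age": 28}'
    return '{"name": "John Smith", "age": 280}'


def extract_with_retries(prompt: str, max_retries: int = 2) -> UserInfo:
    messages = [{"role": "user", "content": prompt}]
    for _attempt in range(max_retries + 1):
        reply = fake_llm(messages)
        try:
            return UserInfo.model_validate_json(reply)
        except ValidationError as exc:
            # Feed the validation error back so the next attempt can correct it.
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user", "content": f"Validation failed, please fix: {exc}"}
            )
    raise RuntimeError("no valid output after retries")


user = extract_with_retries("John Smith is 28 years old.")
print(user.name, user.age)
```

The first reply violates the validator, the error text is appended to the conversation, and the second reply passes.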

Installation

First, install Instructor along with the required dependencies:

    pip install instructor openai pydantic

For other providers, install additional dependencies:

    # For Anthropic
    pip install instructor anthropic

    # For Google Gemini
    pip install instructor google-generativeai

    # For Mistral
    pip install instructor mistralai

Make sure you have an API key from the provider you will be using:

    export OPENAI_API_KEY="sk-your-api-key-here"
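A missing key usually surfaces later as an authentication error from the provider. As a small optional sketch, a script can check for the variable up front; `require_api_key` is a hypothetical helper name, not part of Instructor or the OpenAI SDK.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail early with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the examples"
        )
    return key
```

Calling `require_api_key()` at the top of a script fails immediately if the variable was never exported. The `OpenAI()` client reads `OPENAI_API_KEY` from the environment by default, so the returned value does not need to be passed explicitly.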

Basic Usage with OpenAI

Let's start with a simple example: extracting user information from text.

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # Patch the OpenAI client with Instructor
    client = instructor.from_openai(OpenAI())

    # Define the output schema
    class UserInfo(BaseModel):
        name: str
        age: int
        email: str

    # Extract structured data from text
    user = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserInfo,
        messages=[
            {
                "role": "user",
                "content": "My name is John Smith, I'm 28 years old. "
                           "My email is john.smith@email.com"
            }
        ],
    )

    print(user)
    # name='John Smith' age=28 email='john.smith@email.com'

    print(user.name)   # John Smith
    print(user.age)    # 28
    print(user.email)  # john.smith@email.com

Notice that response_model=UserInfo is the key parameter: it tells Instructor which schema to expect. The result is not a dictionary or a string, but a validated Pydantic object.
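Because the result is an ordinary Pydantic object, it can be converted back to a dict or JSON when another part of the application needs that form. A minimal sketch, assuming Pydantic v2; the object is built by hand here so the snippet runs without an API call.

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# Built by hand for illustration; in the tutorial this object comes from
# client.chat.completions.create(..., response_model=UserInfo).
user = UserInfo(name="John Smith", age=28, email="john.smith@email.com")

as_dict = user.model_dump()       # plain dict, e.g. for a database insert
as_json = user.model_dump_json()  # JSON string, e.g. for an HTTP response

# The JSON form round-trips back into an equal object.
restored = UserInfo.model_validate_json(as_json)
print(restored == user)
```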

Complex Pydantic Models

Instructor supports complex Pydantic models including nested models, optional fields, enums, and lists.
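Before the full example, here is a small offline sketch of how these features behave during validation, assuming Pydantic v2. The `Priority`, `Tag`, and `Ticket` models are hypothetical and exist only for this illustration.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Priority(str, Enum):
    LOW = "low"
    HIGH = "high"


class Tag(BaseModel):
    label: str


class Ticket(BaseModel):
    title: str
    priority: Priority                       # enum: only listed values pass
    effort_hours: int = Field(ge=0, le=100)  # numeric bounds
    assignee: Optional[str] = None           # optional: may be omitted
    tags: List[Tag] = []                     # list of nested models


# Valid input: the string "high" is coerced to the enum member,
# and the nested dict becomes a Tag instance.
ticket = Ticket.model_validate(
    {"title": "Fix login", "priority": "high", "effort_hours": 3,
     "tags": [{"label": "auth"}]}
)
print(ticket.priority is Priority.HIGH, ticket.assignee, ticket.tags[0].label)

# Invalid input: every failing field is reported in a single error.
try:
    Ticket.model_validate({"title": "Bad", "priority": "urgent", "effort_hours": -1})
    failed_fields = []
except ValidationError as exc:
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)
```

When such a model is used as a response model, a reply with an unknown enum value or an out-of-range number fails validation instead of reaching application code.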

    from pydantic import BaseModel, Field
    from typing import Optional, List
    from enum import Enum

    class JobLevel(str, Enum):
        JUNIOR = "junior"
        MID = "mid"
        SENIOR = "senior"
        LEAD = "lead"

    class Skill(BaseModel):
        name: str = Field(description="Name of the skill or technology")
        years_experience: int = Field(
            description="Years of experience", ge=0, le=50
        )
        proficiency: str = Field(
            description="Proficiency level: beginner, intermediate, advanced"
        )

    class WorkExperience(BaseModel):
        company: str
        role: str
        duration_months: int = Field(ge=1)
        description: str

    class CandidateProfile(BaseModel):
        name: str
        current_role: str
        level: JobLevel
        total_years_experience: int = Field(ge=0)
        skills: List[Skill]
        work_history: List[WorkExperience]
        education: str
        summary: str = Field(
            description="Profile summary of the candidate in 2-3 sentences"
        )

    resume_text = """
    I'm Sarah Chen, currently working as a Senior Data Engineer at Spotify
