Outlines: Structured LLM Generation with Constrained Decoding

One of the biggest challenges when working with Large Language Models (LLMs) is getting structured, valid output consistently. When you need valid JSON, one value from a fixed set of choices, or another specific format, LLMs often produce output that does not match it. Outlines addresses this with constrained decoding at the token level, so the model can only produce output that is structurally valid.

How Constrained Decoding Works

Before diving into Outlines, it is important to understand the difference between two approaches to structured generation:

Retry-Based Approach (like Instructor)

Libraries like Instructor use a "generate-then-validate" approach:

1. The LLM generates free-form output.

2. The output is parsed and validated.

3. If validation fails, the prompt is modified and the LLM is called again.

4. Steps 1 to 3 repeat until the output is valid or the retry limit is reached.

The problem: API costs increase, latency is unpredictable, and there is no guarantee of convergence.
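The retry loop described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not Instructor's actual implementation: `generate_with_retries`, `make_fake_llm`, and the "required keys" check are hypothetical names and a deliberately simple stand-in for real schema validation.

```python
import json
from typing import Callable


def generate_with_retries(
    call_llm: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_retries: int = 3,
) -> tuple[dict, int]:
    """Call the model until its output parses and validates, or give up."""
    current_prompt = prompt
    for attempt in range(1, max_retries + 1):
        raw = call_llm(current_prompt)  # one model call per attempt
        try:
            data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            missing = required_keys - set(data)
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data, attempt
        except ValueError as err:
            # Feed the error back into the prompt and try again.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was invalid ({err}). "
                "Return only a valid JSON object."
            )
    raise RuntimeError(f"no valid output after {max_retries} attempts")


def make_fake_llm() -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: fails once, then returns valid JSON."""
    replies = iter(['Sure! Here it is: {name: Ada}', '{"name": "Ada", "age": 36}'])
    return lambda _prompt: next(replies)


data, attempts = generate_with_retries(
    make_fake_llm(), "Describe Ada as JSON.", {"name", "age"}
)
print(data, "after", attempts, "attempts")
```

Every failed attempt costs a full model call, and the final `RuntimeError` is the "no guarantee of convergence" case: the loop can exhaust its retries without ever producing valid output.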

Constrained Decoding Approach (Outlines)

Outlines uses a fundamentally different approach:

1. Before generation, Outlines compiles the desired schema or regular expression into a finite-state machine (FSM).

2. At each step of token generation, Outlines looks up which tokens are valid in the current state.

3. The probabilities of invalid tokens are set to zero (masked).

4. The LLM can only choose from the remaining valid tokens.

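The result: every completed output matches the structure of the schema, with no retries. Most of the extra cost is compiling the schema into an automaton once; the per-token masking adds little overhead. The guarantee covers the format of the output, not whether the generated values are correct.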
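The masking steps above can be illustrated with a toy decoder. This is a conceptual sketch, not Outlines' internal code: the vocabulary, the `fake_logits` "model", and all function names are made up for the example, and the state machine is derived naively from a list of allowed strings. Setting a logit to negative infinity is equivalent to giving that token zero probability after the softmax.

```python
import math

CHOICES = ["high", "medium", "low"]  # the only outputs we want to allow
VOCAB = ["hi", "gh", "me", "di", "um", "lo", "w", "urgent", "<eos>"]  # toy tokenizer


def allowed_tokens(prefix: str) -> set[str]:
    """Tokens that keep `prefix` on a path to one of CHOICES (the FSM's outgoing edges)."""
    allowed = set()
    for tok in VOCAB:
        if tok == "<eos>":
            if prefix in CHOICES:  # stopping is only valid on a complete choice
                allowed.add(tok)
        elif any(choice.startswith(prefix + tok) for choice in CHOICES):
            allowed.add(tok)
    return allowed


def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set the score of every disallowed token to -inf (probability zero)."""
    return {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}


def fake_logits(prefix: str) -> dict[str, float]:
    """Stand-in for a model that strongly prefers the invalid token 'urgent'."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores["urgent"] = 5.0
    scores["me"] = scores["di"] = scores["um"] = 1.0
    return scores


def constrained_greedy_decode(max_steps: int = 10) -> str:
    prefix = ""
    for _ in range(max_steps):
        masked = mask_logits(fake_logits(prefix), allowed_tokens(prefix))
        token = max(masked, key=masked.get)  # greedy pick among valid tokens
        if token == "<eos>":
            break
        prefix += token
    return prefix


print(constrained_greedy_decode())
```

Without the mask, the toy model would emit "urgent" at the first step. With it, the highest-scoring valid token wins at each step, so the output can only be one of the three allowed strings.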

Installation

Installing Outlines is straightforward:

pip install outlines

To use Outlines with local transformers models (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "outlines[transformers]"

For integration with llama.cpp:

pip install "outlines[llamacpp]"

For integration with vLLM:

pip install "outlines[vllm]"
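After installing, a quick check can confirm which optional backends are importable. The sketch below uses only the standard library; the mapping from each pip extra to a top-level module name (`transformers`, `llama_cpp`, `vllm`) is an assumption based on the usual import names of those packages, and `available_backends` is a hypothetical helper, not part of Outlines.

```python
from importlib.util import find_spec

# Assumed mapping: pip extra -> top-level module it is expected to provide.
BACKEND_MODULES = {
    "transformers": "transformers",
    "llamacpp": "llama_cpp",
    "vllm": "vllm",
}


def available_backends() -> dict[str, bool]:
    """Report which optional backends can be imported, without importing them."""
    return {extra: find_spec(module) is not None for extra, module in BACKEND_MODULES.items()}


print("outlines installed:", find_spec("outlines") is not None)
for extra, installed in available_backends().items():
    print(f"  outlines[{extra}]: {'ok' if installed else 'missing'}")
```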

JSON Generation with Pydantic Models

One of the most widely used features of Outlines is generating valid JSON from Pydantic models.

Basic Example

import outlines
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

# Define schema with Pydantic
class Address(BaseModel):
    street: str = Field(description="Street name and number")
    city: str = Field(description="City name")
    state: str = Field(description="State or province")
    zipcode: str = Field(description="Postal code")


class Employee(BaseModel):
    name: str = Field(description="Full name of the employee")
    age: int = Field(ge=18, le=65, description="Employee age")
    email: str = Field(description="Email address")
    department: str = Field(description="Work department")
    salary: float = Field(gt=0, description="Monthly salary")
    address: Address
    skills: List[str] = Field(description="List of skills")

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Create generator with JSON schema
generator = outlines.generate.json(model, Employee)

# Generate structured data
prompt = """Create a fictional employee record for a technology company
in San Francisco. This employee is a senior data engineer."""

result = generator(prompt)
print(type(result))  # <class '__main__.Employee'>
print(result.name)
print(result.department)
print(result.model_dump_json(indent=2))

Nested and Complex Schemas

from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum
import outlines

class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class Status(str, Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    REVIEW = "review"
    DONE = "done"

class SubTask(BaseModel):
    title: str
    completed: bool

class Task(BaseModel):
    id: int
    title: str = Field(max_length=100)
    description: str
    priority: Priority
    status: Status
    assignee: str
    estimated_hours: float = Field(gt=0)
    subtasks: List[SubTask]
    tags: List[str]
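To see why enum fields and nested objects fit the constrained decoding approach, it helps to view them as patterns. The sketch below hand-writes simplified regular expressions for the `Priority` enum and a `SubTask` object and checks candidate strings against them. It is a conceptual illustration only: `enum_pattern` and the pattern constants are hypothetical, and the patterns a real library derives from a schema are more general (whitespace, string escapes, key handling).

```python
import re
from enum import Enum


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def enum_pattern(enum_cls: type[Enum]) -> str:
    """Regex that matches exactly one JSON-quoted value of the enum."""
    values = "|".join(re.escape(member.value) for member in enum_cls)
    return f'"(?:{values})"'


# Simplified building blocks (no escape sequences, fixed whitespace).
STRING = r'"[^"\\]*"'
BOOLEAN = r"(?:true|false)"

# A nested object is just a larger pattern assembled from the smaller ones.
SUBTASK = r'\{"title": ' + STRING + r', "completed": ' + BOOLEAN + r"\}"

print(enum_pattern(Priority))
print(bool(re.fullmatch(enum_pattern(Priority), '"high"')))
print(bool(re.fullmatch(enum_pattern(Priority), '"urgent"')))
print(bool(re.fullmatch(SUBTASK, '{"title": "Write tests", "completed": false}')))
```

Because the whole schema reduces to one pattern, a value such as "urgent" for `priority` is never reachable during generation, rather than being rejected afterwards.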
