Outlines: Structured LLM Generation with Constrained Decoding

By Ruby Abdullah · tutorial
Tags: Outlines, LLM, Structured Generation, Constrained Decoding, Python

One of the biggest challenges when working with Large Language Models (LLMs) is getting structured, valid output consistently. When you need valid JSON, a value from a fixed set of choices, or another strict format, LLMs often produce output that does not match it. Outlines addresses this with constrained decoding at the token level, which guarantees that the output is structurally valid.

How Constrained Decoding Works

Before diving into Outlines, it is important to understand the difference between two approaches to structured generation:

Retry-Based Approach (like Instructor)

Libraries like Instructor use a "generate-then-validate" approach:

  • The LLM generates free-form output
  • The output is parsed and validated
  • If validation fails, the prompt is modified and the LLM is called again
  • Repeat until output is valid or the retry limit is reached

The problem: every retry is another model call, so API costs rise, latency becomes unpredictable, and there is no guarantee that the loop converges on valid output.
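
The generate-then-validate loop described above can be sketched in a few lines. This is a minimal illustration, not Instructor's actual implementation: the `fake_llm` stand-in, the `REQUIRED_KEYS` schema, and the feedback message are all hypothetical, chosen so the example runs without a model or network access.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"name", "age"}  # hypothetical schema, for illustration only


def validate(raw: str) -> Optional[dict]:
    """Return the parsed record if raw is a JSON object with the required keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def generate_with_retries(llm: Callable[[str], str], prompt: str, max_retries: int = 3):
    """Call the model until its output validates or the retry limit is reached."""
    calls = 0
    for _ in range(max_retries):
        calls += 1
        data = validate(llm(prompt))
        if data is not None:
            return data, calls
        # Modify the prompt with feedback before calling the model again.
        prompt += "\nThe previous answer was not valid JSON with keys name and age."
    raise ValueError(f"no valid output after {calls} calls")


def make_fake_llm(responses):
    """Stand-in for a real model: returns canned responses in order."""
    it = iter(responses)
    return lambda prompt: next(it)


fake_llm = make_fake_llm([
    "Sure! Here is the JSON: {name: Ana}",  # free-form text, fails to parse
    '{"name": "Ana", "age": 31}',           # valid on the second attempt
])
record, calls = generate_with_retries(fake_llm, "Return an employee as JSON.")
print(record, calls)  # {'name': 'Ana', 'age': 31} 2
```

Each failed attempt costs one more full model call, which is where the extra cost and unpredictable latency come from.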

Constrained Decoding Approach (Outlines)

Outlines uses a fundamentally different approach:

  • Before generation, Outlines builds a finite-state machine (FSM) or regular expression automaton from the desired schema
  • At each step of token generation, Outlines calculates which tokens are valid based on the current state
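  • Invalid tokens are masked: their logits are set to negative infinity, so their probabilities become zero
  • The LLM can only choose from valid tokens

The result: the output is guaranteed to match the schema structurally without any retries, and the overhead is small, since the automaton is built once before generation and each step only adds a mask over the vocabulary.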
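
To make the masking step concrete, here is a toy sketch of constrained decoding over a tiny vocabulary. It is not how Outlines is implemented internally: `build_dfa`, `advance`, `constrained_decode`, the vocabulary, and the fixed logits are all hypothetical, and a real library would precompute the valid tokens per state rather than scanning the vocabulary at every step.

```python
import math


def build_dfa(words):
    """Build a trie-shaped character DFA that accepts exactly the given words."""
    transitions = {0: {}}
    accepting = set()
    for word in words:
        state = 0
        for ch in word:
            nxt = transitions[state].get(ch)
            if nxt is None:
                nxt = len(transitions)
                transitions[state][ch] = nxt
                transitions[nxt] = {}
            state = nxt
        accepting.add(state)
    return transitions, accepting


def advance(transitions, state, token):
    """Walk the DFA over every character of a token; None means the token is invalid here."""
    for ch in token:
        state = transitions[state].get(ch)
        if state is None:
            return None
    return state


def constrained_decode(vocab, logits_fn, transitions, accepting, max_steps=10):
    """Greedy decoding in which tokens that would leave the DFA are masked out."""
    state, output = 0, []
    for _ in range(max_steps):
        if state in accepting and not transitions[state]:
            break  # a complete word has been produced
        logits = logits_fn(output)
        masked = [
            logit if advance(transitions, state, tok) is not None else -math.inf
            for tok, logit in zip(vocab, logits)
        ]
        if all(m == -math.inf for m in masked):
            raise RuntimeError("no token in the vocabulary is valid in this state")
        best = max(range(len(vocab)), key=masked.__getitem__)
        output.append(vocab[best])
        state = advance(transitions, state, vocab[best])
    return "".join(output)


# A fake model that always prefers the invalid token "urgent".
vocab = ["urgent", "!", "lo", "hi", "med", "gh", "ium", "w"]
fake_logits = [5.0, 4.0, 2.0, 1.5, 1.0, 0.5, 0.3, 0.1]

transitions, accepting = build_dfa(["high", "medium", "low"])
unconstrained = vocab[max(range(len(vocab)), key=fake_logits.__getitem__)]
constrained = constrained_decode(vocab, lambda prefix: fake_logits, transitions, accepting)
print(unconstrained)  # urgent
print(constrained)    # low
```

The model's preferences are unchanged; the mask only removes choices that would make the output invalid, so the highest-scoring valid token wins at each step.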

Installation

Installing Outlines is straightforward:

    pip install outlines

To use with local transformers models (the quotes keep shells such as zsh from interpreting the square brackets):

    pip install "outlines[transformers]"

For integration with llama.cpp:

    pip install "outlines[llamacpp]"

For integration with vLLM:

    pip install "outlines[vllm]"
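
After installing, it can be useful to confirm which optional backends are importable in the current environment. The helper below is a small sketch using only the standard library; `available_backends` and the `BACKENDS` mapping are hypothetical names, and the module names are the usual import names of the respective packages.

```python
from importlib.util import find_spec

# Display label -> top-level import name of the backend package.
BACKENDS = {"transformers": "transformers", "llama.cpp": "llama_cpp", "vLLM": "vllm"}


def available_backends(backends=BACKENDS):
    """Map each backend label to True if its package can be found for import."""
    return {label: find_spec(module) is not None for label, module in backends.items()}


print("outlines installed:", find_spec("outlines") is not None)
print(available_backends())
```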
    

JSON Generation with Pydantic Models

The most popular feature of Outlines is the ability to generate valid JSON based on Pydantic models.

Basic Example

    # Written against the outlines.models / outlines.generate interface of
    # pre-1.0 Outlines releases; later releases changed the API, so check the
    # documentation for your installed version.
    import outlines
    from pydantic import BaseModel, Field
    from typing import List

    # Define schema with Pydantic
    class Address(BaseModel):
        street: str = Field(description="Street name and number")
        city: str = Field(description="City name")
        state: str = Field(description="State or province")
        zipcode: str = Field(description="Postal code")

    class Employee(BaseModel):
        name: str = Field(description="Full name of the employee")
        age: int = Field(ge=18, le=65, description="Employee age")
        email: str = Field(description="Email address")
        department: str = Field(description="Work department")
        salary: float = Field(gt=0, description="Monthly salary")
        address: Address
        skills: List[str] = Field(description="List of skills")

    # Load model
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # Create generator with JSON schema
    generator = outlines.generate.json(model, Employee)

    # Generate structured data
    prompt = """Create a fictional employee record for a technology company
    in San Francisco. This employee is a senior data engineer."""

    result = generator(prompt)

    print(type(result))  # the Employee class
    print(result.name)
    print(result.department)
    print(result.model_dump_json(indent=2))

Nested and Complex Schemas

Enums and nested models can be combined in a single schema:

    from pydantic import BaseModel, Field
    from typing import List, Optional
    from enum import Enum
    import outlines

    class Priority(str, Enum):
        HIGH = "high"
        MEDIUM = "medium"
        LOW = "low"

    class Status(str, Enum):
        TODO = "todo"
        IN_PROGRESS = "in_progress"
        REVIEW = "review"
        DONE = "done"

    class SubTask(BaseModel):
        title: str
        completed: bool

    class Task(BaseModel):
        id: int
        title: str = Field(max_length=100)
        description: str
        priority: Priority
        status: Status
        assignee: str
        estimated_hours: float = Field(gt=0)
        subtasks: List[SubTask]
        tags: List[str]
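
One way to picture how an enum field such as `priority` constrains generation is as a regular expression that allows only the declared values. The sketch below assumes the enum is translated into a simple alternation; `enum_to_regex` is a hypothetical helper, and the pattern Outlines actually generates may differ.

```python
import re
from enum import Enum


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def enum_to_regex(enum_cls):
    """Build a regex matching exactly one enum value written as a JSON string."""
    alternatives = "|".join(re.escape(member.value) for member in enum_cls)
    return f'"({alternatives})"'


pattern = enum_to_regex(Priority)
print(pattern)  # "(high|medium|low)"

print(bool(re.fullmatch(pattern, '"medium"')))  # True
print(bool(re.fullmatch(pattern, '"urgent"')))  # False
```

A pattern like this can be compiled into the automaton that decides which tokens are valid at each step, so a value outside the enum can never be produced.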
