Guardrails AI: LLM Output Validation and Filtering

By Ruby Abdullah · tutorial
Tags: Guardrails, LLM, Validation, Safety, Python

Introduction

Large Language Models (LLMs) are incredibly powerful at generating text, but their output is not always reliable. LLMs can produce inaccurate information, leak sensitive data such as PII (Personally Identifiable Information), or fail to conform to expected formats. In production applications, especially customer-facing ones, this can be a serious problem.

Guardrails AI provides a solution for validating, filtering, and structuring LLM outputs. This framework acts as a safety layer between the LLM and end users, ensuring that every response meets predefined quality and safety criteria.

This tutorial walks through Guardrails AI step by step: installation, Guard objects, built-in validators, Pydantic integration, and finally building a safe customer-facing chatbot with PII filtering and factual grounding.

Installation

Install Guardrails AI and required dependencies:

    pip install guardrails-ai

After installation, configure the Guardrails CLI and install validators from the Guardrails Hub:

    guardrails configure

    # Install validators from the Hub
    guardrails hub install hub://guardrails/regex_match
    guardrails hub install hub://guardrails/detect_pii
    guardrails hub install hub://guardrails/toxic_language
    guardrails hub install hub://guardrails/provenance_llm

Make sure your LLM API key is configured:

    export OPENAI_API_KEY="sk-your-api-key-here"
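Before making any calls, it can help to confirm from Python that the key is actually visible to the process. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of Guardrails AI.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before calling the LLM")
    return value
```

Calling `require_api_key()` once at startup makes a missing key fail early with a clear message, rather than surfacing later as an authentication error from the provider.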

Guard Object

The Guard is the primary object in Guardrails AI. It acts as a wrapper around LLM calls, adding validation to inputs and/or outputs.

    from guardrails import Guard
    from guardrails.hub import RegexMatch
    import openai

    # Create a simple Guard
    guard = Guard().use(
        RegexMatch(regex=r"^\d{4}-\d{2}-\d{2}$", on_fail="exception")
    )

    # Use the Guard to call the LLM
    result = guard(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Give me today's date in YYYY-MM-DD format only, no other text."
        }]
    )

    print(f"Validated output: {result.validated_output}")
    print(f"Validation status: {result.validation_passed}")
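To see what this date pattern accepts independently of any LLM call, the sketch below applies the same regular expression with Python's standard `re` module. It illustrates the pattern only and makes no claim about how RegexMatch is implemented internally; `looks_like_iso_date` is a hypothetical helper name.

```python
import re

# Same pattern as the Guard above: four digits, two digits, two digits.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def looks_like_iso_date(text: str) -> bool:
    """True when the whole string has the shape YYYY-MM-DD."""
    return DATE_PATTERN.fullmatch(text) is not None


print(looks_like_iso_date("2024-05-17"))            # True
print(looks_like_iso_date("Today is 2024-05-17."))  # False: extra text around the date
print(looks_like_iso_date("2024-13-45"))            # True: the shape matches, the date is not real
```

The last case is worth noting: a regex validates format, not meaning. If calendar validity matters, a second check (for example parsing with `datetime.date.fromisoformat`) would be needed on top of the pattern.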

Each validator takes an on_fail argument that selects what happens when validation fails:

    # on_fail options:
    # "exception" - Raise an exception
    # "filter"    - Remove output that fails validation
    # "fix"       - Attempt to fix the output
    # "reask"     - Ask the LLM to regenerate
    # "noop"      - Continue without action (log only)

    guard_with_reask = Guard().use(
        RegexMatch(
            regex=r"^[A-Z][a-z]+$",
            on_fail="reask"  # Ask the LLM to try again if it fails
        )
    )
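The difference between these five policies is easiest to see side by side. The following is a toy dispatcher in plain Python, written only to illustrate the behavior described above. It is not Guardrails AI's implementation: `apply_on_fail`, its parameters, the reask budget, and the choice to return `None` when reasks run out are all assumptions made for this sketch.

```python
import re
from typing import Callable, Optional


def apply_on_fail(
    output: str,
    is_valid: Callable[[str], bool],
    on_fail: str,
    fix: Optional[Callable[[str], str]] = None,
    regenerate: Optional[Callable[[], str]] = None,
    max_reasks: int = 1,
) -> Optional[str]:
    """Toy model of the five failure policies (illustration only)."""
    if is_valid(output):
        return output
    if on_fail == "exception":
        raise ValueError(f"validation failed for {output!r}")
    if on_fail == "filter":
        return None  # drop the failing output
    if on_fail == "fix":
        return fix(output) if fix else output
    if on_fail == "reask":
        for _ in range(max_reasks):
            candidate = regenerate() if regenerate else output
            if is_valid(candidate):
                return candidate
        return None  # assumption: give up once the reask budget is spent
    if on_fail == "noop":
        return output  # pass through unchanged; a real system would log the failure
    raise ValueError(f"unknown on_fail policy: {on_fail!r}")


def is_capitalized_word(text: str) -> bool:
    """Mirrors the pattern ^[A-Z][a-z]+$ used in the Guard above."""
    return re.fullmatch(r"[A-Z][a-z]+", text) is not None


print(apply_on_fail("jakarta", is_capitalized_word, "fix", fix=str.capitalize))  # Jakarta
print(apply_on_fail("jakarta", is_capitalized_word, "filter"))                   # None
```

In practice, "exception" suits pipelines where a bad value must never propagate, while "reask" trades extra latency and token cost for a better chance of a usable answer.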

Validators

Guardrails AI provides various built-in validators through the Guardrails Hub. Here are some of the most commonly used ones:

Regex Validation

    from guardrails import Guard
    from guardrails.hub import RegexMatch

    # Validate email format
    email_guard = Guard().use(
        RegexMatch(
            regex=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
            on_fail="reask"
        )
    )

    result = email_guard(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Provide a valid business email address example."
        }]
    )

    print(f"Email: {result.validated_output}")
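Because the pattern is anchored at both ends, a chatty reply that wraps the address in a sentence fails validation, which is why "reask" is a natural choice here. The sketch below checks an email pattern of this kind (allowing `_` in the local part) with the standard `re` module; `is_bare_email` is a hypothetical helper, and the sample strings are made up for illustration.

```python
import re

EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_bare_email(text: str) -> bool:
    """True only when the entire string is a single email-shaped token."""
    return EMAIL_PATTERN.fullmatch(text) is not None


samples = [
    "sales@example.com",                     # bare address
    "Sure! Here is one: sales@example.com",  # address wrapped in prose
    "sales@example",                         # missing top-level domain
]
for text in samples:
    print(f"{text!r}: {is_bare_email(text)}")
```

Only the first sample passes. The second is the typical failure mode for LLM output, so the prompt should also ask explicitly for the address alone.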

PII Detection

    from guardrails import Guard
    from guardrails.hub import DetectPII

    # Guard to detect and remove PII
    pii_guard = Guard().use(
        DetectPII(
            pii_entities=[
                "EMAIL_ADDRESS",
                "PHONE_NUMBER",
                "PERSON",
                "CREDIT_CARD",
                "IP_ADDRESS",
                "ID"
            ],
            on_fail="fix"  # Automatically mask detected PII
        )
    )

    result = pii_guard(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Tell me about our customer John Doe who can be reached at john@email.com or 555-123-4567."
        }]
    )

    print(f"Safe output: {result.validated_output}")
    # Detected PII is replaced with placeholders, along the lines of:
    # "[PERSON] who can be reached at [EMAIL] or [PHONE]"
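To make the masking step concrete, here is a deliberately simplified sketch that replaces emails and US-style phone numbers using regular expressions. It is not how DetectPII works internally and it is far less capable; the patterns, the `mask_pii` helper, and the `<LABEL>` placeholder format are all assumptions chosen for this illustration.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs far more than this.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each match with a placeholder naming the entity type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(mask_pii("Reach John at john@email.com or 555-123-4567."))
# Reach John at <EMAIL_ADDRESS> or <PHONE_NUMBER>.
```

Note that the name "John" is left untouched: person names cannot be found reliably with regular expressions and need entity recognition, which is one reason to rely on a dedicated validator instead of hand-written patterns.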
