RAG Advanced - Building Production-Grade Retrieval-Augmented Generation

Table of Contents

Introduction

Prerequisites

Installation and Setup

Hybrid Search: Dense + Sparse Retrieval

Reranking with Cross-Encoders

Query Transformation Techniques

Parent-Child Chunking Strategy

Recursive Retrieval

Evaluation with RAGAS

Production RAG Pipeline

Best Practices

Conclusion

Introduction

Retrieval-Augmented Generation (RAG) has become the standard approach for grounding Large Language Models (LLMs) in domain-specific knowledge. While basic RAG systems are straightforward to build, production-grade RAG requires sophisticated techniques to handle the nuances of real-world information retrieval.

Basic RAG limitations that advanced techniques address:

Semantic gap: Dense embeddings miss lexical matches; sparse retrieval misses semantic similarity
Retrieval noise: Top-k results often include irrelevant documents
Query ambiguity: User queries may be vague, multi-faceted, or poorly formed
Context fragmentation: Fixed-size chunks lose document structure and context
Quality measurement: No systematic way to evaluate RAG pipeline performance

In this tutorial, you will learn advanced RAG techniques that target these limitations and improve retrieval quality, answer accuracy, and overall system reliability. Each technique is production-tested, and the techniques can be combined in a single pipeline.

Prerequisites

Python 3.10 or higher
Basic understanding of RAG architecture (embeddings, vector stores, LLMs)
Familiarity with LangChain or similar frameworks
An OpenAI API key (or other LLM provider)
8GB+ RAM recommended for local embedding models

Installation and Setup

# Core dependencies
pip install langchain langchain-openai langchain-community
pip install chromadb faiss-cpu
pip install sentence-transformers
pip install rank-bm25

# For evaluation
pip install ragas

# For cross-encoder reranking
pip install transformers torch

# Additional utilities
pip install tiktoken numpy pandas

Set up the environment:

import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma, FAISS

# Verify setup
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
print("Setup complete.")
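
Hard-coding the key in a script is convenient for a quick test, but the key is usually exported in the shell instead. As a minimal sketch, a small helper can fail early with a readable message when the variable is missing. The helper name `require_env` is hypothetical and not part of LangChain or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Raise before any client is constructed, so the error is easy to trace.
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running the tutorial code."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing `ChatOpenAI` or `OpenAIEmbeddings` surfaces a missing key immediately, since both classes read that variable from the environment.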

Hybrid Search: Dense + Sparse Retrieval

Hybrid search combines the strengths of dense vector retrieval (semantic understanding) with sparse retrieval (exact keyword matching) for more robust document retrieval.
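
To make the sparse half concrete, here is a minimal sketch of BM25 scoring in plain Python. It is a simplified textbook variant, not the exact formula used by `BM25Okapi` in rank-bm25. The function name `bm25_scores`, the sample documents, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with a simplified BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "error code E42 means the disk is full",
    "the storage device has no remaining capacity",
    "how to reset your password",
]
scores = bm25_scores("E42 disk", docs)
```

Only the first document scores above zero. The second describes the same problem without sharing a single query term, so keyword matching alone cannot find it; that is the gap dense retrieval is meant to cover.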

import numpy as np
from rank_bm25 import BM25Okapi
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from typing import List, Tuple


class HybridSearchRetriever:
    """
    Combines dense (embedding) and sparse (BM25) retrieval
    with configurable fusion weights.
    """

    def __init__(
        self,
        documents: List[Document],
        embeddings,
        dense_weight: float = 0.6,
        sparse_weight: float = 0.4,
    ):
        self.documents = documents
        self.dense_weight = dense_weight
        self.sparse_weight = sparse_weight

        # Build dense index (FAISS over document embeddings)
        self.vectorstore = FAISS.from_documents(documents, embeddings)

        # Build sparse index (BM25) over lowercased, whitespace-split tokens
        tokenized_docs = [doc.page_content.lower().split() for doc in documents]
        self.bm25 = BM25Okapi(tokenized_docs)
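
The constructor stores two fusion weights, but the step that applies them is not shown here. One common way to combine the two score lists is sketched below, not necessarily the approach the class itself takes: min-max normalize each list, then take a weighted sum. The function names `min_max_normalize`, `fuse_scores`, and `top_k`, and the sample scores, are hypothetical. The sketch assumes both inputs are "higher is better"; if the vector store returns distances, they would need to be converted to similarities first.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(dense_scores, sparse_scores, dense_weight=0.6, sparse_weight=0.4):
    """Weighted sum of normalized dense and sparse scores, one value per document."""
    dense_norm = min_max_normalize(dense_scores)
    sparse_norm = min_max_normalize(sparse_scores)
    return [
        dense_weight * d + sparse_weight * s
        for d, s in zip(dense_norm, sparse_norm)
    ]


def top_k(fused, k):
    """Indices of the k highest fused scores, best first."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)[:k]


# Illustrative scores for three documents (assumed values, not measured results).
dense = [0.82, 0.79, 0.40]   # e.g. cosine similarities
sparse = [0.0, 7.5, 2.5]     # e.g. BM25 scores
fused = fuse_scores(dense, sparse)
ranking = top_k(fused, k=2)
```

Normalizing first matters because BM25 scores and embedding similarities live on different scales; without it, whichever retriever produces larger raw numbers would dominate regardless of the weights. In this toy input, document 1 moves ahead of document 0 because it scores well on both signals.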

