Advanced RAG Tutorial: Hybrid Search, Reranking, and Evaluation


By Ruby Abdullah · tutorial
Tags: RAG · Hybrid Search · Reranking · RAGAS · LLM · Vector Search

Advanced RAG: Building Production-Grade Retrieval-Augmented Generation

Table of Contents

  • Introduction
  • Prerequisites
  • Installation and Setup
  • Hybrid Search: Dense + Sparse Retrieval
  • Reranking with Cross-Encoders
  • Query Transformation Techniques
  • Parent-Child Chunking Strategy
  • Recursive Retrieval
  • Evaluation with RAGAS
  • Production RAG Pipeline
  • Best Practices
  • Conclusion

    Introduction

    Retrieval-Augmented Generation (RAG) has become the standard approach for grounding Large Language Models (LLMs) in domain-specific knowledge. While basic RAG systems are straightforward to build, production-grade RAG requires sophisticated techniques to handle the nuances of real-world information retrieval.

    Basic RAG limitations that advanced techniques address:

    • Semantic gap: Dense embeddings can miss exact lexical matches such as product codes, acronyms, or rare names; sparse retrieval misses paraphrases and synonyms
    • Retrieval noise: Top-k results often include irrelevant documents
    • Query ambiguity: User queries may be vague, multi-faceted, or poorly formed
    • Context fragmentation: Fixed-size chunks lose document structure and context
    • Quality measurement: No systematic way to evaluate RAG pipeline performance

    In this tutorial, you will learn advanced RAG techniques that can substantially improve retrieval quality, answer accuracy, and overall system reliability. Each technique can be applied on its own or combined with the others in a single pipeline.


    Prerequisites

    • Python 3.10 or higher
    • Basic understanding of RAG architecture (embeddings, vector stores, LLMs)
    • Familiarity with LangChain or similar frameworks
    • An OpenAI API key (or other LLM provider)
    • 8GB+ RAM recommended for local embedding models


    Installation and Setup

    # Core dependencies
    pip install langchain langchain-openai langchain-community langchain-text-splitters
    pip install chromadb faiss-cpu
    pip install sentence-transformers
    pip install rank-bm25

    # For evaluation
    pip install ragas

    # For cross-encoder reranking
    pip install transformers torch

    # Additional utilities
    pip install tiktoken numpy pandas
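Several pip package names above differ from the names you import in Python (for example, rank-bm25 is imported as rank_bm25 and faiss-cpu as faiss). A small, stdlib-only sketch like the one below can confirm that everything is importable before you start. The REQUIRED_MODULES mapping and the check_installed helper are illustrative names, not part of any library.

```python
import importlib.util
from typing import Dict

# Hypothetical mapping: pip package name -> top-level import name.
REQUIRED_MODULES: Dict[str, str] = {
    "langchain": "langchain",
    "langchain-openai": "langchain_openai",
    "langchain-community": "langchain_community",
    "langchain-text-splitters": "langchain_text_splitters",
    "chromadb": "chromadb",
    "faiss-cpu": "faiss",
    "sentence-transformers": "sentence_transformers",
    "rank-bm25": "rank_bm25",
    "ragas": "ragas",
    "transformers": "transformers",
    "torch": "torch",
    "tiktoken": "tiktoken",
    "numpy": "numpy",
    "pandas": "pandas",
}


def check_installed(modules: Dict[str, str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {pip_name: True/False} depending on whether the module can be found.

    find_spec locates a top-level module without importing it, so this stays fast
    even for heavy packages such as torch.
    """
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in modules.items()
    }


if __name__ == "__main__":
    for pip_name, ok in check_installed().items():
        print(f"{pip_name:<26} {'ok' if ok else 'MISSING'}")
```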

    Set up the environment:

    import os

    # Prefer exporting OPENAI_API_KEY in your shell over hard-coding it here
    os.environ["OPENAI_API_KEY"] = "your-api-key"

    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import Chroma, FAISS

    # Verify setup
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    print("Setup complete.")


    Hybrid Search: Dense + Sparse Retrieval

    Hybrid search combines the strengths of dense vector retrieval (semantic understanding) with sparse retrieval (exact keyword matching) for more robust document retrieval.
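The HybridSearchRetriever class in this section stores a dense_weight (default 0.6) and a sparse_weight (default 0.4), but the excerpt ends before the fusion step. One common way to apply such weights, shown here as an assumption rather than the author's exact method, is weighted score fusion: min-max normalize each retriever's scores to [0, 1] (raw BM25 scores are unbounded, so they are not directly comparable to cosine similarities), then compute fused = dense_weight * dense + sparse_weight * sparse. The sketch works on plain {doc_id: score} dicts and assumes higher is better for both inputs; LangChain's FAISS store returns L2 distances from similarity_search_with_score by default, so those would need converting (for example, negating) first.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every doc gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(
    dense_scores: Dict[str, float],
    sparse_scores: Dict[str, float],
    dense_weight: float = 0.6,
    sparse_weight: float = 0.4,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Fuse two {doc_id: score} maps (higher = better) into one ranked list."""
    dense = min_max_normalize(dense_scores)
    sparse = min_max_normalize(sparse_scores)
    fused: Dict[str, float] = {}
    for doc_id in set(dense) | set(sparse):
        # A doc returned by only one retriever gets 0 from the other side.
        fused[doc_id] = (
            dense_weight * dense.get(doc_id, 0.0)
            + sparse_weight * sparse.get(doc_id, 0.0)
        )
    # Sort by fused score descending; break ties by doc_id for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


dense = {"doc1": 0.82, "doc2": 0.75, "doc3": 0.40}  # e.g. cosine similarities
sparse = {"doc2": 7.1, "doc3": 2.5, "doc4": 5.0}    # e.g. raw BM25 scores
print([doc_id for doc_id, _ in weighted_fusion(dense, sparse)])
# → ['doc2', 'doc1', 'doc4']
```

Here doc2 ranks first because it scores well on both signals, even though doc1 has the best dense score. Min-max normalization is sensitive to outliers within each result list, which is one reason some systems prefer rank-based fusion instead.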

    import numpy as np
    from typing import List, Tuple

    from rank_bm25 import BM25Okapi
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_core.documents import Document

    class HybridSearchRetriever:
        """
        Combines dense (embedding) and sparse (BM25) retrieval
        with configurable fusion weights.
        """

        def __init__(
            self,
            documents: List[Document],
            embeddings,
            dense_weight: float = 0.6,
            sparse_weight: float = 0.4,
        ):
            self.documents = documents
            self.dense_weight = dense_weight
            self.sparse_weight = sparse_weight

            # Build dense index (FAISS over document embeddings)
            self.vectorstore = FAISS.from_documents(documents, embeddings)

            # Build sparse index (BM25 over lowercased, whitespace-split tokens)
            tokenized_docs = [doc.page_content.lower().split() for doc in documents]
            self.bm25 = BM25Okapi(tokenized_docs)
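BM25Okapi hides the sparse scoring behind a library call. For intuition, here is a from-scratch sketch of the standard Okapi BM25 formula applied to the same lowercase, whitespace-split tokens the retriever builds. It assumes rank_bm25's default parameters (k1=1.5, b=0.75) but uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); rank_bm25 handles negative IDF values differently, so absolute scores will not match the library exactly. bm25_scores is a hypothetical helper, not part of rank_bm25.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document in `corpus` against `query` with Okapi BM25."""
    # Same naive tokenization as HybridSearchRetriever: lowercase + whitespace split.
    docs = [text.lower().split() for text in corpus]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant; rank_bm25 treats negative IDF differently.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls term-frequency saturation, b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = [
    "hybrid search combines bm25 and embeddings",
    "dense embeddings capture semantic similarity",
    "bm25 rewards exact keyword matches like bm25",
]
print(bm25_scores("bm25 keyword", corpus))
```

The third document scores highest because it contains both query terms, and the rarer term "keyword" carries a larger IDF than "bm25". Note that whitespace splitting keeps punctuation attached, so "bm25," and "bm25" count as different terms.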
