GraphRAG Tutorial: Graph-Based Retrieval Augmented Generation

By Ruby Abdullah · tutorial
Tags: GraphRAG, RAG, Knowledge Graph, NLP, LLM

Introduction

Retrieval Augmented Generation (RAG) has become the standard approach for connecting Large Language Models (LLMs) with external data. However, traditional RAG has a significant limitation: it retrieves text chunks by vector similarity alone, without modeling the relationships between the entities those chunks mention. This is where GraphRAG comes in.

GraphRAG, developed by Microsoft Research, combines the power of knowledge graphs with RAG to produce more comprehensive and contextual answers. Instead of simply searching for text chunks similar to a query, GraphRAG builds a knowledge graph from documents, identifies entities and their relationships, then uses the graph structure to answer questions requiring holistic understanding.

In this tutorial, we will learn how to use Microsoft's graphrag library to build a graph-based RAG system from scratch to production-ready deployment.

Why GraphRAG?

Traditional RAG works well for questions whose answers are contained in one or a few text chunks. However, traditional RAG struggles with questions requiring synthesis of information from many sources, such as:

  • "What are the main themes discussed across all these documents?"
  • "How does department A relate to department B in the organization?"
  • "List all projects involving technology X and person Y"

GraphRAG overcomes these limitations with two query approaches:

  • Local Search: Finds relevant entities in the graph along with their community context to answer specific questions
  • Global Search: Uses community summaries from the entire graph to answer questions requiring comprehensive understanding
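
To make the difference between the two modes concrete, here is a toy sketch in plain Python. It does not use the graphrag library: the graph, the community assignment, the summaries, and the function names `local_search` and `global_search` are all illustrative assumptions. In a real pipeline an LLM would turn the returned context into an answer.

```python
# Toy knowledge graph as (source, relation, target) triples.
# All entities, communities, and summaries are made up for illustration.
EDGES = [
    ("Alice", "leads", "Department A"),
    ("Bob", "leads", "Department B"),
    ("Department A", "collaborates with", "Department B"),
    ("Carol", "leads", "Department C"),
]

# Hypothetical community assignment (a real system would derive this
# from the graph with a community-detection algorithm).
COMMUNITY_OF = {
    "Alice": 0, "Bob": 0, "Department A": 0, "Department B": 0,
    "Carol": 1, "Department C": 1,
}

# Hypothetical pre-computed summary for each community.
SUMMARIES = {
    0: "Departments A and B collaborate; they are led by Alice and Bob.",
    1: "Department C is led by Carol.",
}


def local_search(entity):
    """Collect the entity's direct relationships plus its community summary."""
    facts = [f"{s} {r} {t}" for s, r, t in EDGES if entity in (s, t)]
    return {
        "facts": facts,
        "community_summary": SUMMARIES[COMMUNITY_OF[entity]],
    }


def global_search():
    """Collect every community summary as context for a corpus-wide question."""
    return [SUMMARIES[cid] for cid in sorted(SUMMARIES)]


if __name__ == "__main__":
    print(local_search("Department A"))
    print(global_search())
```

The local variant starts from one entity and stays in its neighborhood, while the global variant ignores individual entities and reads the summaries of all communities.
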
Installation

Prerequisites

  • Python 3.10 or later
  • API key from OpenAI or Azure OpenAI (for the LLM and embeddings)
  • At least 8 GB of RAM (indexing is memory-intensive)
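
The Python requirement can be checked before installing anything. This is a minimal sketch using only the standard library. The helper name `python_is_supported` is an illustrative assumption, and the 3.10 minimum is taken from the list above.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required")
```
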

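Installing the Library

    pip install graphrag

For the latest version from the repository:

    pip install git+https://github.com/microsoft/graphrag.git

Verify the installation:

    import graphrag  # confirms the package is importable
    from importlib.metadata import version

    # Read the installed version from the package metadata
    print(version("graphrag"))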

Setting Up API Keys

GraphRAG requires an LLM for entity extraction and summary generation. Set your OpenAI API key:

    export GRAPHRAG_API_KEY="sk-your-openai-api-key"

Or, for Azure OpenAI:

    export GRAPHRAG_API_KEY="your-azure-api-key"
    export GRAPHRAG_API_BASE="https://your-resource.openai.azure.com"
    export GRAPHRAG_API_VERSION="2024-06-01"
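
Indexing makes many LLM calls, so it can help to confirm the variables are set before starting. The following is a small sketch, assuming the variable names are `GRAPHRAG_API_KEY`, `GRAPHRAG_API_BASE`, and `GRAPHRAG_API_VERSION`. The helper `missing_variables` is hypothetical and not part of graphrag.

```python
import os

# Variable names assumed from the export commands in this section.
REQUIRED = ["GRAPHRAG_API_KEY"]
AZURE_EXTRA = ["GRAPHRAG_API_BASE", "GRAPHRAG_API_VERSION"]


def missing_variables(env, use_azure=False):
    """Return the names of required variables that are unset or blank."""
    names = REQUIRED + (AZURE_EXTRA if use_azure else [])
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("API configuration found")
```

Pass `use_azure=True` to also require the Azure endpoint and API version.
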

Project Initialization

Creating Project Structure

    mkdir graphrag-demo
    cd graphrag-demo
    python -m graphrag init --root .

This command generates the following directory structure:

    graphrag-demo/
    ├── settings.yaml    # Main configuration
    ├── .env             # Environment variables
    └── input/           # Folder for source documents

Preparing Input Documents

Place the text documents you want to index in the input/ folder. GraphRAG supports .txt and .csv files.

    mkdir -p input
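
Before indexing, it can be useful to see which files in the folder match the supported formats. This is a minimal sketch using only the standard library. The helper name `split_input_files` is an illustrative assumption, and the suffix list mirrors the two formats named above.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".txt", ".csv"}  # formats named in this section


def split_input_files(input_dir):
    """Return (supported, skipped) file names found directly in input_dir."""
    supported, skipped = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path.name)
        else:
            skipped.append(path.name)
    return supported, skipped


if __name__ == "__main__":
    input_dir = Path("input")
    if input_dir.is_dir():
        ok, ignored = split_input_files(input_dir)
        print("Will be indexed:", ok)
        print("Unsupported format:", ignored)
    else:
        print("No input/ folder found in the current directory")
```
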
    

As an example, we will create sample documents about a fictional technology company:

    # create_sample_data.py
    import os

    documents = {
        "company_overview.txt": """
    TechNova Inc. is a technology company founded in 2020 in San Francisco.
    The company focuses on developing artificial intelligence solutions for
    the banking and healthcare sectors. The CEO is James Chen, who previously
    worked at Google for 10 years.

    The company has three main divisions: the AI Research Division led by
    Dr. Sarah Williams, the Product Development Division led by Michael Park,
    and the Business Development Division led by Elena Rodriguez.

    TechNova Inc. secured Series B funding of $50 million from Sequoia Capital
    and Andreessen Horowitz in 2023. The company currently has 200 employees
    spread across San Francisco, New York, and Austin.
    """,
        "projects.txt": """
    TechNova Inc.'s main projects include:
