Weaviate: Vector Database with Integrated AI Modules

By Ruby Abdullah · tutorial
Tags: Weaviate, Vector Database, Semantic Search, GraphQL, Python

Weaviate is an open-source vector database designed to store data objects along with their vector embeddings. What makes Weaviate unique is its ability to integrate AI modules directly into the database, enabling auto-vectorization, semantic search, and even generative AI without additional infrastructure.

This tutorial covers Weaviate end to end: installation, schema definition, vector and hybrid search, and building a semantic product search engine with auto-vectorization and generative answers.

Why Weaviate?

Weaviate offers several advantages over other vector databases:

  • Integrated AI Modules: Vectorizer and generative modules built directly into the database
  • Auto-Vectorization: Data is automatically vectorized on insertion without manual preprocessing
  • Hybrid Search: Combines vector (semantic) and keyword (BM25) search
  • GraphQL API: Flexible and powerful query interface
  • Multi-Tenancy: Data isolation for multi-tenant applications
  • Scalability: Supports horizontal scaling for large datasets
  • Integration Ecosystem: Compatible with LangChain, LlamaIndex, and other AI frameworks
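
To make the hybrid search point concrete, here is a toy sketch of how a keyword score and a vector score can be blended with a single weight. The function names, the min-max normalization, and the sample scores are illustrative assumptions. This is not Weaviate's internal implementation, only the general idea of weighting semantic relevance against keyword relevance.

```python
def normalize(scores):
    """Min-max scale a dict of {doc_id: score} into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score sets; alpha=1 is pure vector, alpha=0 is pure keyword."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    docs = set(kw) | set(vec)
    combined = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in docs
    }
    # Highest combined score first; ties broken by doc id for stable output.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical scores for three documents.
bm25 = {"a": 2.0, "b": 8.0, "c": 5.0}
similarity = {"a": 0.9, "b": 0.1, "c": 0.5}

print(hybrid_rank(bm25, similarity, alpha=0.0))  # keyword only
print(hybrid_rank(bm25, similarity, alpha=1.0))  # vector only
```

With these made-up numbers the two extremes produce opposite orderings, which is why a single weight between them is a useful tuning knob.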

Installation

Create a docker-compose.yml file:

    version: '3.4'
    services:
      weaviate:
        image: cr.weaviate.io/semitechnologies/weaviate:1.25.0
        restart: on-failure:0
        ports:
          - "8080:8080"    # REST API
          - "50051:50051"  # gRPC
        environment:
          QUERY_DEFAULTS_LIMIT: 25
          AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
          PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
          DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
          ENABLE_MODULES: 'text2vec-openai,generative-openai'
          OPENAI_APIKEY: 'sk-your-openai-api-key'
          CLUSTER_HOSTNAME: 'node1'
        volumes:
          - weaviate_data:/var/lib/weaviate
    volumes:
      weaviate_data:

Start Weaviate:

    docker-compose up -d
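
The container can take a few seconds to start accepting requests. A minimal sketch for waiting on it, assuming the port mapping from the compose file above and Weaviate's readiness endpoint at /v1/.well-known/ready, is shown here. The function names, retry count, and delay are illustrative choices, not part of any Weaviate tooling.

```python
import time
import urllib.request

# Readiness endpoint; host and port assume the compose file above.
READY_URL = "http://localhost:8080/v1/.well-known/ready"


def probe(url=READY_URL, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(check=probe, attempts=30, delay=1.0):
    """Call `check` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_ready() right after starting the container returns True once the instance responds, or False if it never does within the allotted attempts.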

Using Weaviate Cloud (WCD)

For production deployments, you can use Weaviate Cloud:

  • Create an account at console.weaviate.cloud
  • Create a new cluster
  • Copy the cluster URL and API key

Python Client Installation

    pip install weaviate-client
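
The connection helpers and the weaviate.classes imports used in this tutorial belong to the v4 generation of the Python client, so it can be worth confirming which version pip actually installed. The following is a small sketch using only the standard library; the helper names are made up for illustration.

```python
from importlib import metadata


def major_version(version_string):
    """Extract the leading integer from a version such as '4.6.1'."""
    return int(version_string.split(".")[0])


def installed_client_version(package="weaviate-client"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_client_version()
if version is None:
    print("weaviate-client is not installed")
elif major_version(version) < 4:
    print(f"weaviate-client {version} found; the examples here expect 4.x or newer")
else:
    print(f"weaviate-client {version} looks compatible")
```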
    

Connecting to Weaviate

    import weaviate
    from weaviate.classes.init import Auth

    # Option 1: connect to a local instance (Docker)
    client = weaviate.connect_to_local()

    # Option 2: connect to Weaviate Cloud (use instead of option 1)
    client = weaviate.connect_to_weaviate_cloud(
        cluster_url="https://your-cluster.weaviate.network",
        auth_credentials=Auth.api_key("your-wcd-api-key"),
        headers={
            "X-OpenAI-Api-Key": "sk-your-openai-api-key"
        },
    )

    # Verify connection
    print(client.is_ready())  # True if successful

    # Release the connection when you are done
    client.close()
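
Hard-coding API keys in source files is easy to leak. One possible alternative, sketched here under assumptions, reads the same three values from environment variables. The variable names WEAVIATE_URL, WEAVIATE_API_KEY, and OPENAI_API_KEY and both helper functions are hypothetical; only connect_to_weaviate_cloud and Auth.api_key come from the client library.

```python
import os

# Hypothetical variable names; pick whatever your deployment uses.
REQUIRED = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def load_settings(env=None):
    """Collect connection settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "cluster_url": env["WEAVIATE_URL"],
        "api_key": env["WEAVIATE_API_KEY"],
        "headers": {"X-OpenAI-Api-Key": env["OPENAI_API_KEY"]},
    }


def connect_from_env(env=None):
    """Open a Weaviate Cloud connection using load_settings()."""
    # Imported lazily so load_settings() works without the client installed.
    import weaviate
    from weaviate.classes.init import Auth

    settings = load_settings(env)
    return weaviate.connect_to_weaviate_cloud(
        cluster_url=settings["cluster_url"],
        auth_credentials=Auth.api_key(settings["api_key"]),
        headers=settings["headers"],
    )
```

A caller would use client = connect_from_env(), check client.is_ready(), and call client.close() when finished. Failing early with the list of missing variables tends to be easier to debug than an authentication error from the server.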

Schema Definition

A schema in Weaviate defines the data structure of a collection, including its properties and vectorizer configuration.

Creating a Collection (Class)

    import weaviate
    import weaviate.classes.config as wc

    client = weaviate.connect_to_local()

    # Create a simple collection
    client.collections.create(
        name="Article",
        description="Blog article collection",
        # Vectorizer module that embeds objects on insertion
        vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
            model="text-embedding-3-small",
        ),
        # Generative module used for generated answers
        generative_config=wc.Configure.Generative.openai(
            model="gpt-4",
        ),
        properties=[
            wc.Property(
                name="title",
                data_type=wc.DataType.TEXT,
                description="Article title",
            ),
            wc.Property(
                name="content",
                data_type=wc.DataType.TEXT,
                description="Article content",
            ),
            wc.Property(
                name="author",
                data_type=wc.DataType.TEXT,
                description="Author name",
                skip_vectorization=True,  # Not vectorized
            ),
        ],
    )
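
Running a creation script twice typically fails because the collection already exists. A small guard, sketched below, checks for the collection first. It relies on the client's collections.exists and collections.get methods; the ensure_collection helper and its create callback are hypothetical names introduced only for this illustration.

```python
def ensure_collection(client, name, create):
    """Return the named collection, creating it only if it is missing.

    `client` is expected to be a connected Weaviate client. `create` is a
    caller-supplied function (hypothetical here) that calls
    client.collections.create(...) with the desired configuration and
    returns the new collection.
    """
    if client.collections.exists(name):
        return client.collections.get(name)
    return create(client)
```

In practice the create callback would wrap the client.collections.create call for the Article collection, so the same script can be run safely on a fresh or an already initialized instance.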
