Weaviate: Vector Database with Integrated AI Modules
Weaviate is an open-source vector database designed to store data objects along with their vector embeddings. What makes Weaviate unique is its ability to integrate AI modules directly into the database, enabling auto-vectorization, semantic search, and even generative AI without additional infrastructure.
This tutorial covers Weaviate from installation and schema definition, through vector and hybrid search, to building a semantic product search engine with auto-vectorization and generative answers.
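To make "vector embeddings" and "semantic search" concrete before the setup steps, here is a minimal pure-Python sketch of nearest-neighbor lookup by cosine similarity. It does not use Weaviate; the three-dimensional vectors and product names are made up for illustration, and a real database would rely on an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, objects, k=2):
    """Return the names of the k objects whose vectors are most similar to query."""
    ranked = sorted(
        objects,
        key=lambda name: cosine_similarity(query, objects[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-dimensional embeddings; real models produce far more dimensions.
OBJECTS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], OBJECTS))
```

The query vector points in the same direction as the two footwear items, so they rank ahead of the coffee maker even though no keywords are compared.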
Why Weaviate?
Weaviate offers several advantages over other vector databases:
- Integrated AI Modules: Vectorizer and generative modules built directly into the database
- Auto-Vectorization: Data is automatically vectorized on insertion without manual preprocessing
- Hybrid Search: Combines vector (semantic) and keyword (BM25) search
- GraphQL API: Flexible and powerful query interface
- Multi-Tenancy: Data isolation for multi-tenant applications
- Scalability: Supports horizontal scaling for large datasets
- Integration Ecosystem: Compatible with LangChain, LlamaIndex, and other AI frameworks
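The hybrid search item above can be illustrated with a toy score-fusion sketch. This is an assumption-laden simplification, not Weaviate's internal algorithm: both score sets are min-max normalized and blended with a weight `alpha`, where `alpha=1` means vector-only and `alpha=0` means keyword-only. The document IDs and scores are invented.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend normalized scores; documents missing from one side score 0 there."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    docs = set(vec) | set(kw)
    return {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}


# Hypothetical results from the two searches.
vector_scores = {"a": 0.92, "b": 0.80, "c": 0.40}  # similarity scores
keyword_scores = {"b": 7.5, "c": 2.5}              # BM25-style scores

fused = hybrid_scores(vector_scores, keyword_scores, alpha=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

Document `b` is only the second-best vector match, but it is the best keyword match, so the blended score moves it to the top.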
Installation
Using Docker (Recommended for Development)
Create a docker-compose.yml file:
version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.0
    restart: on-failure:0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: 'sk-your-openai-api-key'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
Start Weaviate:
docker-compose up -d
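The container can take a few seconds to start accepting requests. As a rough sketch, the helper below polls Weaviate's REST readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The base URL assumes the port mapping from the compose file above; the function names, retry count, and delay are illustrative choices.

```python
import time
import urllib.request


def ready_url(base_url):
    """Build the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def wait_until_ready(base_url="http://localhost:8080", attempts=30, delay=1.0):
    """Poll the instance until it is ready or the attempts run out."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` right after `docker-compose up -d` returns `True` once the instance responds, or `False` after roughly 30 seconds of failed attempts.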
Using Weaviate Cloud (WCD)
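For production deployments, you can use Weaviate Cloud (WCD), the managed Weaviate service. Connecting to a cloud cluster is covered under Connecting to Weaviate.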
Python Client Installation
pip install weaviate-client
Connecting to Weaviate
import weaviate
from weaviate.classes.init import Auth

# Connect to local instance (Docker)
client = weaviate.connect_to_local()

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=Auth.api_key("your-wcd-api-key"),
    headers={
        "X-OpenAI-Api-Key": "sk-your-openai-api-key"
    }
)

# Verify connection
print(client.is_ready())  # True if successful
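Hardcoding API keys in source files is easy to leak. One possible approach, sketched here, is to read the key from an environment variable and build the header dictionary from it. The variable name `OPENAI_API_KEY` and the helper `openai_headers` are assumptions made for this example, not part of the Weaviate client.

```python
import os


def openai_headers(env=None):
    """Build the OpenAI header dict for the Weaviate client from the environment."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"X-OpenAI-Api-Key": key}
```

The result can be passed as `headers=openai_headers()` in the connection call. When you are finished with a client, call `client.close()` to release its connections.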
Schema Definition
A schema in Weaviate defines the structure of your data: the properties of each collection and how they are vectorized.
Creating a Collection (Class)
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()

# Create a simple collection
client.collections.create(
    name="Article",
    description="Blog article collection",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(
        model="gpt-4",
    ),
    properties=[
        wc.Property(
            name="title",
            data_type=wc.DataType.TEXT,
            description="Article title",
        ),
        wc.Property(
            name="content",
            data_type=wc.DataType.TEXT,
            description="Article content",
        ),
        wc.Property(
            name="author",
            data_type=wc.DataType.TEXT,
            description="Author name",
            skip_vectorization=True,  # Not vectorized
        ),