Complete ChromaDB Tutorial: A Simple Vector Database for AI
ChromaDB is an open-source vector database designed to make storing and querying embeddings easy. Its intuitive API makes it a good fit for building RAG applications, semantic search, and recommendation systems.
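To make "querying embeddings" concrete: a vector database stores vectors and, given a query vector, returns the k stored vectors closest to it. The sketch below does this by brute force in plain Python, purely to illustrate the idea; ChromaDB itself uses an approximate HNSW index rather than a linear scan. The function names and the toy 2-D vectors are hypothetical.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(store: dict[str, list[float]], query: list[float], k: int) -> list[tuple[str, float]]:
    """Return the k (id, distance) pairs closest to `query`, smallest distance first.

    This is a linear scan over every stored vector: fine for a toy example,
    too slow for millions of vectors, which is why vector databases use ANN indexes.
    """
    if any(len(v) != len(query) for v in store.values()):
        raise ValueError("all vectors must have the same dimension as the query")
    scored = [(doc_id, squared_l2(vec, query)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy "embeddings" (real ones have hundreds of dimensions).
store = {
    "python": [0.9, 0.1],
    "java": [0.8, 0.3],
    "cooking": [0.1, 0.9],
}
print(nearest(store, [1.0, 0.0], k=2))
```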
Why ChromaDB?
ChromaDB's advantages:
- Simple API: easy to learn and use
- Embedded mode: runs in-process, either in-memory or with persistent storage
- Multi-modal: supports text, images, and raw embeddings
- Integrations: LangChain, LlamaIndex, OpenAI
- No infrastructure: no separate server to set up when running in embedded mode
Installation
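pip install chromadb
pip install chromadb-client # Lightweight client only, for connecting to a Chroma server (client mode)
# With sentence-transformers (needed for SentenceTransformerEmbeddingFunction)
pip install sentence-transformers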
Quick Start
1. Basic Usage
import chromadb
# Create a client (in-memory)
client = chromadb.Client()
# Or use persistent storage on disk instead (pick one of the two)
client = chromadb.PersistentClient(path="./chromadb")
# Create a collection (raises an error if a collection with this name already exists)
collection = client.create_collection(name="my_collection")
# Add documents (embedded automatically by the default embedding function, all-MiniLM-L6-v2)
collection.add(
    documents=["Python is a programming language", "Machine learning is AI"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"],
)
# Query
results = collection.query(
    query_texts=["What is Python?"],
    n_results=2,
)
print(results)
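`collection.query` returns a dictionary in which each field (`ids`, `documents`, `metadatas`, `distances`) is a list holding one inner list per query text, so `results["documents"][0]` contains the hits for the first query, best match first. Below is a small helper, hypothetical and not part of ChromaDB, that flattens one query's hits into records; the `sample` dict only mimics that shape, and its distances are invented.

```python
from typing import Any

# Maps Chroma's plural result keys to the singular field names used in each record.
FIELDS = {"documents": "document", "metadatas": "metadata", "distances": "distance"}


def hits_for_query(results: dict[str, Any], query_index: int = 0) -> list[dict[str, Any]]:
    """Turn the hits for one query text into a list of flat records, best match first.

    Assumes Chroma's query shape: each field is a list holding one inner list per
    query text. Fields that were not requested (None or missing) become None.
    """
    records = []
    for rank, doc_id in enumerate(results["ids"][query_index]):
        record: dict[str, Any] = {"id": doc_id}
        for plural, singular in FIELDS.items():
            column = results.get(plural)
            record[singular] = column[query_index][rank] if column is not None else None
        records.append(record)
    return records


# Hypothetical result for a single query text; distance values are made up.
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["Python is a programming language", "Machine learning is AI"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.42, 1.37]],
    "embeddings": None,
}
for hit in hits_for_query(sample):
    print(hit["id"], hit["distance"], hit["document"])
```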
2. With Embeddings
import chromadb
from chromadb.utils import embedding_functions
client = chromadb.Client()
# Set up the embedding function (requires the sentence-transformers package)
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
# Create a collection with the embedding function
collection = client.create_collection(
    name="docs",
    embedding_function=sentence_transformer_ef,
)
# Add documents (embeddings are generated automatically)
collection.add(
    documents=["Doc 1 content", "Doc 2 content"],
    ids=["1", "2"],
)
# Query (only 2 documents are stored, so ask for at most 2 results)
results = collection.query(
    query_texts=["search query"],
    n_results=2,
)
Collections
1. Collection Operations
# Create
collection = client.create_collection("my_collection")
# Get existing
collection = client.get_collection("my_collection")
# Get or create
collection = client.get_or_create_collection("my_collection")
# Count items
count = collection.count()
# List all
collections = client.list_collections()
# Delete (removes the collection and all its items, so do this last)
client.delete_collection("my_collection")
2. Collection with a Custom Embedding Function
import chromadb
from chromadb.utils import embedding_functions
client = chromadb.Client()
# OpenAI embeddings (requires the openai package and an API key)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small",
)
# Sentence Transformers (runs locally)
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)
# Hugging Face (Inference API)
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-hf-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)
# Each collection uses one embedding function
collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # Distance metric: "l2" (default), "ip", or "cosine"
)
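The `hnsw:space` setting selects how distances are computed. ChromaDB's documentation defines its three spaces as squared L2 distance (`l2`, the default), 1 minus the inner product (`ip`), and 1 minus cosine similarity (`cosine`), so a lower value always means more similar. The plain-Python sketch below reproduces those formulas for intuition only; the function names are hypothetical, and ChromaDB evaluates these inside its HNSW index, not in Python.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """'l2' space: squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: list[float], b: list[float]) -> float:
    """'ip' space: 1 - inner product (best suited to normalized vectors)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """'cosine' space: 1 - cosine similarity, from 0 (same direction) to 2 (opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [1.0, 0.0], [2.0, 0.0]  # same direction, different length
print(l2_distance(a, b), ip_distance(a, b), cosine_distance(a, b))  # → 1.0 -1.0 0.0
```

Cosine distance ignores vector length while `l2` does not, and the negative `ip` value shows why inner product is normally paired with normalized embeddings.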
CRUD Operations
1. Add Documents
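# The three calls below are alternatives (they reuse the same ids).
# Add with documents (embeddings generated automatically)
collection.add(
    documents=["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
    ids=["1", "2", "3"],
)
# Add with pre-computed embeddings ("..." stands for the remaining values;
# every vector must match the collection's embedding dimension)
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    metadatas=[{"key": "value"}, {"key": "value"}],  # one metadata dict per id
    ids=["1", "2"],
)
# Add with both documents and embeddings (the text is stored, the given embedding is used)
collection.add(
    documents=["text"],
    embeddings=[[0.1, 0.2, ...]],
    metadatas=[{"key": "value"}],
    ids=["1"],
)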