Complete ChromaDB Tutorial: Simple Vector Database for AI
ChromaDB is an open-source vector database designed to store and query embeddings easily. With its intuitive API, ChromaDB is perfect for building RAG applications, semantic search, and recommendation systems.
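To make "store and query embeddings" concrete, here is a minimal pure-Python sketch of the underlying idea: keep one vector per item and rank items by similarity to a query vector. It is not how ChromaDB is implemented (a real vector database uses an approximate index rather than a full scan), and the names `cosine_similarity`, `nearest`, and the three-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, k=2):
    """Return the k ids whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item_id: cosine_similarity(query, store[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional embeddings, invented for this example.
store = {
    "python_doc": [0.9, 0.1, 0.0],
    "ml_doc": [0.1, 0.9, 0.2],
    "cooking_doc": [0.0, 0.2, 0.9],
}

top = nearest([0.8, 0.2, 0.1], store, k=2)
print(top)  # ['python_doc', 'ml_doc']
```

A full scan like this is fine for a few thousand vectors; the point of a vector database is to give the same kind of answer at much larger scale, with persistence and metadata filtering on top.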
Why ChromaDB?
ChromaDB Advantages:
- Simple API: Easy to learn and use
- Embedded mode: Can run in-memory or with persistent storage
- Multi-modal: Supports text, images, and embeddings
- Integrations: LangChain, LlamaIndex, OpenAI
- No infrastructure: No server setup required
Installation
pip install chromadb
pip install chromadb-client # For client mode
# With sentence-transformers
pip install sentence-transformers
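After installing, it can help to confirm that the packages are visible to the interpreter you are actually running. The sketch below uses only the standard library; `is_installed` is a hypothetical helper, and `sentence_transformers` is assumed to be the import name of the sentence-transformers package.

```python
import importlib.util

def is_installed(module_name):
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Check the two packages this tutorial relies on.
status = {name: is_installed(name) for name in ("chromadb", "sentence_transformers")}
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```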
Quick Start
1. Basic Usage
import chromadb
# Create client (in-memory)
client = chromadb.Client()
# Or use persistent storage instead (data is saved under the given path)
client = chromadb.PersistentClient(path="./chromadb")
# Create collection
collection = client.create_collection(name="mycollection")
# Add documents
collection.add(
    documents=["Python is a programming language", "Machine learning is AI"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)
# Query
results = collection.query(
    query_texts=["What is Python?"],
    n_results=2
)
print(results)
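Printing the raw result is hard to read, so a small helper can reshape it. This sketch assumes the result is a dict of parallel lists (`ids`, `documents`, `metadatas`, `distances`), each holding one inner list per query text; check that against the version you have installed. `flatten_results` is a hypothetical helper, and `sample` is a hand-written stand-in whose distances are invented, not real output.

```python
def flatten_results(results):
    """Turn a query-result dict into one list of hit dicts per query text."""
    per_query = []
    for q in range(len(results["ids"])):
        hits = []
        for i, item_id in enumerate(results["ids"][q]):
            hits.append({
                "id": item_id,
                "document": results["documents"][q][i],
                "metadata": results["metadatas"][q][i],
                "distance": results["distances"][q][i],
            })
        per_query.append(hits)
    return per_query

# Hand-written stand-in for a query result; the distances are invented.
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["Python is a programming language", "Machine learning is AI"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.4, 1.3]],
}

hits = flatten_results(sample)[0]  # hits for the first (and only) query text
for hit in hits:
    print(hit["id"], hit["distance"], hit["document"])
```

Smaller distances mean closer matches, so the first hit in each inner list is the best one.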
2. With Embeddings
import chromadb
from chromadb.utils import embedding_functions
# In-memory client (or reuse the client from Basic Usage)
client = chromadb.Client()
# Set up the embedding function
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
# Create collection with embedding function
collection = client.create_collection(
    name="docs",
    embedding_function=sentence_transformer_ef
)
# Add documents (embeddings auto-generated)
collection.add(
    documents=["Doc 1 content", "Doc 2 content"],
    ids=["1", "2"]
)
# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5
)
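Conceptually, an embedding function takes a list of texts and returns one fixed-length vector per text. The toy below illustrates only that contract: it hashes words into buckets, so it captures word overlap and nothing about meaning. `toy_embed` and the 8-dimensional size are made up for this sketch, and it is not a substitute for a real model such as all-MiniLM-L6-v2.

```python
import hashlib
import math

def toy_embed(texts, dim=8):
    """Map each text to a unit-length vector by hashing its words into buckets."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for word in text.lower().split():
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[digest[0] % dim] += 1.0  # bucket chosen by the first hash byte
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vectors

embeddings = toy_embed(["Doc 1 content", "Doc 2 content"])
print(len(embeddings), len(embeddings[0]))  # 2 8
```

The same input always produces the same vector, which is the property a collection relies on when it embeds documents at insert time and queries at search time.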
Collections
1. Collection Operations
# Create
collection = client.create_collection("mycollection")
# Get existing
collection = client.get_collection("mycollection")
# Get or create
collection = client.get_or_create_collection("mycollection")
# Delete
client.delete_collection("mycollection")
# List all
collections = client.list_collections()
# Count items
count = collection.count()
2. Collection with Custom Embedding
from chromadb.utils import embedding_functions
# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)
# Sentence Transformers
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)
# Hugging Face
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-hf-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
collection = client.create_collection(
    name="mydocs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Distance metric
)
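The `hnsw:space` setting picks the distance metric used to rank results. As I understand Chroma's conventions, the options are `l2` (squared Euclidean distance), `ip` (one minus the inner product), and `cosine` (one minus cosine similarity); treat the formulas below as an assumption to verify against the documentation for your version. The function names are hypothetical, and the sketch uses only the standard library.

```python
import math

def l2_squared(a, b):
    """Squared Euclidean distance (assumed meaning of "l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    """One minus the dot product (assumed meaning of "ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus cosine similarity (assumed meaning of "cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two vectors pointing the same way but with different lengths.
a = [1.0, 0.0]
b = [2.0, 0.0]
print(l2_squared(a, b))              # 1.0
print(inner_product_distance(a, b))  # -1.0
print(cosine_distance(a, b))         # 0.0
```

The example shows why the choice matters: cosine ignores vector length and treats `a` and `b` as identical, while the other two metrics do not. Cosine is a common choice for text embeddings for that reason.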
CRUD Operations
1. Add Documents
# Add with documents (auto-embed)
collection.add(
    documents=["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
    ids=["1", "2", "3"]
)
# Add with pre-computed embeddings
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    metadatas=[{"key": "value"}, {"key": "value"}],  # one metadata dict per id
    ids=["1", "2"]
)
# Add with documents and embeddings
collection.add(
    documents=["text"],
    embeddings=[[0.1, 0.2, ...]],
    metadatas=[{"key": "value"}],
    ids=["1"]
)
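Because `add` takes `ids`, `documents`, `embeddings`, and `metadatas` as parallel lists, a length mismatch or a repeated id is an easy mistake to make. A small pre-flight check can catch it before the call. `validate_batch` is a hypothetical helper written for this sketch, not part of ChromaDB, and the rules it enforces are my assumptions about what a well-formed batch looks like.

```python
def validate_batch(ids, documents=None, embeddings=None, metadatas=None):
    """Raise ValueError if the parallel lists disagree in length or ids repeat."""
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    if documents is None and embeddings is None:
        raise ValueError("provide documents, embeddings, or both")
    named = (("documents", documents), ("embeddings", embeddings), ("metadatas", metadatas))
    for name, values in named:
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} items but ids has {len(ids)}")

# A well-formed batch passes silently.
validate_batch(
    ids=["1", "2"],
    embeddings=[[0.1, 0.2], [0.3, 0.4]],
    metadatas=[{"key": "value"}, {"key": "value"}],
)

# One metadata dict for two ids is rejected.
try:
    validate_batch(
        ids=["1", "2"],
        embeddings=[[0.1, 0.2], [0.3, 0.4]],
        metadatas=[{"key": "value"}],
    )
except ValueError as err:
    message = str(err)
    print(message)  # metadatas has 1 items but ids has 2
```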
2. Query
# Basic query
results = collection.query(