Complete ChromaDB Tutorial: Simple Vector Database for AI

ChromaDB is an open-source vector database designed to make storing and querying embeddings easy. Its small, intuitive API makes it well suited to building retrieval-augmented generation (RAG) applications, semantic search, and recommendation systems.

Why ChromaDB?

ChromaDB Advantages:

Simple API: Easy to learn and use
Embedded mode: Runs in-memory or with persistent on-disk storage
Multi-modal: Supports text, images, and pre-computed embeddings
Integrations: LangChain, LlamaIndex, OpenAI
No infrastructure: No server setup required in embedded mode

Installation

pip install chromadb
pip install chromadb-client  # For client mode

# With sentence-transformers
pip install sentence-transformers

Quick Start

1. Basic Usage

import chromadb

# Create client (in-memory)
client = chromadb.Client()

# Or persistent storage
client = chromadb.PersistentClient(path="./chromadb")

# Create collection
collection = client.create_collection(name="mycollection")

# Add documents
collection.add(
    documents=["Python is a programming language", "Machine learning is AI"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)

# Query
results = collection.query(
    query_texts=["What is Python?"],
    n_results=2
)
print(results)
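The dictionary returned by collection.query groups its values per query text, so results["documents"][0] holds the matches for the first query. A minimal sketch of unpacking that structure into one row per hit follows. The helper names flatten_results and _column are hypothetical, and the sample dictionary (including its distance values) is made up purely to mimic the shape Chroma returns:

```python
def _column(results, key, query_index, size):
    """Return one query's values for `key`, or a list of None if the field is absent."""
    values = results.get(key)
    if values is None:
        return [None] * size
    return values[query_index]


def flatten_results(results, query_index=0):
    """Turn one query's slice of a Chroma-style result dict into a list of rows."""
    ids = results["ids"][query_index]
    docs = _column(results, "documents", query_index, len(ids))
    metas = _column(results, "metadatas", query_index, len(ids))
    dists = _column(results, "distances", query_index, len(ids))
    return [
        {"id": i, "document": d, "metadata": m, "distance": dist}
        for i, d, m, dist in zip(ids, docs, metas, dists)
    ]


# Illustrative data only: same shape as a query result, invented distances.
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["Python is a programming language", "Machine learning is AI"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.42, 1.37]],
}

rows = flatten_results(sample)
for row in rows:
    print(row["id"], row["distance"], row["document"])
```

With a real collection, the same helper could be called as flatten_results(results) on the value returned by the query.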

2. With Embeddings

import chromadb
from chromadb.utils import embedding_functions

# Client for this example (in-memory)
client = chromadb.Client()

# Setup embedding function
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create collection with embedding function
collection = client.create_collection(
    name="docs",
    embedding_function=sentence_transformer_ef
)

# Add documents (embeddings auto-generated)
collection.add(
    documents=["Doc 1 content", "Doc 2 content"],
    ids=["1", "2"]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5
)
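An embedding function is, in essence, a callable that maps a list of texts to a list of equal-length float vectors. The sketch below is a deliberately crude stand-in (a hashed bag-of-words, not a trained model) meant only to show that contract. The class name HashingEmbeddingFunction and the dim parameter are hypothetical, and the code assumes Chroma's convention of a __call__(self, input) signature. Recent Chroma versions expect custom functions to subclass chromadb.EmbeddingFunction, so treat this as an illustration of the shape rather than a drop-in replacement:

```python
import hashlib
import math


class HashingEmbeddingFunction:
    """Toy embedder: hashes each token into one of `dim` buckets, then L2-normalizes."""

    def __init__(self, dim=16):
        self.dim = dim

    def __call__(self, input):
        # One vector per input text, in the same order.
        return [self._embed(text) for text in input]

    def _embed(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Leave an all-zero vector untouched to avoid dividing by zero.
        return [v / norm for v in vec] if norm else vec


ef = HashingEmbeddingFunction(dim=16)
vectors = ef(["Doc 1 content", "Doc 2 content"])
print(len(vectors), len(vectors[0]))  # 2 16
```

Because the vectors carry no learned meaning, query quality with such a function would be poor; a model-backed function like the SentenceTransformer one above is the practical choice.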

Collections

1. Collection Operations

# Create
collection = client.create_collection("mycollection")

# Get existing
collection = client.get_collection("mycollection")

# Get or create
collection = client.get_or_create_collection("mycollection")

# List all
collections = client.list_collections()

# Count items
count = collection.count()

# Delete (done last, since the collection can no longer be used afterwards)
client.delete_collection("mycollection")

2. Collection with Custom Embedding

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.Client()

# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

# Sentence Transformers
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)

# Hugging Face
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-hf-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

collection = client.create_collection(
    name="mydocs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Distance metric
)
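The hnsw:space setting chooses how the distance between two vectors is scored. Chroma documents three options: l2 (squared Euclidean, the default), ip (inner product), and cosine. A plain-Python sketch of what those three distances compute, following the formulas in Chroma's documentation, is shown below. The function names are illustrative and not part of the library:

```python
import math


def squared_l2(a, b):
    """'l2': sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """'ip': one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """'cosine': one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]

print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
print(cosine_distance(a, c))  # 0.0 (same direction, different length)
print(squared_l2(a, c))       # 1.0
```

Cosine distance ignores vector length, which is why it is a common choice for text embeddings, where direction tends to carry the meaning.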

CRUD Operations

1. Add Documents

# Add with documents (auto-embed)
collection.add(
    documents=["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
    ids=["1", "2", "3"]
)

# Add with pre-computed embeddings
# ("..." stands for the remaining vector values; one metadata entry per ID)
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    metadatas=[{"key": "value"}, {"key": "value"}],
    ids=["1", "2"]
)

# Add with documents and embeddings
collection.add(
    documents=["text"],
    embeddings=[[0.1, 0.2, ...]],
    metadatas=[{"key": "value"}],
    ids=["1"]
)
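collection.add takes parallel lists, so every list passed must have the same length and each ID should be unique. A small sketch of a helper that builds and checks those lists before calling add is given below. build_add_payload and its id_prefix parameter are hypothetical names introduced here for illustration, not part of ChromaDB:

```python
def build_add_payload(documents, metadatas=None, id_prefix="doc"):
    """Build keyword arguments for collection.add with matching list lengths."""
    if metadatas is not None and len(metadatas) != len(documents):
        raise ValueError(
            f"got {len(documents)} documents but {len(metadatas)} metadatas"
        )
    # Sequential IDs are unique within one call; reuse across calls would collide.
    ids = [f"{id_prefix}-{i}" for i in range(len(documents))]
    payload = {"documents": list(documents), "ids": ids}
    if metadatas is not None:
        payload["metadatas"] = list(metadatas)
    return payload


payload = build_add_payload(
    ["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
)
print(payload["ids"])  # ['doc-0', 'doc-1', 'doc-2']
```

With a real collection, the result could be passed along as collection.add(**payload).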

2. Query

# Basic query
results = collection.query(

Related Articles

Complete Qdrant Tutorial: Vector Database for AI Applications

Complete LlamaIndex Tutorial: Building RAG Applications with LLMs

Complete pgvector Tutorial: Vector Database in PostgreSQL

Milvus Tutorial: Distributed Vector Database for AI
