Complete ChromaDB Tutorial: Simple Vector Database for AI

By Ruby Abdullah · tutorial
Tags: ChromaDB, Vector Database, RAG, Embeddings, Python, AI

ChromaDB is an open-source vector database designed to make storing and querying embeddings easy. Its intuitive API makes it a good fit for building RAG (retrieval-augmented generation) applications, semantic search, and recommendation systems.

Why ChromaDB?

ChromaDB Advantages:
  • Simple API: Easy to learn and use
  • Embedded mode: Runs in-process, either in memory or with persistent storage
  • Multi-modal: Supports text, images, and pre-computed embeddings
  • Integrations: Works with LangChain, LlamaIndex, and OpenAI
  • No infrastructure: No server setup required in embedded mode

Installation

pip install chromadb

# For client mode
pip install chromadb-client

# With sentence-transformers
pip install sentence-transformers

Quick Start

1. Basic Usage

import chromadb

# Create client (in-memory)
client = chromadb.Client()

# Or persistent storage (replaces the in-memory client above)
client = chromadb.PersistentClient(path="./chromadb")

# Create collection
collection = client.create_collection(name="my_collection")

# Add documents
collection.add(
    documents=["Python is a programming language", "Machine learning is AI"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)

# Query
results = collection.query(
    query_texts=["What is Python?"],
    n_results=2
)

print(results)
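
The dictionary returned by `query` groups its fields per query text: by default, `ids`, `documents`, `metadatas`, and `distances` each hold one inner list per query. A minimal sketch of flattening that nested shape into one row per hit follows. The helper name `rows_from_results` and the sample dictionary are illustrative assumptions, and the distances in the sample are made-up numbers, not real output.

```python
def rows_from_results(results):
    """Flatten a Chroma-style query result into one dict per hit."""
    rows = []
    # Each top-level field holds one inner list per query text.
    for query_index, ids in enumerate(results["ids"]):
        for hit_index, doc_id in enumerate(ids):
            rows.append({
                "query": query_index,
                "id": doc_id,
                "document": results["documents"][query_index][hit_index],
                "metadata": results["metadatas"][query_index][hit_index],
                "distance": results["distances"][query_index][hit_index],
            })
    return rows


# Hand-written sample in the same nested shape (distances are invented).
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["Python is a programming language", "Machine learning is AI"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.4, 1.2]],
}

for row in rows_from_results(sample):
    print(row["id"], row["distance"], row["document"])
```

Passing several query texts at once adds one more inner list per field, which is why the helper tracks a `query` index for each row.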

2. With Embeddings

import chromadb
from chromadb.utils import embedding_functions

# Create client (in-memory)
client = chromadb.Client()

# Set up embedding function
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create collection with embedding function
collection = client.create_collection(
    name="docs",
    embedding_function=sentence_transformer_ef
)

# Add documents (embeddings auto-generated)
collection.add(
    documents=["Doc 1 content", "Doc 2 content"],
    ids=["1", "2"]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5
)
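
"Embeddings auto-generated" means the collection calls its embedding function on each document and stores the resulting fixed-length vector. The toy class below only illustrates that contract: a list of texts goes in, a list of equal-length float vectors comes out. `ToyHashEmbedding` is a hypothetical stand-in, not part of ChromaDB, and its vectors carry none of the semantic meaning a SentenceTransformer model provides. The parameter is named `input` to mirror the call convention Chroma's embedding functions use; whether a given Chroma version accepts a plain class like this is not something this sketch assumes.

```python
import hashlib
import math


class ToyHashEmbedding:
    """Maps each text to a unit-length vector by hashing its words into buckets."""

    def __init__(self, dimensions=8):
        self.dimensions = dimensions

    def __call__(self, input):
        vectors = []
        for text in input:
            vector = [0.0] * self.dimensions
            for word in text.lower().split():
                # hashlib gives the same bucket on every run, unlike built-in hash().
                digest = hashlib.sha256(word.encode("utf-8")).digest()
                vector[digest[0] % self.dimensions] += 1.0
            # Normalise to unit length; fall back to 1.0 for empty text.
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            vectors.append([v / norm for v in vector])
        return vectors


embed = ToyHashEmbedding(dimensions=8)
vectors = embed(["Doc 1 content", "Doc 2 content"])
print(len(vectors), len(vectors[0]))
```

Every document maps to a vector of the same length, which is what lets a collection compare any stored vector against any query vector.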

Collections

1. Collection Operations

# Create
collection = client.create_collection("my_collection")

# Get existing
collection = client.get_collection("my_collection")

# Get or create
collection = client.get_or_create_collection("my_collection")

# Count items
count = collection.count()

# List all
collections = client.list_collections()

# Delete (do this last; the collection object is unusable afterwards)
client.delete_collection("my_collection")

2. Collection with Custom Embedding

from chromadb.utils import embedding_functions

# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

# Sentence Transformers
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2"
)

# Hugging Face
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-hf-token",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Distance metric
)
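
The `hnsw:space` setting chooses how the distance between two vectors is measured. As a rough illustration, assuming cosine distance is defined as one minus cosine similarity, a plain-Python version looks like the sketch below. The function name is illustrative; Chroma computes distances inside its index, not with code like this.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing the same way are close to 0 regardless of their length.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
# Vectors at a right angle have distance 1.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Smaller values mean closer matches. Because cosine distance ignores vector length, it is a common choice for text embeddings.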

CRUD Operations

1. Add Documents

# Add with documents (auto-embed)
collection.add(
    documents=["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
    ids=["1", "2", "3"]
)

# Add with pre-computed embeddings (vectors abbreviated with ...)
# Each list must have one entry per ID.
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    metadatas=[{"key": "value"}, {"key": "value"}],
    ids=["1", "2"]
)

# Add with documents and embeddings
collection.add(
    documents=["text"],
    embeddings=[[0.1, 0.2, ...]],
    metadatas=[{"key": "value"}],
    ids=["1"]
)
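
`add` takes parallel lists: the entry at each position of `documents`, `embeddings`, and `metadatas` belongs to the ID at the same position. A small pre-check can catch mismatched lengths or repeated IDs before the call. `check_add_args` is a hypothetical helper written for this sketch, not a ChromaDB function, and its rules are a simplification.

```python
def check_add_args(ids, documents=None, embeddings=None, metadatas=None):
    """Raise ValueError if the parallel lists passed to add() do not line up."""
    if documents is None and embeddings is None:
        raise ValueError("provide documents, embeddings, or both")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within one call")
    for name, values in (("documents", documents),
                         ("embeddings", embeddings),
                         ("metadatas", metadatas)):
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{name} has {len(values)} entries but ids has {len(ids)}"
            )


# Passes: three documents, three metadata dicts, three ids.
check_add_args(
    ids=["1", "2", "3"],
    documents=["text 1", "text 2", "text 3"],
    metadatas=[{"source": "a"}, {"source": "b"}, {"source": "c"}],
)

# Fails: two ids but only one metadata dict.
try:
    check_add_args(ids=["1", "2"], documents=["x", "y"], metadatas=[{"key": "value"}])
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)
print(mismatch_error)
```

Calling the helper with the same arguments right before `collection.add(...)` turns a confusing downstream failure into an error message that names the offending list.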

2. Query

# Basic query

results = collection.query(
