Tutorial Lengkap pgvector: Vector Database di PostgreSQL
pgvector adalah extension PostgreSQL yang memungkinkan Anda menyimpan dan melakukan similarity search pada vector embeddings. Ini sangat berguna untuk aplikasi AI seperti semantic search, recommendation systems, dan RAG (Retrieval-Augmented Generation).
Apa itu pgvector?
pgvector menambahkan tipe data vector ke PostgreSQL dan menyediakan:
- Penyimpanan vector dengan dimensi hingga 16,000
- Similarity search dengan berbagai distance metrics
- Indexing untuk pencarian cepat (IVFFlat, HNSW)
- Integrasi seamless dengan SQL queries
- Semantic search
- Similarity matching (gambar, dokumen, produk)
- Recommendation systems
- RAG untuk LLM applications
- Clustering dan classification
Instalasi pgvector
1. Install di Ubuntu
# Install dependencies
sudo apt update
sudo apt install -y postgresql postgresql-contrib
Install pgvector dari source
sudo apt install -y postgresql-server-dev-all git build-essential
Clone dan build pgvector
cd /tmp
git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
2. Install via Docker
# Pull image dengan pgvector
docker pull pgvector/pgvector:pg16
Run container
docker run -d \
--name pgvector-db \
-e POSTGRESPASSWORD=mysecretpassword \
-e POSTGRESDB=vectordb \
-p 5432:5432 \
pgvector/pgvector:pg16
3. Enable Extension
-- Connect ke database
psql -U postgres -d vectordb
-- Create extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify installation
SELECT FROM pgextension WHERE extname = 'vector';
Konsep Dasar
1. Tipe Data Vector
-- Buat tabel dengan kolom vector
CREATE TABLE items (
id SERIAL PRIMARY KEY,
name TEXT,
embedding VECTOR(3) -- Vector dengan 3 dimensi
);
-- Insert vector
INSERT INTO items (name, embedding) VALUES
('item1', '[1, 2, 3]'),
('item2', '[4, 5, 6]'),
('item3', '[1, 2, 4]');
-- Query vector
SELECT FROM items;
2. Distance Metrics
pgvector mendukung beberapa distance metrics:
| Operator | Distance | Use Case |
|----------|----------|----------|
| <-> | L2 (Euclidean) | Default, general purpose |
| <#> | Inner Product (Negative) | Dot product similarity |
| <=> | Cosine Distance | Normalized vectors |
| <+> | L1 (Manhattan) | Sparse vectors |
-- L2 Distance (Euclidean)
SELECT name, embedding <-> '[1, 2, 3]' AS distance
FROM items
ORDER BY distance
LIMIT 5;
-- Cosine Distance
SELECT name, embedding <=> '[1, 2, 3]' AS distance
FROM items
ORDER BY distance
LIMIT 5;
-- Inner Product
SELECT name, embedding <#> '[1, 2, 3]' AS distance
FROM items
ORDER BY distance
LIMIT 5;
3. Dimensi Vector
-- Check dimensi vector
SELECT vectordims(embedding) FROM items LIMIT 1;
-- Normalize vector
SELECT l2normalize(embedding) FROM items;
-- Vector arithmetic
SELECT embedding + '[1, 1, 1]' FROM items WHERE id = 1;
SELECT embedding 2 FROM items WHERE id = 1;
Indexing untuk Performance
1. IVFFlat Index
IVFFlat (Inverted File with Flat compression) cocok untuk dataset besar dengan trade-off akurasi.
-- Buat index IVFFlat
CREATE INDEX ON items USING ivfflat (embedding vectorl2ops)
WITH (lists = 100);
-- Untuk cosine distance
CREATE INDEX ON items USING ivfflat (embedding vectorcosineops)
WITH (lists = 100);
-- Untuk inner product
CREATE INDEX ON items USING ivfflat (embedding vectoripops)
WITH (lists = 100);
Parameter lists:
- Jumlah clusters untuk index
- Rule of thumb:
sqrt(numrows)untuk < 1M rows - Untuk > 1M rows:
sqrt(numrows)hingganumrows / 1000