Complete pgvector Tutorial: Vector Database in PostgreSQL

# Tutorial Lengkap pgvector: Vector Database di PostgreSQL pgvector adalah extension PostgreSQL yang memungkinkan Anda menyimpan dan melakukan similarity search pada vector embeddings. Ini sangat ber...

By Ruby Abdullah · · tutorial
PostgreSQLpgvectorVector DatabaseAIRAGEmbeddings

Complete pgvector Tutorial: Vector Database in PostgreSQL

pgvector is a PostgreSQL extension that allows you to store and perform similarity search on vector embeddings. This is extremely useful for AI applications such as semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).

What is pgvector?

pgvector adds the vector data type to PostgreSQL and provides:

  • Vector storage with dimensions up to 16,000
  • Similarity search with various distance metrics
  • Indexing for fast search (IVFFlat, HNSW)
  • Seamless integration with SQL queries

Use Cases:
  • Semantic search
  • Similarity matching (images, documents, products)
  • Recommendation systems
  • RAG for LLM applications
  • Clustering and classification

Installing pgvector

1. Install on Ubuntu

# Install dependencies

sudo apt update

sudo apt install -y postgresql postgresql-contrib

Install pgvector from source

sudo apt install -y postgresql-server-dev-all git build-essential

Clone and build pgvector

cd /tmp

git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git

cd pgvector

make

sudo make install

2. Install via Docker

# Pull image with pgvector

docker pull pgvector/pgvector:pg16

Run container

docker run -d \

--name pgvector-db \

-e POSTGRESPASSWORD=mysecretpassword \

-e POSTGRESDB=vectordb \

-p 5432:5432 \

pgvector/pgvector:pg16

3. Enable Extension

-- Connect to database

psql -U postgres -d vectordb

-- Create extension

CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation

SELECT FROM pgextension WHERE extname = 'vector';

Basic Concepts

1. Vector Data Type

-- Create table with vector column

CREATE TABLE items (

id SERIAL PRIMARY KEY,

name TEXT,

embedding VECTOR(3) -- Vector with 3 dimensions

);

-- Insert vector

INSERT INTO items (name, embedding) VALUES

('item1', '[1, 2, 3]'),

('item2', '[4, 5, 6]'),

('item3', '[1, 2, 4]');

-- Query vector

SELECT FROM items;

2. Distance Metrics

pgvector supports several distance metrics:

| Operator | Distance | Use Case |

|----------|----------|----------|

| <-> | L2 (Euclidean) | Default, general purpose |

| <#> | Inner Product (Negative) | Dot product similarity |

| <=> | Cosine Distance | Normalized vectors |

| <+> | L1 (Manhattan) | Sparse vectors |

-- L2 Distance (Euclidean)

SELECT name, embedding <-> '[1, 2, 3]' AS distance

FROM items

ORDER BY distance

LIMIT 5;

-- Cosine Distance

SELECT name, embedding <=> '[1, 2, 3]' AS distance

FROM items

ORDER BY distance

LIMIT 5;

-- Inner Product

SELECT name, embedding <#> '[1, 2, 3]' AS distance

FROM items

ORDER BY distance

LIMIT 5;

3. Vector Dimensions

-- Check vector dimensions

SELECT vectordims(embedding) FROM items LIMIT 1;

-- Normalize vector

SELECT l2normalize(embedding) FROM items;

-- Vector arithmetic

SELECT embedding + '[1, 1, 1]' FROM items WHERE id = 1;

SELECT embedding 2 FROM items WHERE id = 1;

Indexing for Performance

1. IVFFlat Index

IVFFlat (Inverted File with Flat compression) is suitable for large datasets with accuracy trade-offs.

-- Create IVFFlat index

CREATE INDEX ON items USING ivfflat (embedding vectorl2ops)

WITH (lists = 100);

-- For cosine distance

CREATE INDEX ON items USING ivfflat (embedding vectorcosineops)

WITH (lists = 100);

-- For inner product

CREATE INDEX ON items USING ivfflat (embedding vectoripops)

WITH (lists = 100);

lists Parameter:
  • Number of clusters for the index
  • Rule of thumb: sqrt(numrows) for < 1M rows
  • For > 1M rows: sqrt(numrows) up to numrows / 1000

Related Articles

Complete ChromaDB Tutorial: Simple Vector Database for AI

Tutorial Lengkap ChromaDB: Vector Database Sederhana untuk AI ChromaDB adalah open-source vector database yang dirancang...

Complete Qdrant Tutorial: Vector Database for AI Applications

Tutorial Lengkap Qdrant: Vector Database untuk Aplikasi AI Qdrant adalah vector database performa tinggi yang dirancang ...

Complete LlamaIndex Tutorial: Building RAG Applications with LLMs

Tutorial Lengkap LlamaIndex: Membangun Aplikasi RAG dengan LLM LlamaIndex adalah framework data yang powerful untuk memb...

Milvus Tutorial: Distributed Vector Database for AI

Tutorial 10: Milvus - Database Vektor Terdistribusi untuk AI Daftar Isi Pendahuluan Prasyarat Arsitektur Milvus [Instala...