Complete pgvector Tutorial: Vector Database in PostgreSQL

pgvector is a PostgreSQL extension that allows you to store and perform similarity search on vector embeddings. This is extremely useful for AI applications such as semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).

What is pgvector?

pgvector adds the vector data type to PostgreSQL and provides:

Vector storage with dimensions up to 16,000
Similarity search with various distance metrics
Indexing for fast search (IVFFlat, HNSW)
Seamless integration with SQL queries

Use Cases:

Semantic search
Similarity matching (images, documents, products)
Recommendation systems
RAG for LLM applications
Clustering and classification

Installing pgvector

1. Install on Ubuntu

# Install dependencies sudo apt update sudo apt install -y postgresql postgresql-contrib Install pgvector from source sudo apt install -y postgresql-server-dev-all git build-essential Clone and build pgvector cd /tmp git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git cd pgvector make sudo make install

2. Install via Docker

# Pull image with pgvector docker pull pgvector/pgvector:pg16 Run container docker run -d \ --name pgvector-db \ -e POSTGRESPASSWORD=mysecretpassword \ -e POSTGRESDB=vectordb \ -p 5432:5432 \ pgvector/pgvector:pg16

3. Enable Extension

-- Connect to database
psql -U postgres -d vectordb

-- Create extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT  FROM pgextension WHERE extname = 'vector';

Basic Concepts

1. Vector Data Type

-- Create table with vector column CREATE TABLE items ( id SERIAL PRIMARY KEY, name TEXT, embedding VECTOR(3) -- Vector with 3 dimensions ); -- Insert vector INSERT INTO items (name, embedding) VALUES ('item1', '[1, 2, 3]'), ('item2', '[4, 5, 6]'), ('item3', '[1, 2, 4]'); -- Query vectorSELECT FROM items;

2. Distance Metrics

pgvector supports several distance metrics:

| Operator | Distance | Use Case |

|----------|----------|----------|

| <-> | L2 (Euclidean) | Default, general purpose |

| <#> | Inner Product (Negative) | Dot product similarity |

| <=> | Cosine Distance | Normalized vectors |

| <+> | L1 (Manhattan) | Sparse vectors |

-- L2 Distance (Euclidean) SELECT name, embedding <-> '[1, 2, 3]' AS distance FROM items ORDER BY distance LIMIT 5; -- Cosine Distance SELECT name, embedding <=> '[1, 2, 3]' AS distance FROM items ORDER BY distance LIMIT 5; -- Inner Product SELECT name, embedding <#> '[1, 2, 3]' AS distance FROM items ORDER BY distance LIMIT 5;

3. Vector Dimensions

-- Check vector dimensions
SELECT vectordims(embedding) FROM items LIMIT 1;

-- Normalize vector
SELECT l2normalize(embedding) FROM items;


-- Vector arithmetic
SELECT embedding + '[1, 1, 1]' FROM items WHERE id = 1;
SELECT embedding  2 FROM items WHERE id = 1;

Indexing for Performance

1. IVFFlat Index

IVFFlat (Inverted File with Flat compression) is suitable for large datasets with accuracy trade-offs.

-- Create IVFFlat indexCREATE INDEX ON items USING ivfflat (embedding vectorl2ops) WITH (lists = 100); -- For cosine distance CREATE INDEX ON items USING ivfflat (embedding vectorcosineops) WITH (lists = 100); -- For inner product CREATE INDEX ON items USING ivfflat (embedding vectoripops) WITH (lists = 100);
lists Parameter:
Number of clusters for the index

Rule of thumb: sqrt(numrows) for < 1M rows

For > 1M rows: sqrt(numrows) up to numrows / 1000

Complete pgvector Tutorial: Vector Database in PostgreSQL

Complete pgvector Tutorial: Vector Database in PostgreSQL

What is pgvector?

Installing pgvector

1. Install on Ubuntu

Install pgvector from source

Clone and build pgvector

2. Install via Docker

Run container

3. Enable Extension

Basic Concepts

1. Vector Data Type

2. Distance Metrics

3. Vector Dimensions

Indexing for Performance

1. IVFFlat Index

Related Articles

Complete ChromaDB Tutorial: Simple Vector Database for AI

Complete Qdrant Tutorial: Vector Database for AI Applications

Complete LlamaIndex Tutorial: Building RAG Applications with LLMs

Milvus Tutorial: Distributed Vector Database for AI

Related Articles

Complete ChromaDB Tutorial: Simple Vector Database for AI

Tutorial Lengkap ChromaDB: Vector Database Sederhana untuk AI ChromaDB adalah open-source vector database yang dirancang...

Complete Qdrant Tutorial: Vector Database for AI Applications

Tutorial Lengkap Qdrant: Vector Database untuk Aplikasi AI Qdrant adalah vector database performa tinggi yang dirancang ...

Complete LlamaIndex Tutorial: Building RAG Applications with LLMs

Tutorial Lengkap LlamaIndex: Membangun Aplikasi RAG dengan LLM LlamaIndex adalah framework data yang powerful untuk memb...

Milvus Tutorial: Distributed Vector Database for AI

Tutorial 10: Milvus - Database Vektor Terdistribusi untuk AI Daftar Isi Pendahuluan Prasyarat Arsitektur Milvus [Instala...