FAISS: Efficient Vector Similarity Search at Scale

FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for indexing and searching dense vectors. When your application needs to find the nearest neighbours of an embedding among millions or billions of vectors, FAISS gives you fine-grained control over the speed, memory, and recall trade-offs that a managed vector database hides behind an API. This tutorial focuses on the library itself: its index types, how each one works internally, and how to use them in production.

Who this tutorial is for

This is not another generic "build a semantic search engine" walkthrough. We assume you already understand embeddings and cosine similarity at a high level. Instead we go deep on the FAISS index zoo: exact flat indexes, inverted-file (IVF) indexes, product quantization (PQ), HNSW graphs, ID mapping, persistence, and GPU offload. By the end you will be able to choose the right index for your dataset size and latency budget, and tune it deliberately rather than by guesswork.

What FAISS is and what it is not

FAISS solves one problem extremely well: given a query vector, return the k most similar vectors from a collection, using either L2 (Euclidean) distance or inner product. It is a library, not a service. There is no network layer, no authentication, no metadata filtering engine, and no built-in persistence beyond reading and writing a single index file.
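To make the problem statement concrete, here is a minimal NumPy sketch of the exact search that an index is meant to compute or approximate. The function name `brute_force_knn` and the toy vectors are illustrative assumptions, not part of FAISS, and the code makes no claim about how FAISS implements the search internally.

```python
import numpy as np

def brute_force_knn(xb, xq, k, metric="l2"):
    """Return (scores, ids) of the k best database vectors for each query.

    Illustrative helper only; this is not a FAISS function.
    """
    if metric == "l2":
        # Squared Euclidean distance: smaller means more similar.
        diff = xq[:, None, :] - xb[None, :, :]
        scores = np.einsum("qnd,qnd->qn", diff, diff)
        order = np.argsort(scores, axis=1, kind="stable")[:, :k]
    elif metric == "ip":
        # Inner product: larger means more similar.
        scores = xq @ xb.T
        order = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.take_along_axis(scores, order, axis=1), order

# Three toy database vectors and one query (made-up values).
xb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
xq = np.array([[0.9, 0.1]], dtype=np.float32)

print(brute_force_knn(xb, xq, k=2, metric="l2")[1])  # [[1 0]]
print(brute_force_knn(xb, xq, k=2, metric="ip")[1])  # [[1 2]]
```

The two metrics agree on the best match but disagree on the runner-up, because inner product rewards the large norm of the third vector. The index types covered in this tutorial are different ways of computing or approximating this result faster or in less memory; the library leaves networking, filtering, and durability to the caller.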

That minimalism is the point. FAISS lets you:

Decide exactly how vectors are stored (full precision, quantized, on disk).
Trade recall for speed by changing a single parameter at query time.
Move supported index types between CPU and GPU with a single call.
Embed the index directly inside your own process, avoiding network round-trips.
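The storage choice in the first point is easiest to appreciate with some arithmetic. The sketch below compares the raw vector storage of full-precision float32 vectors with product-quantization codes. The collection size, the dimension, and the PQ setting of 48 sub-quantizers at 8 bits each are assumptions chosen for illustration, and the figures ignore codebooks and any index overhead.

```python
def flat_bytes(n_vectors, dim):
    """Bytes needed to store n_vectors float32 vectors at full precision."""
    return n_vectors * dim * 4  # 4 bytes per float32 component

def pq_code_bytes(n_vectors, m, nbits=8):
    """Bytes needed for PQ codes: m sub-quantizer codes of nbits each per vector."""
    return n_vectors * m * nbits // 8

n_vectors, dim = 1_000_000, 384  # hypothetical collection
m = 48                           # hypothetical number of sub-quantizers

full = flat_bytes(n_vectors, dim)
compressed = pq_code_bytes(n_vectors, m)

print(f"float32 vectors: {full / 1e9:.3f} GB")           # float32 vectors: 1.536 GB
print(f"PQ codes:        {compressed / 1e6:.1f} MB")     # PQ codes:        48.0 MB
print(f"compression:     {full / compressed:.0f}x")      # compression:     32x
```

The compression is paid for in recall, which is why the choice of storage format is a tuning decision rather than a default.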

When to use FAISS vs a managed vector database

Reach for FAISS when:

You want a single library embedded in your service with no extra infrastructure.
You need precise control over the index structure and memory footprint.
Your vectors are mostly static, or you rebuild the index on a schedule.
You are doing research or batch similarity computation.

Reach for a vector database, managed or self-hosted (Qdrant, Milvus, Weaviate, pgvector, Pinecone), when:

You need rich metadata filtering combined with vector search.
You require frequent inserts, updates, and deletes with durability guarantees.
You want horizontal scaling, replication, and an HTTP/gRPC API out of the box.
Operating a stateful service is acceptable and you would rather not build it yourself.

Many of those databases actually use FAISS-like algorithms (IVF, HNSW, PQ) under the hood, so understanding FAISS makes you better at tuning them too.

Installation

FAISS ships as two mutually exclusive packages. Install exactly one.

# CPU-only build (works everywhere, good default)
pip install faiss-cpu

# GPU build (requires a CUDA-capable GPU and matching CUDA runtime)
pip install faiss-gpu

For the examples we also use sentence-transformers to produce embeddings and numpy for array handling.

pip install sentence-transformers numpy

A quick sanity check:

import faiss
import numpy as np

print("FAISS version:", faiss.__version__)
print("Number of GPUs visible to FAISS:", faiss.get_num_gpus())

If get_num_gpus() returns 0, you are on the CPU build or no GPU is visible to FAISS, which is fine for everything except the GPU section near the end.

Creating embeddings

FAISS works with float32 NumPy arrays of shape (n_vectors, dimension). It does not generate embeddings itself; you bring your own. Here we use a small sentence-transformer model.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional output

documents = [
    "FAISS performs nearest-neighbour search over dense vectors.",
    "Product quantization compresses vectors to save memory.",
    "An inverted file index partitions the vector space into cells.",
    "HNSW builds a navigable small-world graph for fast search.",
    "Cosine similarity is inner product on normalized vectors.",
    "GPU indexes can accelerate search by an order of magnitude.",
]

embeddings = model.encode(documents, convert_to_numpy=True)
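Before handing an array to FAISS it can help to enforce the expected layout explicitly, and to L2-normalize the rows when inner product is meant to behave as cosine similarity. The helper below is a hedged sketch: `prepare_for_faiss` is a hypothetical name, and the random array merely stands in for real embeddings so that the snippet runs without downloading a model.

```python
import numpy as np

def prepare_for_faiss(vectors, normalize=False):
    """Return a C-contiguous float32 array of shape (n_vectors, dimension).

    Hypothetical helper for illustration; not part of FAISS.
    """
    x = np.ascontiguousarray(vectors, dtype=np.float32)
    if x.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {x.shape}")
    if normalize:
        # Unit-length rows make inner product equal to cosine similarity.
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        x = x / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return x

# Stand-in for real embeddings: 6 float64 vectors of dimension 384.
fake_embeddings = np.random.default_rng(0).normal(size=(6, 384))

xb = prepare_for_faiss(fake_embeddings, normalize=True)
print(xb.dtype, xb.shape)              # float32 (6, 384)
print(np.linalg.norm(xb, axis=1))      # all values close to 1.0
```

FAISS also provides faiss.normalize_L2, which normalizes a float32 array in place; the NumPy version above is shown only to make the operation explicit.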
