FAISS: Efficient Vector Similarity Search at Scale
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings for indexing and searching dense vectors. When your application needs to find the nearest neighbours of an embedding among millions or billions of vectors, FAISS gives you fine-grained control over the speed, memory, and recall trade-offs that a managed vector database hides behind an API. This tutorial focuses on the library itself: its index types, how each one works internally, and how to use them in production.
Who this tutorial is for
This is not another generic "build a semantic search engine" walkthrough. We assume you already understand embeddings and cosine similarity at a high level. Instead we go deep on the FAISS index zoo: exact flat indexes, inverted-file (IVF) indexes, product quantization (PQ), HNSW graphs, ID mapping, persistence, and GPU offload. By the end you will be able to choose the right index for your dataset size and latency budget, and tune it deliberately rather than by guesswork.
What FAISS is and what it is not
FAISS solves one problem extremely well: given a query vector, return the k most similar vectors from a collection, using either L2 (Euclidean) distance or inner product. It is a library, not a service. There is no network layer, no authentication, no metadata filtering engine, and no built-in persistence beyond reading and writing a single index file.
That minimalism is the point. FAISS lets you:
- Decide exactly how vectors are stored (full precision, quantized, on disk).
- Trade recall for speed by changing a single parameter at query time.
- Run the same index on CPU or GPU with a one-line move.
- Embed the index directly inside your own process, avoiding network round-trips.
When to use FAISS vs a managed vector database
Reach for FAISS when:
- You want a single library embedded in your service with no extra infrastructure.
- You need precise control over the index structure and memory footprint.
- Your vectors are mostly static, or you rebuild the index on a schedule.
- You are doing research or batch similarity computation.
Reach for a managed vector database (Qdrant, Milvus, Weaviate, pgvector, Pinecone) when:
- You need rich metadata filtering combined with vector search.
- You require frequent inserts, updates, and deletes with durability guarantees.
- You want horizontal scaling, replication, and an HTTP/gRPC API out of the box.
- Operating a stateful service is acceptable and you would rather not build it yourself.
Many of those databases actually use FAISS-like algorithms (IVF, HNSW, PQ) under the hood, so understanding FAISS makes you better at tuning them too.
Installation
FAISS ships as two mutually exclusive packages. Install exactly one.
# CPU-only build (works everywhere, good default)
pip install faiss-cpu
# GPU build (requires a CUDA-capable GPU and matching CUDA runtime)
pip install faiss-gpu
For the examples we also use sentence-transformers to produce embeddings and numpy for array handling.
pip install sentence-transformers numpy
A quick sanity check:
import faiss
import numpy as np
print("FAISS version:", faiss.__version__)
print("Number of GPUs visible to FAISS:", faiss.get_num_gpus())
If get_num_gpus() returns 0, you are either on the CPU build or no CUDA device is visible to FAISS. That is fine for everything except the GPU section near the end.
Creating embeddings
FAISS works with float32 NumPy arrays of shape (n_vectors, dimension). It does not generate embeddings itself; you bring your own. Here we use a small sentence-transformer model.
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer("all-MiniLM-L6-v2") # 384-dimensional output
documents = [
"FAISS performs nearest-neighbour search over dense vectors.",
"Product quantization compresses vectors to save memory.",
"An inverted file index partitions the vector space into cells.",
"HNSW builds a navigable small-world graph for fast search.",
"Cosine similarity is inner product on normalized vectors.",
"GPU indexes can accelerate search by an order of magnitude.",
]
embeddings = model.encode(documents, convert_to_numpy=True)