Semantic Search Engine from Scratch
Introduction
Traditional keyword-based search engines retrieve documents through exact or fuzzy word matches. Semantic search goes further by modeling the meaning behind queries and documents. When a user searches for "how to fix a broken pipe," a semantic search engine can also return results about "plumbing repair" or "pipe leak solutions" -- even though those exact phrases do not appear in the query.
This tutorial guides you through building a complete semantic search engine from scratch. You will learn how to generate text embeddings, build vector indices with FAISS and Annoy, implement filtering and reranking, combine semantic and keyword search into a hybrid system, expose everything through a FastAPI REST API, and measure search quality with standard evaluation metrics.
Prerequisites
- Python 3.9 or higher
- Basic understanding of machine learning concepts
- Familiarity with REST APIs
pip install sentence-transformers faiss-cpu annoy numpy fastapi uvicorn rank-bm25 scikit-learn pydantic
Understanding Semantic Search
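Semantic search works in three stages: documents are converted into embedding vectors, the query is embedded with the same model, and a vector index returns the documents whose vectors lie nearest to the query vector.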
The key insight is that semantically similar texts produce similar vectors, enabling meaning-based retrieval rather than keyword matching.
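To make "similar vectors" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are hand-picked, hypothetical values for illustration only; real embeddings have hundreds of dimensions and come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (not produced by any model).
cheap_ev = [0.9, 0.1, 0.2]      # stands in for "affordable electric cars"
budget_ev = [0.8, 0.2, 0.1]     # stands in for "budget-friendly EVs"
pasta_recipe = [0.1, 0.9, 0.3]  # stands in for "easy pasta recipes"

sim_related = cosine_similarity(cheap_ev, budget_ev)
sim_unrelated = cosine_similarity(cheap_ev, pasta_recipe)
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, while unrelated ones score much lower, which is the property nearest-neighbor retrieval relies on.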
User Query: "affordable electric cars"
|
v
[Embedding Model] -> Query Vector [0.12, -0.45, 0.78, ...]
|
v
[Vector Index] -> Nearest Neighbor Search
|
v
Results:
"Budget-friendly EVs for 2025" (similarity: 0.92)
"Low-cost electric vehicles comparison" (similarity: 0.89)
"Tesla Model 3 pricing guide" (similarity: 0.84)
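The flow in the diagram can be sketched with a tiny in-memory index that performs exact (brute-force) nearest-neighbor search. `TinyVectorIndex` and the toy vectors are hypothetical and exist only for this illustration; a production system would use model-generated embeddings and a library such as FAISS or Annoy instead.

```python
import heapq
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class TinyVectorIndex:
    """Hypothetical in-memory index doing exact cosine-similarity search."""

    def __init__(self):
        self._items = []  # list of (text, unit-length vector)

    def add(self, text, vector):
        self._items.append((text, _normalize(vector)))

    def search(self, query_vector, k=2):
        """Return the k best (score, text) pairs, highest score first."""
        q = _normalize(query_vector)
        scored = (
            (sum(a * b for a, b in zip(q, vec)), text)
            for text, vec in self._items
        )
        return heapq.nlargest(k, scored)


index = TinyVectorIndex()
# Hand-picked toy vectors, not real embeddings.
index.add("Budget-friendly EVs for 2025", [0.8, 0.2, 0.1])
index.add("Tesla Model 3 pricing guide", [0.6, 0.1, 0.5])
index.add("Sourdough baking basics", [0.1, 0.9, 0.2])

results = index.search([0.9, 0.1, 0.2], k=2)
for score, text in results:
    print(f"{text} (similarity: {score:.2f})")
```

Brute-force search compares the query against every stored vector, which is fine for small collections; approximate indices trade a little accuracy for much faster lookups on large ones.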
Text Embeddings with Sentence-Transformers
Sentence-Transformers is a Python library that provides pre-trained models for generating high-quality text embeddings.
Loading and Using Embedding Models
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained model
# 'all-MiniLM-L6-v2' is a good balance of speed and quality
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate an embedding for a single text
text = "Machine learning is a subset of artificial intelligence."
embedding = model.encode(text)
print(f"Embedding shape: {embedding.shape}")  # (384,)
print(f"Embedding dtype: {embedding.dtype}")  # float32

# Generate embeddings for multiple texts (batched for efficiency)
documents = [
    "Python is a versatile programming language.",
    "Deep learning uses neural networks with many layers.",
    "FastAPI is a modern web framework for Python.",
    "Natural language processing deals with text understanding.",
    "Docker containers simplify application deployment.",
]
doc_embeddings = model.encode(documents, show_progress_bar=True, batch_size=32)
print(f"Batch embeddings shape: {doc_embeddings.shape}")  # (5, 384)
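Once document embeddings exist, a query can be ranked against them. The sketch below is one possible approach, not the tutorial's own implementation: `rank_documents` and `demo` are hypothetical helper names, and the query string is an assumed example. `rank_documents` normalizes the vectors so that a dot product equals cosine similarity; `demo` shows how it might be wired to the model above and is not called automatically because it requires a model download.

```python
import numpy as np


def rank_documents(query_embedding, doc_embeddings, top_k=3):
    """Return (document index, cosine similarity) pairs, best match first."""
    q = np.asarray(query_embedding, dtype=np.float32)
    d = np.asarray(doc_embeddings, dtype=np.float32)
    # Unit-normalize so the dot product below is the cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q  # one score per document
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


def demo():
    """Illustrative only: needs sentence-transformers and downloads a model."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    documents = [
        "Python is a versatile programming language.",
        "FastAPI is a modern web framework for Python.",
        "Docker containers simplify application deployment.",
    ]
    doc_embeddings = model.encode(documents)
    # Assumed example query, not taken from the tutorial.
    query_embedding = model.encode("web framework for building APIs")
    for i, score in rank_documents(query_embedding, doc_embeddings, top_k=2):
        print(f"{score:.3f}  {documents[i]}")
```

Calling `demo()` in an environment with the packages installed prints the two closest documents with their scores. This exhaustive comparison suits small collections; larger ones are where a dedicated vector index becomes worthwhile.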