Semantic Search Engine from Scratch

Table of Contents

Introduction
Prerequisites
Understanding Semantic Search
Text Embeddings with Sentence-Transformers
Vector Indexing with FAISS
Vector Indexing with Annoy
Building the Search Pipeline
Filtering and Metadata
Reranking for Improved Relevance
Hybrid Search: Combining Semantic and Keyword Search
Building the API with FastAPI
Evaluation Metrics
Best Practices
Conclusion

Introduction

Traditional keyword-based search engines match documents by exact or fuzzy word matches. Semantic search goes further by understanding the meaning behind queries and documents. When a user searches for "how to fix a broken pipe," a semantic search engine can also return results about "plumbing repair" or "pipe leak solutions" -- even if those exact words are not in the query.

This tutorial guides you through building a complete semantic search engine from scratch. You will learn how to generate text embeddings, build vector indices with FAISS and Annoy, implement filtering and reranking, combine semantic and keyword search into a hybrid system, expose everything through a FastAPI REST API, and measure search quality with standard evaluation metrics.

Prerequisites

Python 3.9 or higher
Basic understanding of machine learning concepts
Familiarity with REST APIs

pip install sentence-transformers faiss-cpu annoy numpy fastapi uvicorn rank-bm25 scikit-learn pydantic

Understanding Semantic Search

Semantic search works in three stages:

Indexing: Documents are converted into dense vector embeddings and stored in a vector index.

Querying: The user query is converted into an embedding using the same model.

Retrieval: The vector index finds the documents whose embeddings are closest to the query embedding.

The key insight is that semantically similar texts produce similar vectors, enabling meaning-based retrieval rather than keyword matching.
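The three stages above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 3-dimensional vectors are hand-written, hypothetical stand-ins for real model output (a real embedding model produces hundreds of dimensions), and `cosine_similarity` and `search` are helper names introduced here, not part of any library.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stage 1, indexing: each document is stored next to its vector.
# These vectors are hypothetical; a real system would get them from a model.
index: List[Tuple[str, Vector]] = [
    ("Budget-friendly EVs for 2025", [0.9, 0.8, 0.1]),
    ("Low-cost electric vehicles comparison", [0.8, 0.9, 0.2]),
    ("Sourdough bread baking basics", [0.1, 0.0, 0.9]),
]

# Stage 2, querying: the query is mapped into the same vector space.
# Again a hypothetical vector standing in for "affordable electric cars".
query_vector: Vector = [0.9, 0.85, 0.1]


# Stage 3, retrieval: score every document and keep the closest ones.
def search(query: Vector, idx: List[Tuple[str, Vector]], top_k: int = 2):
    scored = [(text, cosine_similarity(query, vec)) for text, vec in idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = search(query_vector, index)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Scoring every document like this is a brute-force scan, which is fine for a handful of documents. A vector index exists to avoid that full scan once the collection grows large.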

User Query: "affordable electric cars"
    |
    v
[Embedding Model] -> Query Vector [0.12, -0.45, 0.78, ...]
    |
    v
[Vector Index] -> Nearest Neighbor Search
    |
    v
Results:
"Budget-friendly EVs for 2025" (similarity: 0.92)
"Low-cost electric vehicles comparison" (similarity: 0.89)
"Tesla Model 3 pricing guide" (similarity: 0.84)

Text Embeddings with Sentence-Transformers

Sentence-Transformers is a Python library that provides pre-trained models for generating high-quality text embeddings.

Loading and Using Embedding Models

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained model.
# 'all-MiniLM-L6-v2' offers a good balance of speed and quality.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate an embedding for a single text
text = "Machine learning is a subset of artificial intelligence."
embedding = model.encode(text)
print(f"Embedding shape: {embedding.shape}")  # (384,)
print(f"Embedding dtype: {embedding.dtype}")  # float32

# Generate embeddings for multiple texts (batched for efficiency)
documents = [
    "Python is a versatile programming language.",
    "Deep learning uses neural networks with many layers.",
    "FastAPI is a modern web framework for Python.",
    "Natural language processing deals with text understanding.",
    "Docker containers simplify application deployment.",
]

# batch_size controls how many texts are encoded per forward pass
doc_embeddings = model.encode(documents, show_progress_bar=True, batch_size=32)

print(f"Batch embeddings shape: {doc_embeddings.shape}")  # (5, 384)
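To make the batching idea concrete, here is a minimal sketch of what encoding in batches means conceptually: the input list is split into fixed-size chunks, each chunk is encoded in one call, and the results are concatenated in order. The helpers `batched`, `encode_in_batches`, and `toy_encode_batch` are hypothetical names introduced for illustration. The toy encoder is not a real embedding model, and the library's own batching may differ in its details.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(
    texts: Sequence[str],
    encode_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Encode texts chunk by chunk, preserving the input order."""
    embeddings: List[Vector] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings


def toy_encode_batch(batch: Sequence[str]) -> List[Vector]:
    # Hypothetical stand-in for a model call: [character count, word count].
    return [[float(len(t)), float(len(t.split()))] for t in batch]


documents = [
    "Python is a versatile programming language.",
    "Deep learning uses neural networks with many layers.",
    "FastAPI is a modern web framework for Python.",
    "Natural language processing deals with text understanding.",
    "Docker containers simplify application deployment.",
]

batch_lengths = [len(b) for b in batched(documents, 2)]
vectors = encode_in_batches(documents, toy_encode_batch, batch_size=2)

print(batch_lengths)  # [2, 2, 1]
print(len(vectors))   # 5
```

The trade-off is the usual one: larger batches mean fewer model calls but more memory per call, so the batch size is typically tuned to the available hardware.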

