Sentence Transformers: Embeddings, Semantic Similarity, and Rerankers

Sentence Transformers (often called SBERT) is a Python library for turning text into dense vector embeddings that capture meaning rather than surface words. In this tutorial we build a small retrieval system over a toy corpus, measure semantic similarity, add a cross-encoder reranker, and then fine-tune our own embedding model with the modern training API. The goal is a practical, end-to-end view of how the library fits into a real search or RAG pipeline.

What Sentence Transformers Is

The sentence-transformers library wraps transformer models (BERT, RoBERTa, MPNet, and many others) and adds pooling so that an entire sentence or paragraph maps to a single fixed-length vector. Two texts with similar meaning produce vectors that are close together, which lets you compare them with cosine similarity instead of keyword matching.
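To make the pooling and comparison steps concrete, here is a minimal sketch in plain Python. The token vectors are made-up 3-dimensional numbers, not real model output, and mean pooling is assumed as the pooling strategy (a common choice, though individual models may use another). The helper names `mean_pool` and `cosine_similarity` are introduced here for illustration only.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-length sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up token vectors for three short texts (illustrative values only).
password_reset = mean_pool([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]])
login_recovery = mean_pool([[0.8, 0.2, 0.1], [0.6, 0.4, 0.1], [0.7, 0.3, 0.1]])
weather_report = mean_pool([[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]])

print(cosine_similarity(password_reset, login_recovery))  # close to 1
print(cosine_similarity(password_reset, weather_report))  # close to 0
```

Note that the texts have different numbers of tokens (two and three), yet every pooled vector has the same length, which is what makes them directly comparable.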

The library is maintained alongside the Hugging Face ecosystem, so models load from the Hub, datasets use the datasets format, and trained models push back to the Hub with one call. It is the standard tool for building embedding-based search, clustering, deduplication, and the retrieval stage of RAG systems.

Bi-Encoders vs Cross-Encoders

There are two model families in the library, and choosing correctly is the single most important design decision.

A bi-encoder encodes each text independently into a vector. You embed your whole corpus once, store the vectors, and at query time you embed only the query and compare it against the stored vectors. This is fast and scales to millions of documents because comparison is just a dot product. The trade-off is accuracy: the model never sees the query and document together, so it can miss subtle interactions.

A cross-encoder takes a pair of texts at once (query and candidate) and outputs a single relevance score. Because the model attends across both texts jointly, it is typically more accurate than a bi-encoder. The cost is that you cannot precompute anything: every query-document pair must be run through the model. Scoring one query against a million documents means a million forward passes, which is impractical for interactive search.
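A quick back-of-the-envelope calculation shows why. The throughput figure below is an assumption chosen for round numbers, not a benchmark; real values depend on the model, sequence length, and hardware. The function name `rerank_seconds` is hypothetical.

```python
# Assumed throughput, for illustration only.
PAIRS_PER_SECOND = 1_000

def rerank_seconds(num_candidates, pairs_per_second=PAIRS_PER_SECOND):
    """Each candidate needs its own forward pass with the query, so cost is linear."""
    return num_candidates / pairs_per_second

print(rerank_seconds(1_000_000))  # 1000.0 seconds, roughly 17 minutes per query
print(rerank_seconds(50))         # 0.05 seconds per query
```

Under that assumption, the full-corpus scan is far too slow for a search box, while scoring a short candidate list costs almost nothing.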

The standard pattern combines both. The bi-encoder retrieves a few dozen candidates quickly, then the cross-encoder reranks just those candidates for precision. This is the retrieve-then-rerank pattern we build later.

Query --> [Bi-encoder] --> top 50 candidates --> [Cross-encoder] --> top 5 reranked
          (fast, approximate)                     (slow, precise)
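The two-stage flow in the diagram can be sketched independently of any particular model. In the sketch below, `retrieve_then_rerank`, `cheap_score`, and `precise_score` are hypothetical names introduced for illustration. The first scorer stands in for bi-encoder similarity against precomputed vectors, the second for a cross-encoder scoring a (query, document) pair. Both are toy word-overlap functions so the example runs without downloading a model.

```python
def retrieve_then_rerank(query, corpus, cheap_score, precise_score,
                         retrieve_k=50, final_k=5):
    """Two-stage search: a cheap scorer narrows the corpus, a costly one reorders it."""
    # Stage 1: score every document with the cheap scorer, keep the top retrieve_k.
    candidates = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)
    candidates = candidates[:retrieve_k]
    # Stage 2: run the costly scorer only on the surviving candidates.
    reranked = sorted(candidates, key=lambda doc: precise_score(query, doc), reverse=True)
    return reranked[:final_k]

# Toy stand-ins (assumptions, not library calls).
def cheap_score(query, doc):
    """Count shared lowercase words; stands in for bi-encoder similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def precise_score(query, doc):
    """Shared words divided by document length; stands in for a cross-encoder."""
    return cheap_score(query, doc) / len(doc.lower().split())

corpus = [
    "reset your account password from the settings page",
    "password reset",
    "the weather in Jakarta is hot today",
    "account billing and invoices",
]
results = retrieve_then_rerank("reset account password", corpus,
                               cheap_score, precise_score,
                               retrieve_k=3, final_k=2)
print(results)
# ['password reset', 'reset your account password from the settings page']
```

The first stage drops the weather sentence entirely, and the second stage changes the order of what remains. With real models the scorers are far better than word overlap, but the control flow stays the same.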

Installation

Install the library with pip. It pulls in PyTorch and transformers as dependencies.

pip install -U sentence-transformers

For training and evaluation you may also want a few extras. Installing accelerate enables faster and multi-GPU training, and datasets is required for the training API, so install it explicitly if it is not already present.

pip install -U accelerate datasets

Verify the install and check which device is available.

import torch
from sentence_transformers import SentenceTransformer  # the package name uses an underscore

print("sentence-transformers ready")
print("CUDA available:", torch.cuda.is_available())

Loading a Model and Encoding Text

The core class is SentenceTransformer. Pass it a model name from the Hub and it downloads and caches the weights. A good general-purpose starting model is all-MiniLM-L6-v2: small, fast, 384-dimensional, and strong on English semantic similarity.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my account password?",
    "Steps to recover a forgotten login",
    "The weather in Jakarta is hot today.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)
