Segment Anything (SAM): A Comprehensive Tutorial
Introduction
Segment Anything Model (SAM), developed by Meta AI, is a foundation model for image segmentation. It supports zero-shot segmentation: given a prompt such as a point or a box, it can segment objects in an image without having been trained on that object class. SAM was trained on the SA-1B dataset, which contains over 1 billion masks from 11 million images, making it one of the most versatile segmentation models available.
This tutorial provides a comprehensive guide to using SAM, from basic prompting to advanced integration, fine-tuning, and production deployment.
Prerequisites
pip install segment-anything
pip install torch torchvision
pip install opencv-python numpy matplotlib
pip install Pillow
pip install onnxruntime # For ONNX-based deployment
System requirements:
- Python 3.8 or higher
- GPU with at least 8 GB VRAM (for ViT-H model; smaller models need less)
- CUDA 11.7 or higher
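Before downloading any checkpoints, it can help to confirm that the environment meets these requirements. The following is a minimal sketch, not part of the `segment_anything` package: the helper name `check_environment` and the keys of its report are illustrative assumptions. It relies only on the standard library, plus `torch` when that is installed.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a small report on the Python version, PyTorch, and CUDA.

    Hypothetical helper for this tutorial; the report keys are our own choice.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "vram_gb": None,  # filled in only when a CUDA device is visible
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also works without PyTorch

        report["cuda_available"] = bool(torch.cuda.is_available())
        if report["cuda_available"]:
            # total_memory is reported in bytes for device 0
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            report["vram_gb"] = round(total_bytes / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false, the model can still be loaded on the CPU by passing `device="cpu"`, although inference with the larger models will be slow.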
Download model checkpoints:
# ViT-H (default, highest quality) - 2.4 GB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# ViT-L (large) - 1.2 GB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
# ViT-B (base, smallest) - 375 MB
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
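Where `wget` is not available, the same files can be fetched from Python. The sketch below maps each model type to its checkpoint file and downloads it only if it is not already on disk. The helper names `checkpoint_url` and `download_checkpoint` are hypothetical and not part of the `segment_anything` package; the URLs are the three listed above.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"

# Model type -> checkpoint file name (the three checkpoints listed above)
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def checkpoint_url(model_type):
    """Return the download URL for a model type, or raise ValueError."""
    if model_type not in CHECKPOINTS:
        raise ValueError(
            f"unknown model type {model_type!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return BASE_URL + CHECKPOINTS[model_type]


def download_checkpoint(model_type, dest_dir="."):
    """Download the checkpoint into dest_dir unless the file already exists."""
    url = checkpoint_url(model_type)  # also validates model_type
    dest = Path(dest_dir) / CHECKPOINTS[model_type]
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest
```

For example, `download_checkpoint("vit_b", "checkpoints")` would place the smallest model under `checkpoints/`, which is a reasonable starting point on a GPU with limited memory.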
Understanding SAM Architecture
SAM consists of three components: an image encoder, a prompt encoder, and a mask decoder.
The key insight of SAM's architecture is the decoupled design: the heavy image encoder runs once, while the lightweight prompt encoder and mask decoder can run many times for different prompts on the same image.
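In practice, the decoupled design means the image is embedded once and then queried repeatedly. The following is a minimal sketch of that pattern, assuming an object with the `set_image` and `predict` methods of `SamPredictor`. The helper name `segment_with_prompts` is illustrative and not part of the library.

```python
def segment_with_prompts(predictor, image, prompts):
    """Encode the image once, then decode one mask per prompt.

    predictor: a SamPredictor (or any object with the same two methods).
    image:     H x W x 3 uint8 RGB array.
    prompts:   iterable of (point_coords, point_labels) pairs, where
               point_coords is an (N, 2) NumPy array of (x, y) pixels and
               point_labels is a length-N array (1 = foreground, 0 = background).
    Returns a list of (best_mask, best_score) tuples, one per prompt.
    """
    # Heavy step: the image encoder runs here, exactly once.
    predictor.set_image(image)

    results = []
    for point_coords, point_labels in prompts:
        # Light step: only the prompt encoder and mask decoder run per prompt.
        masks, scores, _ = predictor.predict(
            point_coords=point_coords,
            point_labels=point_labels,
            multimask_output=True,
        )
        # Keep the candidate mask with the highest predicted quality score.
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((masks[best], float(scores[best])))
    return results
```

Because the embedding is cached inside the predictor, adding more prompts for the same image costs only the lightweight decoding step. Calling `set_image` again is needed only when the image changes.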
Installation and Setup
import torch
import numpy as np
import cv2
import matplotlib.pyplot as plt
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

def load_sam_model(checkpoint_path, model_type="vit_h", device="cuda"):
    """
    Load the SAM model.
    model_type options: 'vit_h', 'vit_l', 'vit_b'
    """
    # Build the architecture for model_type and load the checkpoint weights
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    sam.to(device=device)
    return sam
def display_mask(mask, ax, random_color=False):
    """Utility function to display a segmentation mask."""
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    # Broadcast the (h, w, 1) mask against the (1, 1, 4) RGBA color
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)
def display_points(coords, labels, ax, marker_size=375):
    """Display prompt points on the image."""
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color="green",