Segment Anything (SAM) Tutorial: Universal Image Segmentation


By Ruby Abdullah · · tutorial
SAM · Segment Anything · Image Segmentation · Computer Vision · Meta AI · Zero-Shot

Segment Anything (SAM): A Comprehensive Tutorial

Table of Contents

  • Introduction
  • Prerequisites
  • Understanding SAM Architecture
  • Installation and Setup
  • Point-based Prompting
  • Box-based Prompting
  • Text-based Prompting with Grounding
  • Automatic Mask Generation
  • Integration with Other Models
  • Fine-tuning SAM
  • Deployment
  • Best Practices
  • Conclusion

    Introduction

    Segment Anything Model (SAM), developed by Meta AI, is a foundation model for image segmentation. Its headline capability is zero-shot, promptable segmentation: given a prompt such as a point or a box, it can segment objects in new images without having been trained on that object class. SAM was trained on the SA-1B dataset, which contains over 1 billion masks from 11 million images, making it one of the most versatile segmentation models available.

    This tutorial provides a comprehensive guide to using SAM, from basic prompting to advanced integration, fine-tuning, and production deployment.


    Prerequisites

    pip install segment-anything
    pip install torch torchvision
    pip install opencv-python numpy matplotlib
    pip install Pillow
    pip install onnxruntime  # For ONNX-based deployment

    System requirements:
    • Python 3.8 or higher
    • GPU with at least 8 GB VRAM (for ViT-H model; smaller models need less)
    • CUDA 11.7 or higher
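
Before downloading any weights, it can help to confirm that the interpreter and packages listed above are actually in place. The following is a minimal sketch, not part of the original tutorial: `check_environment` and `REQUIRED_MODULES` are hypothetical names, and the module list simply mirrors the pip commands above (note that `opencv-python` is imported as `cv2` and `Pillow` as `PIL`).

```python
import importlib.util
import sys

# Import names for the packages installed above. These are assumptions that
# mirror the pip commands; adjust the list to match your own setup.
REQUIRED_MODULES = [
    "segment_anything", "torch", "torchvision",
    "cv2", "numpy", "matplotlib", "PIL",
]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return (python_ok, missing_modules) without importing the packages."""
    python_ok = sys.version_info >= min_python
    # find_spec only looks the module up; it does not execute it.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing modules:", missing or "none")
```

This check covers only the Python side; it does not verify the CUDA version or available VRAM.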

    Download model checkpoints:

    # ViT-H (default, highest quality) - 2.4 GB
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

    # ViT-L (large) - 1.2 GB
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth

    # ViT-B (base, smallest) - 375 MB
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
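
If `wget` is not available, the same downloads can be scripted from Python. The sketch below is one possible approach using only the standard library: `CHECKPOINTS`, `checkpoint_url`, and `fetch_checkpoint` are hypothetical helper names, and the `checkpoints` target directory is an assumption. The file names are those of the three official checkpoints.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"

# Model type key -> official checkpoint file name.
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def checkpoint_url(model_type):
    """Build the download URL for a model type such as 'vit_b'."""
    if model_type not in CHECKPOINTS:
        raise ValueError(f"Unknown model type: {model_type!r}")
    return BASE_URL + CHECKPOINTS[model_type]


def fetch_checkpoint(model_type, target_dir="checkpoints"):
    """Download the checkpoint unless it is already on disk; return its path."""
    target = Path(target_dir) / CHECKPOINTS[model_type]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(checkpoint_url(model_type), str(target))
    return target
```

Calling `fetch_checkpoint("vit_b")` is a reasonable first test, since the base model is the smallest download.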


    Understanding SAM Architecture

    SAM consists of three components:

  • Image Encoder: A Vision Transformer (ViT) that produces image embeddings. The image is processed once, and the embeddings are reused for all prompts. This is the most computationally expensive step.
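  • Prompt Encoder: Encodes the user-provided prompts. Points and boxes are represented as positional encodings combined with learned embeddings; masks are encoded using convolutions. Free-form text prompts were explored in the SAM paper, but the released code accepts only points, boxes, and masks.
  • Mask Decoder: A lightweight transformer decoder that combines image embeddings with prompt embeddings to produce segmentation masks. By default it outputs three candidate masks, each with a predicted quality score, so that an ambiguous prompt (for example, a point that could mean a part or the whole object) still yields a usable result.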

    The key insight of SAM's architecture is the decoupled design: the heavy image encoder runs once, while the lightweight prompt encoder and mask decoder can run many times for different prompts on the same image.
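
The decoupled design shows up directly in how the predictor is used: the image is embedded once with `set_image`, and each prompt then costs only a `predict` call. The sketch below illustrates that pattern under some assumptions: `segment_many_prompts` is a hypothetical helper, `predictor` is expected to behave like `SamPredictor`, and picking the highest-scoring of the three candidate masks is just one reasonable policy.

```python
import numpy as np


def segment_many_prompts(predictor, image, point_prompts):
    """Encode the image once, then decode one mask per point prompt.

    point_prompts is a list of ((x, y), label) pairs, where label 1 marks a
    foreground point and label 0 a background point.
    """
    # Heavy step: runs the image encoder a single time.
    # SamPredictor expects an HxWx3 uint8 RGB array by default.
    predictor.set_image(image)

    results = []
    for (x, y), label in point_prompts:
        # Light step: prompt encoder + mask decoder only.
        masks, scores, _ = predictor.predict(
            point_coords=np.array([[x, y]], dtype=np.float32),
            point_labels=np.array([label], dtype=np.int64),
            multimask_output=True,
        )
        best = int(np.argmax(scores))  # keep the highest-scoring candidate
        results.append((masks[best], float(scores[best])))
    return results
```

With a real model, `predictor` would be a `SamPredictor` wrapping a loaded checkpoint; the helper itself never touches the encoder again after the first call.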


    Installation and Setup

    import torch
    import numpy as np
    import cv2
    import matplotlib.pyplot as plt
    from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

    def load_sam_model(checkpoint_path, model_type="vit_h", device="cuda"):
        """
        Load the SAM model.

        model_type options: 'vit_h', 'vit_l', 'vit_b'
        """
        # Build the architecture for the chosen backbone and load its weights.
        sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
        sam.to(device=device)
        return sam
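
The model type passed to the loader has to match the checkpoint file; a mismatch typically fails while loading the weights, because the layer shapes differ between backbones. One way to avoid that is to derive the model type from the file name. This is a small sketch under the assumption that checkpoints keep their official names (which contain `vit_h`, `vit_l`, or `vit_b`); `infer_model_type` is a hypothetical helper, not part of the `segment_anything` package.

```python
import re
from pathlib import Path


def infer_model_type(checkpoint_path):
    """Guess the registry key ('vit_h', 'vit_l', 'vit_b') from a checkpoint file name."""
    match = re.search(r"vit_([hlb])", Path(checkpoint_path).name)
    if match is None:
        raise ValueError(f"Cannot infer model type from {checkpoint_path!r}")
    return "vit_" + match.group(1)


print(infer_model_type("checkpoints/sam_vit_l_0b3195.pth"))  # vit_l
```

The result can be passed straight to the loader as its model type argument, so that only the checkpoint path needs to be configured.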

    def display_mask(mask, ax, random_color=False):
        """Utility function to display a segmentation mask."""
        if random_color:
            color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
        else:
            color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
        h, w = mask.shape[-2:]
        # Broadcast the (h, w, 1) mask against the RGBA color to get an (h, w, 4) overlay.
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        ax.imshow(mask_image)
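
The overlay in the mask display helper above relies on NumPy broadcasting: a mask of shape (H, W, 1) multiplied by an RGBA color of shape (1, 1, 4) yields an (H, W, 4) image in which background pixels are fully transparent. The standalone sketch below isolates that step so it can be checked without Matplotlib; `mask_to_rgba` is a hypothetical name used only for illustration.

```python
import numpy as np


def mask_to_rgba(mask, color=(30 / 255, 144 / 255, 255 / 255, 0.6)):
    """Turn an (H, W) boolean mask into an (H, W, 4) RGBA overlay."""
    h, w = mask.shape[-2:]
    rgba = np.asarray(color, dtype=np.float32)
    # (H, W, 1) * (1, 1, 4) broadcasts to (H, W, 4). Pixels outside the mask
    # become all zeros, including alpha, so they stay transparent.
    return mask.reshape(h, w, 1).astype(np.float32) * rgba.reshape(1, 1, -1)


mask = np.zeros((2, 3), dtype=bool)
mask[0, 1] = True
overlay = mask_to_rgba(mask)
print(overlay.shape)  # (2, 3, 4)
```

Because the alpha channel is zero wherever the mask is false, the overlay can be drawn on top of the original image with `imshow` without hiding it.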

    def display_points(coords, labels, ax, marker_size=375):
        """Display prompt points on the image."""
        pos_points = coords[labels == 1]
        neg_points = coords[labels == 0]
        ax.scatter(pos_points[:, 0], pos_points[:, 1], color="green",
