Detectron2 - Object Detection and Segmentation
Introduction
Detectron2 is Facebook AI Research's (FAIR) next-generation library for object detection, instance segmentation, and other visual recognition tasks. Built on top of PyTorch, it provides a modular and extensible framework that powers many state-of-the-art computer vision models.
Key features of Detectron2:
- Modular design: Easily swap backbones, heads, and other components
- Rich Model Zoo: Pre-trained models for detection, segmentation, keypoint detection, and more
- High performance: Optimized training and inference with multi-GPU support
- Flexible config system: YAML-based configuration with full Python override capability
- Production-ready: Export to TorchScript, ONNX, and Caffe2 for deployment
In this tutorial, you will learn to use Detectron2 for object detection and segmentation tasks, train on custom datasets, navigate the config system, and deploy models to production.
Prerequisites
- Python 3.7 or higher
- PyTorch 1.9+ with CUDA support (GPU strongly recommended)
- Basic understanding of object detection and segmentation concepts
- Familiarity with COCO dataset format
- Linux operating system (recommended; macOS and Windows have limited support)
Installation
# Install PyTorch first (check pytorch.org for the build matching your CUDA version)
pip install torch torchvision
# Install Detectron2 from source
pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Alternative: install from pre-built wheels (faster).
# Wheels exist only for specific CUDA/PyTorch combinations; confirm yours at
# https://detectron2.readthedocs.io/en/latest/tutorials/install.html
# and adjust the cu118/torch2.0 part of the URL below to match.
pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu118/torch2.0/index.html
# Additional dependencies
pip install opencv-python pillow matplotlib
Verify the installation:
import detectron2
import torch
from detectron2.utils.logger import setup_logger
setup_logger()  # configure Detectron2's logging output
print(f"Detectron2 version: {detectron2.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
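The check above stops at the first import that fails, so it reports only one missing package at a time. As a minimal sketch, assuming you would rather see everything that is missing in one pass, the standard library can probe for modules without importing them. `missing_packages` is a hypothetical helper written for this tutorial, not part of Detectron2.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that the import system cannot find."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Import names used in this tutorial
# (opencv-python is imported as cv2, pillow as PIL)
required = ["torch", "torchvision", "detectron2", "cv2", "PIL", "matplotlib"]
print("Missing packages:", missing_packages(required) or "none")
```

An empty result only means the packages are discoverable. Version or CUDA mismatches still surface when the modules are actually imported.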
Understanding Detectron2 Architecture
Detectron2 follows a modular architecture with these key components:
Input Image
    |
    v
[Backbone] --> Feature Maps (FPN)
    |
    v
[Region Proposal Network (RPN)] --> Proposals
    |
    v
[ROI Heads] --> Detections, Masks, Keypoints
    |
    v
[Post-processing] --> Final Predictions
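To make the last stage of the diagram concrete, here is a minimal pure-Python sketch of the kind of filtering that typically turns raw detections into final predictions: a score threshold followed by greedy non-maximum suppression (NMS). It is an illustration under simplifying assumptions (class-agnostic, boxes as `(x1, y1, x2, y2)` tuples), not Detectron2's implementation. The function names and threshold defaults are chosen for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of kept detections, highest score first."""
    # 1. Drop low-confidence detections, then sort the rest by score
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # 2. Greedy NMS: keep a box only if it does not overlap a kept box too much
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(postprocess(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with an IoU of about 0.68, and box 3 falls below the score threshold. In Detectron2 the corresponding test-time thresholds are set in the config, under `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST` and `cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST`.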
Core abstractions:
from detectron2.modeling import build_model
from detectron2.config import get_cfg
# The config defines the entire model architecture
cfg = get_cfg()
# Key components:
# - cfg.MODEL.BACKBONE: Feature extractor (ResNet, ResNeXt, etc.)
# - cfg.MODEL.FPN: Feature Pyramid Network settings
# - cfg.MODEL.RPN: Region Proposal Network
# - cfg.MODEL.ROI_HEADS: Detection/segmentation heads
# - cfg.MODEL.PIXEL_MEAN / cfg.MODEL.PIXEL_STD: Input normalization
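The input normalization entry in the list above is the easiest one to illustrate without a GPU: each channel has the configured mean subtracted and is then divided by the configured standard deviation. The sketch below does this for a single pixel in plain Python. The numeric values are what I understand to be the library defaults (in BGR channel order). Treat them as assumptions and read the actual values from your own `cfg`. `normalize_pixel` is a hypothetical helper, not a Detectron2 function.

```python
# Assumed defaults (BGR order); verify against the pixel mean/std in your config.
PIXEL_MEAN = (103.530, 116.280, 123.675)
PIXEL_STD = (1.0, 1.0, 1.0)


def normalize_pixel(pixel, mean=PIXEL_MEAN, std=PIXEL_STD):
    """Apply (value - mean) / std to each channel of one pixel."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


# A pixel equal to the mean maps to zero in every channel
print(normalize_pixel(PIXEL_MEAN))
# A pure white pixel ends up well above zero
print(normalize_pixel((255, 255, 255)))
```

Pre-trained weights expect the same normalization they were trained with, so these values normally stay untouched unless you train from scratch or change the input format.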
Using Pre-trained Models
The fastest way to get started is with pre-trained models from the Model Zoo.
import cv2
import numpy as np