Complete Guide to MediaPipe: Computer Vision Made Easy
MediaPipe is an open-source framework from Google for building multimodal machine learning pipelines. It provides ready-to-use solutions for computer vision tasks such as face detection, hand tracking, and pose estimation.
In this tutorial, we'll learn MediaPipe from the basics through to practical, real-time implementations.
Why MediaPipe?
MediaPipe Advantages:
Available Solutions:
| Solution | Description |
|----------|-------------|
| Face Detection | Detect faces in images/video |
| Face Mesh | 468 3D face landmarks |
| Hand Tracking | 21 hand landmarks |
| Pose Estimation | 33 body landmarks |
| Holistic | Combination of face, hands, and pose |
| Object Detection | General object detection |
| Image Segmentation | Selfie/background segmentation |
| Gesture Recognition | Hand gesture recognition |
Installation
Install MediaPipe
# Install with pip
pip install mediapipe
# Install OpenCV (usually pulled in already as a MediaPipe dependency)
pip install opencv-python
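# GPU version (optional); mediapipe-gpu is not part of Google's official install instructions, so verify the package on PyPI first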
pip install mediapipe-gpu
# Install a specific version
pip install mediapipe==0.10.9
Verify Installation
import mediapipe as mp
import cv2
print(f"MediaPipe version: {mp.__version__}")
print(f"OpenCV version: {cv2.__version__}")
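If one of the imports above fails, the script stops at the first error and tells us nothing about the other package. As a minimal sketch, the check below reports both packages without importing them. The helper name `check_packages` is our own (hypothetical), and note that OpenCV is installed as `opencv-python` but imported as `cv2`.

```python
import importlib.util


def check_packages(names=("mediapipe", "cv2")):
    """Return {import_name: True/False} depending on whether each module can be found.

    find_spec() only locates a top-level module; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any name reported as missing can then be installed with the matching pip command from the previous step.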
MediaPipe Structure
import mediapipe as mp
# Solutions - Pre-built ML pipelines
mp.solutions.face_detection
mp.solutions.face_mesh
mp.solutions.hands
mp.solutions.pose
mp.solutions.holistic
mp.solutions.objectron
mp.solutions.selfie_segmentation
# Drawing utilities
mp.solutions.drawing_utils
mp.solutions.drawing_styles
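Which of these submodules are present may depend on the installed MediaPipe release, so it can help to probe for them before relying on one. The sketch below is one way to do that. `available_solutions` and `SOLUTION_NAMES` are hypothetical names introduced here for illustration and are not part of MediaPipe.

```python
# Submodule names taken from the structure overview above.
SOLUTION_NAMES = (
    "face_detection", "face_mesh", "hands", "pose", "holistic",
    "objectron", "selfie_segmentation", "drawing_utils", "drawing_styles",
)


def available_solutions(solutions, names=SOLUTION_NAMES):
    """Map each expected submodule name to whether `solutions` exposes it."""
    return {name: hasattr(solutions, name) for name in names}


if __name__ == "__main__":
    try:
        import mediapipe as mp
    except ImportError:
        print("mediapipe is not installed")
    else:
        # getattr guards against a release that has no `solutions` attribute
        solutions = getattr(mp, "solutions", None)
        for name, present in available_solutions(solutions).items():
            print(f"{name}: {'available' if present else 'missing'}")
```

Passing the namespace in as an argument keeps the helper independent of MediaPipe itself, so it can be exercised with any object.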
Face Detection
Basic Face Detection
import cv2
import mediapipe as mp
# Initialize
mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils
def detect_faces_image(image_path):
    """Detect faces in a static image."""
    # Read image (cv2.imread returns None if the file cannot be read)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # MediaPipe expects RGB input, while OpenCV loads images as BGR
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Initialize face detection
    with mp_face_detection.FaceDetection(
        model_selection=1,  # 0: short-range (2m), 1: full-range (5m)
        min_detection_confidence=0.5
    ) as face_detection:
        # Process image
        results = face_detection.process(image_rgb)
        # Draw detections
        if results.detections:
            for detection in results.detections:
                mp_drawing.draw_detection(image, detection)
                # Get bounding box (relative coordinates, scaled to pixels)
                bbox = detection.location_data.relative_bounding_box
                h, w, _ = image.shape
                x = int(bbox.xmin * w)
                y = int(bbox.ymin * h)
                width = int(bbox.width * w)
                height = int(bbox.height * h)
                # Get confidence score
                score = detection.score[0]
                print(f"Face detected: confidence={score:.2f}, bbox=({x}, {y}, {width}, {height})")
    cv2.imshow('Face Detection', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
# Run
detect_faces_image('photo.jpg')
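The bounding box in the example above is expressed relative to the image size, and for a face near the frame edge the scaled values can fall outside the image. If we want to crop the face region, it is safer to clamp them first. Here is a minimal sketch of that conversion. `RelativeBox` is a hypothetical stand-in for the object MediaPipe returns, and `to_pixel_box` is our own helper, not a MediaPipe function.

```python
from typing import NamedTuple, Tuple


class RelativeBox(NamedTuple):
    """Stand-in with the same four fields as a relative bounding box."""
    xmin: float
    ymin: float
    width: float
    height: float


def to_pixel_box(bbox, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert a relative box to (x, y, width, height) in pixels, clamped to the image."""
    # Scale both corners to pixel coordinates
    x1 = int(bbox.xmin * image_width)
    y1 = int(bbox.ymin * image_height)
    x2 = int((bbox.xmin + bbox.width) * image_width)
    y2 = int((bbox.ymin + bbox.height) * image_height)
    # Clamp each corner into the image bounds
    x1 = max(0, min(x1, image_width))
    y1 = max(0, min(y1, image_height))
    x2 = max(x1, min(x2, image_width))
    y2 = max(y1, min(y2, image_height))
    return x1, y1, x2 - x1, y2 - y1


if __name__ == "__main__":
    # A box that starts left of the frame on a 100x100 image
    print(to_pixel_box(RelativeBox(-0.25, 0.0, 0.5, 0.5), 100, 100))
```

Because the helper only reads the four fields, the box object from a detection can be passed in directly, and the result can be used to slice the image as `image[y:y + height, x:x + width]`.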
Real-Time Face Detection (Webcam)
import cv2
import mediapipe as mp
def real_time_face_detection():
    """Real-time face detection from webcam."""
    mp_face_detection = mp.solutions.face_detection
    mp_drawing = mp.solutions.drawing_utils
    # Open webcam