Complete Label Studio Tutorial: Data Labeling for Machine Learning
Label Studio is an open-source data labeling platform for machine learning. It supports many data types, including text, images, audio, video, and time series, which makes it a practical choice for building high-quality ML training datasets.
Why Label Studio?
Label Studio advantages:
- Multi-modal: Label text, images, audio, video, HTML
- Flexible: Customizable labeling interfaces
- Collaborative: Team-based annotation workflows
- Integrations: ML backends, cloud storage, webhooks
- Open source: Self-hosted, with enterprise options
Common use cases:
- NLP annotation (NER, sentiment, classification)
- Computer vision labeling (bounding boxes, segmentation)
- Audio transcription and classification
- Multi-modal data annotation
- Active learning workflows
Installation
```bash
# Install with pip
pip install label-studio

# Start Label Studio
label-studio start

# Or run it with Docker, persisting data in ./mydata
docker run -it -p 8080:8080 \
  -v $(pwd)/mydata:/label-studio/data \
  heartexlabs/label-studio:latest
```
Then open Label Studio at http://localhost:8080.
Quick Start
1. Create Project
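```python
from label_studio_sdk import Client

# Connect to Label Studio (the API key is on the Account & Settings page)
ls = Client(url='http://localhost:8080', api_key='your-api-key')
# Create a project with a sentiment classification config
project = ls.start_project(
    title='Sentiment Analysis',
    label_config='''
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
'''
)
print(f"Project created: {project.id}")
```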
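Writing the XML by hand gets tedious when the label set is long or comes from another system. The sketch below assembles a classification config with Python's standard-library `xml.etree.ElementTree` instead. `build_choices_config` is a hypothetical helper, not part of the Label Studio SDK. Its tag and attribute names mirror Label Studio's `<Text>`, `<Choices>` and `<Choice>` tags.

```python
import xml.etree.ElementTree as ET


def build_choices_config(labels, data_key="text", control_name="sentiment"):
    """Build a Label Studio classification config as an XML string.

    Hypothetical helper: it only assembles XML. Label Studio itself
    validates the config when the project is created.
    """
    view = ET.Element("View")
    # Object tag: displays the task field referenced by $<data_key>
    ET.SubElement(view, "Text", name=data_key, value=f"${data_key}")
    # Control tag: linked to the object tag through toName
    choices = ET.SubElement(
        view, "Choices", name=control_name, toName=data_key, choice="single"
    )
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


config = build_choices_config(["Positive", "Negative", "Neutral"])
print(config)
```

You can pass the resulting string as `label_config` to `ls.start_project(...)`. Building it with ElementTree also escapes label values that contain characters such as `&` or `<`.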
2. Import Data
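```python
# Import from a list of dicts (keys must match the $variables in the label config)
tasks = [
    {"text": "I love this product!"},
    {"text": "This is terrible."},
    {"text": "It's okay, nothing special."}
]
project.import_tasks(tasks)

# Import from a local JSON file
project.import_tasks('data.json')

# Import tasks that point to media by URL
# (requires a project whose label config uses $image, not the text project above)
project.import_tasks([
    {"image": "https://example.com/image1.jpg"},
    {"image": "https://example.com/image2.jpg"}
])
```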
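`project.import_tasks('data.json')` expects the file to exist already. Here is a minimal sketch for producing one from raw strings. It assumes each task wraps its fields in a `data` object whose `text` key matches `$text` in the label config. `write_tasks_file` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_tasks_file(texts, path):
    """Write raw strings to a JSON file of Label Studio tasks (hypothetical helper)."""
    # Each task keeps its fields under "data"; "text" must match $text in the config
    tasks = [{"data": {"text": t}} for t in texts]
    path = Path(path)
    path.write_text(json.dumps(tasks, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# Write to a temporary directory so an existing data.json is not overwritten
out_path = write_tasks_file(
    ["Great battery life.", "Screen cracked on day one."],
    Path(tempfile.mkdtemp()) / "data.json",
)
print(out_path.read_text(encoding="utf-8"))
# project.import_tasks(str(out_path))  # then upload it with the SDK client
```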
3. Export Annotations
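```python
import json

# Export annotated tasks as a list of dicts (JSON is the default format)
annotations = project.export_tasks(export_type='JSON')

# Other formats are written to disk, so pass export_location
project.export_tasks(export_type='CSV', export_location='annotations.csv')
# COCO and YOLO are meant for image object-detection (bounding box) projects
project.export_tasks(export_type='COCO', export_location='annotations_coco.zip')
project.export_tasks(export_type='YOLO', export_location='annotations_yolo.zip')

# Save the JSON export to a file
with open('annotations.json', 'w') as f:
    json.dump(annotations, f, indent=2)
```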
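A JSON export is a list of tasks. Each task carries its `data` and a list of `annotations`, and each annotation holds a `result` list of regions. The sketch below flattens a classification export into `(text, label)` pairs for training. It assumes the Choices control is named `sentiment`. It also skips annotations flagged `was_cancelled` (skipped tasks). `choices_to_pairs` is a hypothetical helper, and the sample export is illustrative, not real output.

```python
def choices_to_pairs(export, from_name="sentiment", data_key="text"):
    """Flatten a Label Studio JSON export into (text, label) pairs.

    Uses the first non-cancelled annotation with a matching choice per task;
    tasks without one are left out.
    """
    pairs = []
    for task in export:
        for annotation in task.get("annotations", []):
            if annotation.get("was_cancelled"):
                continue  # skipped tasks show up as cancelled annotations
            labels = [
                choice
                for region in annotation.get("result", [])
                if region.get("type") == "choices" and region.get("from_name") == from_name
                for choice in region["value"]["choices"]
            ]
            if labels:
                pairs.append((task["data"][data_key], labels[0]))
                break  # keep one label per task
    return pairs


# Illustrative export in the JSON shape described above (not real data)
sample_export = [
    {
        "id": 1,
        "data": {"text": "I love this product!"},
        "annotations": [{
            "was_cancelled": False,
            "result": [{
                "from_name": "sentiment", "to_name": "text", "type": "choices",
                "value": {"choices": ["Positive"]},
            }],
        }],
    },
    {
        "id": 2,
        "data": {"text": "This is terrible."},
        "annotations": [{"was_cancelled": True, "result": []}],
    },
]
print(choices_to_pairs(sample_export))  # [('I love this product!', 'Positive')]
```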
Label Configurations
1. Text Classification
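A text classification config pairs a `<Text>` object tag with a `<Choices>` control whose `toName` points at it. The sketch below shows an assumed multi-label topic setup: `choice="multiple"` lets annotators pick several topics, and `choice="single"` restricts them to one. The topic names are placeholders. Parsing the string locally with the standard library only catches malformed XML. Label Studio still does its own validation.

```python
import xml.etree.ElementTree as ET

# Topic classification: one Text object, one Choices control allowing several labels
TEXT_CLASSIFICATION_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="topic" toName="text" choice="multiple">
    <Choice value="Sports"/>
    <Choice value="Politics"/>
    <Choice value="Technology"/>
  </Choices>
</View>
"""

# Parse locally to catch malformed XML before creating the project
root = ET.fromstring(TEXT_CLASSIFICATION_CONFIG.strip())
choices = root.find("Choices")
print(choices.get("toName"), [c.get("value") for c in choices.iter("Choice")])
# text ['Sports', 'Politics', 'Technology']
```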
2. Named Entity Recognition
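For NER, a `<Labels>` control lets annotators highlight character spans in a `<Text>` object. In a JSON export, each span appears as a `labels` result whose `value` holds `start`, `end` (end-exclusive), `text`, and `labels`. The sketch below uses an assumed PER/ORG/LOC tag set and converts one annotation's results into sorted spans. `spans_from_result` is a hypothetical helper, and the sample result is illustrative.

```python
import xml.etree.ElementTree as ET

# Span labeling: the Labels control highlights character ranges in the Text object
NER_CONFIG = """
<View>
  <Labels name="label" toName="text">
    <Label value="PER"/>
    <Label value="ORG"/>
    <Label value="LOC"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""


def spans_from_result(result, from_name="label"):
    """Turn one annotation's "result" list into sorted (start, end, label) tuples."""
    return sorted(
        (r["value"]["start"], r["value"]["end"], r["value"]["labels"][0])
        for r in result
        if r.get("type") == "labels" and r.get("from_name") == from_name
    )


# Illustrative result entries in the shape of a JSON export (not real data)
text = "Ada Lovelace worked in London."
result = [
    {"from_name": "label", "to_name": "text", "type": "labels",
     "value": {"start": 23, "end": 29, "text": "London", "labels": ["LOC"]}},
    {"from_name": "label", "to_name": "text", "type": "labels",
     "value": {"start": 0, "end": 12, "text": "Ada Lovelace", "labels": ["PER"]}},
]
labels = [lbl.get("value") for lbl in ET.fromstring(NER_CONFIG.strip()).iter("Label")]
for start, end, label in spans_from_result(result):
    print(label, text[start:end])
# PER Ada Lovelace
# LOC London
```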
3. Image Classification
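Image classification swaps the `<Text>` object for an `<Image>` tag. Tasks must then supply an `image` field, usually a URL as in the import example. The sketch below shows an assumed Cat/Dog config. It adds a simplified pre-import check that lists which `$variables` a task is missing. `required_data_keys` is a hypothetical helper, not Label Studio's own validator.

```python
import xml.etree.ElementTree as ET

# Image classification: an Image object plus a single-choice Choices control
IMAGE_CLASSIFICATION_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="choice" toName="image" choice="single">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""


def required_data_keys(config):
    """Collect task-data keys referenced as value="$key" (hypothetical helper)."""
    root = ET.fromstring(config.strip())
    return {
        elem.get("value")[1:]
        for elem in root.iter()
        if (elem.get("value") or "").startswith("$")
    }


tasks = [
    {"image": "https://example.com/image1.jpg"},
    {"text": "This task belongs in a text project."},
]
for task in tasks:
    missing = required_data_keys(IMAGE_CLASSIFICATION_CONFIG) - set(task)
    print(sorted(missing))
# []
# ['image']
```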