Complete Label Studio Tutorial: Data Labeling for Machine Learning

By Ruby Abdullah · tutorial
Tags: Label Studio, Data Labeling, Annotation, MLOps, Python, Computer Vision, NLP

Label Studio is an open-source data labeling platform for machine learning. It supports text, images, audio, video, and time series data, which makes it a practical choice for building high-quality ML training datasets.

Why Label Studio?

Label Studio Advantages:
  • Multi-modal: Label text, images, audio, video, HTML
  • Flexible: Customizable labeling interfaces
  • Collaborative: Team-based annotation workflows
  • Integrations: ML backends, cloud storage, webhooks
  • Open source: Self-hosted with enterprise options

Use Cases:
  • NLP annotation (NER, sentiment, classification)
  • Computer vision labeling (bounding boxes, segmentation)
  • Audio transcription and classification
  • Multi-modal data annotation
  • Active learning workflows

Installation

# Install with pip
pip install label-studio

# Start Label Studio
label-studio start

# Or with Docker
docker run -it -p 8080:8080 \
  -v $(pwd)/mydata:/label-studio/data \
  heartexlabs/label-studio:latest

# Access at http://localhost:8080

Quick Start

1. Create Project

from label_studio_sdk import Client

# Connect to Label Studio
ls = Client(url='http://localhost:8080', api_key='your-api-key')

# Create project
project = ls.start_project(
    title='Sentiment Analysis',
    label_config='''
    '''
)

print(f"Project created: {project.id}")
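
The labeling configuration passed when creating a project is an XML document. As a minimal sketch of what a sentiment configuration could look like: the tag names (`View`, `Text`, `Choices`, `Choice`) are Label Studio's, while the `name`/`toName` values and the three choices are assumptions chosen to match the sample tasks in this tutorial, not the author's original configuration. The snippet only checks locally that the XML is well-formed, using the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentiment config. "text" and "sentiment" are illustrative
# names; "$text" refers to the "text" key of each task's data.
SENTIMENT_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
"""

# Parse locally to catch malformed XML before sending it to the server.
root = ET.fromstring(SENTIMENT_CONFIG)
choices = [c.get("value") for c in root.iter("Choice")]
print(root.tag, choices)
```

A string like `SENTIMENT_CONFIG` would then be passed as the labeling configuration when the project is created.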

2. Import Data

# Import from list
tasks = [
    {"text": "I love this product!"},
    {"text": "This is terrible."},
    {"text": "It's okay, nothing special."}
]
project.import_tasks(tasks)

# Import from file
project.import_tasks('data.json')

# Import tasks that reference images by URL
project.import_tasks([
    {"image": "https://example.com/image1.jpg"},
    {"image": "https://example.com/image2.jpg"}
])
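
A file such as `data.json` can be generated from raw strings before importing it. The following is a rough sketch, assuming the file holds a plain JSON list of task objects keyed by `text`, as in the list example; the helper names `texts_to_tasks` and `write_tasks_file` are hypothetical and not part of the Label Studio SDK.

```python
import json
import tempfile
from pathlib import Path

def texts_to_tasks(texts):
    """Wrap non-empty strings as task dicts keyed by "text"."""
    return [{"text": t.strip()} for t in texts if t.strip()]

def write_tasks_file(tasks, path):
    """Write the tasks as a JSON list and return the file path."""
    path = Path(path)
    path.write_text(json.dumps(tasks, ensure_ascii=False, indent=2), encoding="utf-8")
    return path

raw = ["I love this product!", "   ", "This is terrible."]
tasks = texts_to_tasks(raw)  # the blank entry is dropped
out = write_tasks_file(tasks, Path(tempfile.gettempdir()) / "data.json")
print(len(tasks), out.name)
```

Dropping blank strings up front avoids creating empty tasks that annotators would have to skip.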

3. Export Annotations

# Export all annotations (JSON by default)
annotations = project.export_tasks()

# Export in a specific format
# (COCO and YOLO apply to image projects; some SDK versions also require
# an export_location argument for non-JSON formats)
json_export = project.export_tasks(export_type='JSON')
csv_export = project.export_tasks(export_type='CSV')
coco_export = project.export_tasks(export_type='COCO')
yolo_export = project.export_tasks(export_type='YOLO')

# Save the JSON export to a file
import json
with open('annotations.json', 'w') as f:
    json.dump(annotations, f)
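
Once exported, the JSON can be flattened into training pairs. The sketch below assumes the usual JSON export layout (each task has a `data` object and a list of `annotations`, each with a `result` list) and a choices control named `sentiment`; that control name and the hand-written `sample_export` are assumptions for illustration, not real project output.

```python
def choice_labels(exported_tasks, from_name="sentiment"):
    """Collect (text, label) pairs from choice annotations."""
    pairs = []
    for task in exported_tasks:
        text = task["data"].get("text")
        for annotation in task.get("annotations", []):
            for item in annotation.get("result", []):
                # Keep only results produced by the named choices control.
                if item.get("type") == "choices" and item.get("from_name") == from_name:
                    for label in item["value"]["choices"]:
                        pairs.append((text, label))
    return pairs

# Hand-written sample in the assumed export shape.
sample_export = [
    {
        "id": 1,
        "data": {"text": "I love this product!"},
        "annotations": [{"result": [{
            "from_name": "sentiment", "to_name": "text", "type": "choices",
            "value": {"choices": ["Positive"]},
        }]}],
    },
    {"id": 2, "data": {"text": "This is terrible."}, "annotations": []},
]

pairs = choice_labels(sample_export)
print(pairs)
```

Tasks without annotations contribute no pairs, so unlabeled items are skipped rather than given a default label.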

Label Configurations

1. Text Classification
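
A text classification configuration pairs a `Text` object tag with a `Choices` control tag. As a hedged sketch, the helper below generates such a configuration from a list of class names; `build_classification_config`, the control name `label`, and the example classes are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def build_classification_config(labels, text_field="text", choice="single"):
    """Build a View/Text/Choices config string for the given class names."""
    view = ET.Element("View")
    # "$<field>" points the Text tag at a key in each task's data.
    ET.SubElement(view, "Text", name="text", value=f"${text_field}")
    choices = ET.SubElement(view, "Choices", name="label", toName="text", choice=choice)
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")

config = build_classification_config(["Sports", "Politics", "Technology"])
print(config)
```

Setting `choice="multiple"` instead would allow more than one class per text.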


2. Named Entity Recognition
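
For named entity recognition, a `Labels` control tag is attached to a `Text` object tag, and each annotated span is stored with character offsets. The sketch below shows one possible configuration and the shape of a span result; the entity set (`PER`, `ORG`, `LOC`), the colors, and the helper `span_result` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical NER config with three entity types.
NER_CONFIG = """
<View>
  <Labels name="label" toName="text">
    <Label value="PER" background="red"/>
    <Label value="ORG" background="blue"/>
    <Label value="LOC" background="green"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""

NER_LABELS = {el.get("value") for el in ET.fromstring(NER_CONFIG).iter("Label")}

def span_result(text, phrase, label):
    """Build one labeled-span result for the first occurrence of phrase."""
    if label not in NER_LABELS:
        raise ValueError(f"unknown label: {label}")
    start = text.index(phrase)  # raises ValueError if phrase is absent
    return {
        "from_name": "label",
        "to_name": "text",
        "type": "labels",
        "value": {"start": start, "end": start + len(phrase),
                  "text": phrase, "labels": [label]},
    }

sentence = "Ada Lovelace worked in London."
spans = [span_result(sentence, "Ada Lovelace", "PER"),
         span_result(sentence, "London", "LOC")]
print(spans)
```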


3. Image Classification
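
Image classification swaps the `Text` tag for an `Image` tag whose `value` points at a key in the task data. A common mistake is importing tasks whose keys do not match that variable, so this sketch also checks tasks against the configuration. The class names and the helpers `config_variables` and `missing_keys` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical image classification config.
IMAGE_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="choice" toName="image">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
    <Choice value="Other"/>
  </Choices>
</View>
"""

def config_variables(config):
    """Return task-data keys referenced as value="$key" in the config."""
    root = ET.fromstring(config)
    return {el.get("value")[1:] for el in root.iter()
            if (el.get("value") or "").startswith("$")}

def missing_keys(config, task):
    """Return config variables that the task does not provide."""
    return config_variables(config) - set(task)

tasks = [
    {"image": "https://example.com/image1.jpg"},
    {"img": "https://example.com/image2.jpg"},  # deliberately wrong key
]
problems = [missing_keys(IMAGE_CONFIG, t) for t in tasks]
print(problems)
```

Running a check like this before import flags the second task, which uses `img` where the configuration expects `image`.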

