ClearML Tutorial: Open-Source MLOps Platform for Experiment Tracking and Pipeline Automation

ClearML is an open-source MLOps platform that provides a comprehensive solution for experiment tracking, data management, orchestration, and machine learning model deployment. With ClearML, data science teams can manage the entire ML project lifecycle from initial experimentation to production deployment within a single integrated platform.

In this tutorial, we will learn how to use ClearML, from installation and basic experiment tracking to dataset management and pipeline automation. All code examples can be run directly in your local environment.

Why ClearML?

Before diving into implementation, here are several reasons why ClearML deserves consideration for your MLOps workflow:

Fully Open-Source: ClearML Server can be self-deployed at no licensing cost, with a hosted solution option for teams that prefer not to manage infrastructure.

Minimal Code Changes: Adding two lines of code is enough to start tracking experiments, with no changes to your existing project structure.

Auto-Logging: ClearML automatically captures metrics, hyperparameters, artifacts, and even console output from popular frameworks like PyTorch, TensorFlow, scikit-learn, and XGBoost.

Pipeline Orchestration: Create and run reproducible ML pipelines with automatic dependency management.

Data Management: Version your datasets so that changes to the data can be tracked efficiently.

Installation

Installing ClearML SDK

Install the ClearML Python SDK using pip:

pip install clearml

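To work with cloud storage, install the matching extra (quoted so that shells such as zsh do not expand the brackets):

pip install "clearml[s3]"      # Amazon S3
pip install "clearml[gs]"      # Google Cloud Storage
pip install "clearml[azure]"   # Azure Blob Storage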
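
After installing, it can be useful to confirm that the SDK is visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of ClearML.

```python
from importlib import metadata


def installed_version(package: str = "clearml"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("clearml is not installed in this environment")
    else:
        print(f"clearml {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that pip installed into a different virtual environment than the one running the script.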

Setting Up ClearML Server

There are two options for ClearML Server:

Option 1: ClearML Hosted (Free for individuals)

clearml-init

When prompted, enter the credentials (access key, secret key, and server host) that you can generate under Settings > Workspace in the ClearML dashboard.

Option 2: Self-Hosted with Docker

mkdir -p /opt/clearml && cd /opt/clearml
curl -o docker-compose.yml https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml
docker-compose up -d

After the server is running, access the dashboard at http://localhost:8080 and create new credentials.
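
The containers can take a little while to start, so the dashboard may not respond immediately. As a rough sketch, the snippet below polls the dashboard URL until it answers. The helper name `wait_for_server` and its default timeout and interval are assumptions chosen for illustration, and any configured HTTP proxy is deliberately bypassed because the server is local.

```python
import time
import urllib.error
import urllib.request

# Opener with no proxies, since the server is expected to be on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url: str = "http://localhost:8080",
                    timeout: float = 60.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the containers may still be starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if wait_for_server():
        print("Dashboard is reachable")
    else:
        print("Dashboard did not respond in time")
```

A response only shows that something is listening on the port; it does not verify that every ClearML service behind it is healthy.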

Configuration

After running clearml-init, the configuration file will be saved at ~/clearml.conf:

api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml

    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
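
If a script later fails to connect, a quick first step is to check that the configuration file contains the expected settings. The sketch below does a rough textual check for the `web_server`, `api_server`, `files_server`, `access_key`, and `secret_key` entries. It is not a real parser for the configuration format, and the helper name `missing_settings` is hypothetical.

```python
import re
from pathlib import Path

REQUIRED_KEYS = ("web_server", "api_server", "files_server", "access_key", "secret_key")


def missing_settings(conf_text: str) -> list:
    """Return the required key names that never appear as `key:` or `key =` in the text."""
    missing = []
    for key in REQUIRED_KEYS:
        # Keys may be bare (web_server: ...) or quoted ("access_key" = ...).
        pattern = rf'^\s*"?{re.escape(key)}"?\s*[:=]'
        if not re.search(pattern, conf_text, flags=re.MULTILINE):
            missing.append(key)
    return missing


if __name__ == "__main__":
    conf_path = Path.home() / "clearml.conf"
    if conf_path.exists():
        print("Missing settings:", missing_settings(conf_path.read_text()))
    else:
        print(f"{conf_path} not found; run clearml-init first")
```

An empty list means every expected entry is present; it says nothing about whether the credentials themselves are valid.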

Basic Usage: Experiment Tracking

Tracking Your First Experiment

The simplest way to get started with ClearML is by adding two lines of code to your training script:

from clearml import Task

# The import above and this Task.init call are the two lines that enable tracking
task = Task.init(project_name="ClearML Tutorial", task_name="First Experiment")

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Generate a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_classes=2,
    random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Register the hyperparameters with the task so they are recorded with the experiment
params = {
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 5,
    "random_state": 42
}
task.connect(params)

# Unpack the dictionary into keyword arguments
model = RandomForestClassifier(**params)
model.fit(X_train, y_train)

# Evaluate on the held-out test split
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Report the accuracy as a scalar under the "metrics" title
logger = task.get_logger()
logger.report_scalar("metrics", "accuracy", value=accuracy, iteration=1)
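
When a script computes several metrics, repeating the reporting call for each one gets tedious. One possible approach is a small helper that loops over a dictionary of metrics. In the sketch below, `report_metrics` and `PrintLogger` are hypothetical names introduced for illustration; `PrintLogger` only mimics the `report_scalar` call so that the helper can be tried without a ClearML server, and the numbers in the dry run are placeholder values, not results.

```python
def report_metrics(logger, metrics: dict, iteration: int, title: str = "metrics") -> None:
    """Send every name/value pair in `metrics` to the logger as one scalar series each."""
    for name, value in metrics.items():
        # float() also accepts NumPy scalars such as those returned by scikit-learn.
        logger.report_scalar(title=title, series=name, value=float(value), iteration=iteration)


class PrintLogger:
    """Hypothetical stand-in that mimics only the report_scalar call, for dry runs."""

    def __init__(self):
        self.records = []

    def report_scalar(self, title, series, value, iteration):
        self.records.append((title, series, value, iteration))
        print(f"[{title}] {series}={value:.4f} @ iteration {iteration}")


if __name__ == "__main__":
    # Placeholder values for a dry run; a real script would pass computed metrics.
    dry_run = PrintLogger()
    report_metrics(dry_run, {"accuracy": 0.93, "f1": 0.92}, iteration=1)
```

In the training script, the same helper could be called as `report_metrics(task.get_logger(), {"accuracy": accuracy, "f1": f1}, iteration=1)`, which would also record the F1 score that the script computes.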
