ClearML Tutorial: Open-Source MLOps Platform for Experiment Tracking and Pipeline Automation

By Ruby Abdullah · tutorial
ClearML · MLOps · Experiment Tracking · Pipeline · Python

ClearML is an open-source MLOps platform that provides a comprehensive solution for experiment tracking, data management, orchestration, and machine learning model deployment. With ClearML, data science teams can manage the entire ML project lifecycle from initial experimentation to production deployment within a single integrated platform.

This tutorial walks through installing ClearML, basic experiment tracking, dataset management, and pipeline automation. All code examples can be run directly in your local environment.

Why ClearML?

Before diving into implementation, here are several reasons why ClearML deserves consideration for your MLOps workflow:

  • Fully Open-Source: ClearML Server can be self-deployed at no licensing cost, with a hosted solution option for teams that prefer not to manage infrastructure.
  • Minimal Code Changes: Adding two lines of code is enough to start tracking experiments; your existing project structure stays unchanged.
  • Auto-Logging: ClearML automatically captures metrics, hyperparameters, artifacts, and console output from popular frameworks such as PyTorch, TensorFlow, scikit-learn, and XGBoost.
  • Pipeline Orchestration: Create and run reproducible ML pipelines with automatic dependency management.
  • Data Management: Dataset versioning lets you track how your data changes over time.
Installation

    Installing ClearML SDK

    Install the ClearML Python SDK using pip:

    pip install clearml
    

    For additional features, you can install with extras:

    # Amazon S3 storage support
    pip install "clearml[s3]"
    # Google Cloud Storage support
    pip install "clearml[gs]"
    # Azure Blob Storage support
    pip install "clearml[azure]"
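
After installing, it can be worth confirming that the package is visible to the interpreter you plan to train with. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of ClearML.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("clearml")
if version is None:
    print("clearml is not installed; run: pip install clearml")
else:
    print(f"clearml {version} is installed")
```

Running this inside the same virtual environment as your training script helps rule out the common case where pip installed into a different interpreter.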

    Setting Up ClearML Server

    There are two options for ClearML Server:

    Option 1: ClearML Hosted (Free for individuals)

    Register a free account at app.clear.ml, then run:

    clearml-init
    

    Enter the credentials (API key, secret, and host) that can be obtained from Settings > Workspace in the ClearML dashboard.
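
Where the interactive `clearml-init` prompt is impractical, such as CI jobs or containers, the SDK can also pick up the same settings from environment variables. The sketch below only checks which of those variables are set before a script starts. The variable names are the ones ClearML documents; the helper `missing_clearml_env` and the choice to treat all five as expected are assumptions made for illustration.

```python
import os
from typing import List, Mapping, Optional

# Environment variables the ClearML SDK reads in place of clearml.conf values.
# Treating all five as expected is an assumption of this sketch.
EXPECTED_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_WEB_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


missing = missing_clearml_env()
if missing:
    print("Missing ClearML settings:", ", ".join(missing))
else:
    print("All ClearML settings are provided through the environment.")
```

Keeping the key and secret in environment variables rather than in a committed file also avoids leaking credentials through version control.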

    Option 2: Self-Hosted with Docker

    mkdir -p /opt/clearml && cd /opt/clearml
    curl -o docker-compose.yml https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml
    docker-compose up -d

    After the server is running, access the dashboard at http://localhost:8080 and create new credentials.
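
The containers can take a little while to start, so a quick reachability check before opening the browser may save some confusion. This is a minimal sketch using only the standard library; the function name `dashboard_is_up` is hypothetical, and the default URL simply repeats the address given above.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed URL.
        return False
```

Calling `dashboard_is_up()` with no arguments checks the local web server; pass a different URL if you mapped the dashboard to another host or port.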

    Configuration

    After running clearml-init, the configuration file will be saved at ~/clearml.conf:

    api {
        web_server: https://app.clear.ml
        api_server: https://api.clear.ml
        files_server: https://files.clear.ml
        credentials {
            "access_key" = "YOUR_ACCESS_KEY"
            "secret_key" = "YOUR_SECRET_KEY"
        }
    }
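
If a script cannot connect, a useful first step is to check which configuration file is being looked at. The sketch below assumes a simple lookup order: the `CLEARML_CONFIG_FILE` environment variable if it is set, otherwise `~/clearml.conf`. The helper `clearml_config_path` is hypothetical, and the SDK's own resolution logic may consider more cases than this.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def clearml_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return CLEARML_CONFIG_FILE if it is set, otherwise ~/clearml.conf."""
    env = os.environ if env is None else env
    override = env.get("CLEARML_CONFIG_FILE")
    if override:
        return Path(override).expanduser()
    return Path.home() / "clearml.conf"


path = clearml_config_path()
print(f"{path}: {'found' if path.is_file() else 'not found'}")
```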

    Basic Usage: Experiment Tracking

    Your First Experiment Tracking

    The simplest way to get started with ClearML is by adding two lines of code to your training script:

    from clearml import Task

    # These two lines (the import and Task.init) are all ClearML needs to start tracking
    task = Task.init(project_name="ClearML Tutorial", task_name="First Experiment")

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, f1_score

    # Generate a synthetic binary classification dataset
    X, y = make_classification(
        n_samples=1000,
        n_features=20,
        n_informative=10,
        n_classes=2,
        random_state=42,
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Hyperparameters; task.connect() records them with the experiment
    params = {
        "n_estimators": 100,
        "max_depth": 10,
        "min_samples_split": 5,
        "random_state": 42,
    }
    task.connect(params)

    # Train and evaluate the model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    # Report the metrics to ClearML
    logger = task.get_logger()
    logger.report_scalar("metrics", "accuracy", value=accuracy, iteration=1)
    logger.report_scalar("metrics", "f1_score", value=f1, iteration=1)
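
Reporting each metric with its own call gets repetitive as the list of metrics grows. One possible way to tidy this up is a small helper that loops over a dictionary of metrics. This is a sketch, not part of ClearML: `report_metrics`, `ScalarLogger`, and `PrintLogger` are hypothetical names, and the only ClearML call assumed is `report_scalar(title, series, value, iteration)` on the logger. The numbers passed at the end are placeholders, not real results.

```python
from typing import Dict, Protocol


class ScalarLogger(Protocol):
    """The one method of a ClearML logger that this helper relies on."""

    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        ...


def report_metrics(
    logger: ScalarLogger,
    metrics: Dict[str, float],
    iteration: int,
    title: str = "metrics",
) -> None:
    """Report every metric as its own series under a shared plot title."""
    for name, value in metrics.items():
        logger.report_scalar(title=title, series=name, value=float(value), iteration=iteration)


class PrintLogger:
    """Stand-in logger so the helper can be tried without a ClearML server."""

    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        print(f"[{title}] {series}={value:.3f} at iteration {iteration}")


# Placeholder values for demonstration only.
report_metrics(PrintLogger(), {"accuracy": 0.9, "f1_score": 0.88}, iteration=1)
```

In a real run you would pass `task.get_logger()` in place of `PrintLogger()` and the computed `accuracy` and `f1` values in the dictionary.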
