ClearML Tutorial: Open-Source MLOps Platform for Experiment Tracking and Pipeline Automation
ClearML is an open-source MLOps platform that provides a comprehensive solution for experiment tracking, data management, orchestration, and machine learning model deployment. With ClearML, data science teams can manage the entire ML project lifecycle from initial experimentation to production deployment within a single integrated platform.
This tutorial covers ClearML installation, basic experiment tracking, dataset management, and pipeline automation. You can run every code example in your local environment.
Why ClearML?
Before diving into implementation, here are several reasons why ClearML deserves consideration for your MLOps workflow:
Installation
Installing ClearML SDK
Install the ClearML Python SDK using pip:
pip install clearml
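To add support for a cloud storage backend (Amazon S3, Google Cloud Storage, or Azure Blob Storage), install the matching extra. The quotes keep shells such as zsh from interpreting the square brackets:
pip install "clearml[s3]"
pip install "clearml[gs]"
pip install "clearml[azure]"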
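To confirm that the package landed in the active environment, a quick check like the one below can help. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of ClearML.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Look up the distribution installed by `pip install clearml`
    version = installed_version("clearml")
    print(f"clearml {version}" if version else "clearml is not installed")
```

If the script reports that clearml is not installed, check that pip and the Python interpreter point to the same virtual environment.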
Setting Up ClearML Server
There are two options for ClearML Server:
Option 1: ClearML Hosted (Free for individuals)
Register a free account at app.clear.ml, then run:
clearml-init
Enter the credentials (API key, secret, and host) that can be obtained from Settings > Workspace in the ClearML dashboard.
Option 2: Self-Hosted with Docker
mkdir -p /opt/clearml && cd /opt/clearml
curl -o docker-compose.yml https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml
docker-compose up -d
After the server is running, access the dashboard at http://localhost:8080 and create new credentials.
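Before opening the dashboard, it can be useful to check that the containers are actually accepting connections. The sketch below probes the server's ports with plain TCP connections. It assumes the default ports of an unmodified docker-compose.yml (8080 for the web UI, 8008 for the API, 8081 for the file server); `port_open` and `check_server` are hypothetical helper names, so adjust the values if your deployment differs.

```python
import socket

# Assumed default ports of a self-hosted ClearML Server (unmodified docker-compose.yml)
DEFAULT_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts
        return False


def check_server(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in DEFAULT_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_server().items():
        print(f"{name:5s} {'up' if ok else 'down'}")
```

An open port only shows that something is listening. It does not prove the service is healthy, so treat a "down" result as the more informative signal.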
Configuration
Running clearml-init saves the configuration file to ~/clearml.conf:
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
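In environments where writing a file to the home directory is awkward (CI jobs, containers), the same settings can be supplied through environment variables that the ClearML SDK reads. The sketch below maps the hosts and keys from the configuration above onto those variables. `clearml_env` is a hypothetical helper and the key values are placeholders.

```python
import os


def clearml_env(api_host: str, web_host: str, files_host: str,
                access_key: str, secret_key: str) -> dict:
    """Build the environment variables that mirror the api section of clearml.conf."""
    return {
        "CLEARML_API_HOST": api_host,
        "CLEARML_WEB_HOST": web_host,
        "CLEARML_FILES_HOST": files_host,
        "CLEARML_API_ACCESS_KEY": access_key,
        "CLEARML_API_SECRET_KEY": secret_key,
    }


if __name__ == "__main__":
    # Placeholder credentials; replace them with the values from your workspace
    env = clearml_env(
        api_host="https://api.clear.ml",
        web_host="https://app.clear.ml",
        files_host="https://files.clear.ml",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
    )
    # Set the variables before the training script calls Task.init
    os.environ.update(env)
    print(sorted(env))
```

Keeping real keys out of source control is the main reason to prefer this route. In practice the variables would usually be set by the CI system or container runtime rather than in Python.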
Basic Usage: Experiment Tracking
Tracking Your First Experiment
The simplest way to get started with ClearML is by adding two lines of code to your training script:
from clearml import Task

# Initialize the task first so ClearML can capture the rest of the run
task = Task.init(project_name="ClearML Tutorial", task_name="First Experiment")

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Generate a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_classes=2,
    random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters; task.connect() records them with the experiment
params = {
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 5,
    "random_state": 42
}
task.connect(params)

# Train the model, unpacking the dict as keyword arguments
model = RandomForestClassifier(**params)
model.fit(X_train, y_train)

# Evaluate on the held-out test split
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Report metrics to the ClearML dashboard
logger = task.get_logger()
logger.report_scalar("metrics", "accuracy", value=accuracy, iteration=1)