MLflow vs Neptune.ai: A Complete Guide to Experiment Tracking for MLOps
Experiment tracking is a core component of MLOps: it lets data science teams record, compare, and reproduce machine learning experiments. This tutorial compares two popular platforms, MLflow (open source) and Neptune.ai (a managed service), and shows how to use each.
Why Does Experiment Tracking Matter?
Without proper experiment tracking, ML teams often run into:
- Reproducibility crisis: Results from earlier experiments cannot be reproduced
- Lost experiments: The configuration that produced the best model is lost
- Collaboration issues: Results are hard to share across teams
- Technical debt: Spreadsheets and manual notes that do not scale
Overview: MLflow vs Neptune.ai
| Aspect | MLflow | Neptune.ai |
|-------|--------|------------|
| Type | Open-source | Managed SaaS |
| Hosting | Self-hosted / Managed | Cloud-hosted |
| Pricing | Free (infra cost) | Free tier + paid plans |
| Setup | Manual setup | Instant |
| UI | Basic | Advanced |
| Collaboration | Limited | Built-in |
| Integrations | 15+ frameworks | 25+ frameworks |
| Model Registry | Yes | Yes |
| Best For | Full control, on-prem | Quick start, teams |
Part 1: MLflow
1.1 Installing MLflow
# Install MLflow
pip install mlflow

# For a tracking server with a database backend
# (quoted so that shells such as zsh do not expand the brackets)
pip install "mlflow[extras]"

# Start the tracking UI (local)
mlflow ui --port 5000

# Or run a server with a backend store
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
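Before pointing experiments at the server started above, it can help to confirm that it is reachable. The following is a minimal sketch using only the standard library. It assumes the server exposes a `/health` endpoint that answers with HTTP 200 (worth checking against your MLflow version); the helper name `tracking_server_is_up` is hypothetical, not part of MLflow.

```python
import http.client
import urllib.error
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the tracking server serves a /health endpoint.
    Any connection problem or malformed URL counts as "not up".
    """
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


# Usage: the address matches the --port 5000 setting used above.
print(tracking_server_is_up("http://localhost:5000"))
```

Returning a boolean instead of raising keeps the check easy to drop into a startup script or a CI step.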
1.2 Basic Experiment Tracking
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Set tracking URI (optional, default: ./mlruns)
mlflow.set_tracking_uri("http://localhost:5000")

# Set experiment name
mlflow.set_experiment("iris-classification")

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Start run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log parameters
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "random_state": 42
    }
    mlflow.log_params(params)

    # Train model (the dict is unpacked into keyword arguments)
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    # Log metrics
    mlflow.log_metrics({
        "accuracy": accuracy,
        "f1_score": f1
    })

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts (additional files)
    with open("feature_importance.txt", "w") as f:
        for name, importance in zip(load_iris().feature_names, model.feature_importances_):
            f.write(f"{name}: {importance:.4f}\n")
    mlflow.log_artifact("feature_importance.txt")

    # Still inside the run, so active_run() returns the current run
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {accuracy:.4f}")
1.3 Hyperparameter Tuning with MLflow
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
import itertools

mlflow.set_experiment("iris-hyperparameter-tuning")

X, y = load_iris(return_X_y=True)

# Hyperparameter grid
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10, None],