Introduction
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Developed by Databricks, it helps data scientists and ML engineers track experiments, package code, manage models, and deploy them to production.
Why MLflow?
- Reproducibility: track every experiment in detail
- Collaboration: share results with your team
- Model versioning: manage multiple versions of a model
- Deployment ready: deploy models easily to a range of platforms
- Framework agnostic: works with TensorFlow, PyTorch, scikit-learn, and others
Main Components of MLflow
MLflow consists of four main components: MLflow Tracking, MLflow Projects, MLflow Models, and the MLflow Model Registry.
Installation and Setup
Basic Installation
# Install MLflow
pip install mlflow
# Install with extras for additional backends
pip install "mlflow[extras]"
# Verify the installation
mlflow --version
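The same check can be done from Python, which is convenient inside notebooks or CI scripts. The sketch below is a minimal example that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of MLflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "mlflow") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"mlflow version: {found}" if found else "mlflow is not installed")
```

Returning `None` instead of raising lets a setup script decide for itself whether a missing package is fatal.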
Setting Up a Database Backend (PostgreSQL)
For production, use a database backend:
# Install dependencies
pip install psycopg2-binary
# Start PostgreSQL (example using Docker)
docker run -d \
  --name mlflow-db \
  -e POSTGRES_USER=mlflow \
  -e POSTGRES_PASSWORD=mlflow \
  -e POSTGRES_DB=mlflow \
  -p 5432:5432 \
  postgres:13
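The user, password, and database name given to the container are the same values that later appear in the connection URI passed to the MLflow server. As a rough sketch, the URI can be assembled in Python so that special characters in the credentials are percent-encoded. The helper name `backend_store_uri` is an assumption made for illustration, not an MLflow function.

```python
from urllib.parse import quote_plus


def backend_store_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Values match the Docker example above.
    print(backend_store_uri("mlflow", "mlflow", "localhost", 5432, "mlflow"))
```

Encoding matters once the password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URI structure.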
Setting Up an Artifact Store (MinIO/S3)
# Install boto3 for S3 compatibility
pip install boto3
# Start MinIO (S3-compatible storage)
docker run -d \
  --name mlflow-minio \
  -p 9000:9000 \
  -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"
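A fresh MinIO container starts without any buckets, so the bucket used as the artifact root has to be created before MLflow can write to it. The following is a minimal sketch using boto3, assuming the endpoint and credentials from the Docker command above; the helper names `ensure_bucket` and `minio_client` are hypothetical.

```python
def ensure_bucket(client, name: str) -> bool:
    """Create bucket `name` if it is absent. Returns True when it was created."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True


def minio_client(endpoint="http://localhost:9000", key="minioadmin", secret="minioadmin"):
    """Build a boto3 S3 client pointed at the local MinIO container (assumed values)."""
    import boto3  # imported lazily so ensure_bucket can be used without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=key,
        aws_secret_access_key=secret,
    )
```

With the container running, `ensure_bucket(minio_client(), "mlflow-artifacts")` would create the bucket on the first call and do nothing afterwards. On AWS S3 proper, buckets outside `us-east-1` also need a `CreateBucketConfiguration` argument, which this sketch leaves out.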
Running the MLflow Server
# Development mode (local file store)
mlflow server --host 0.0.0.0 --port 5000
# Production mode (with a database and S3)
mlflow server \
  --backend-store-uri postgresql://mlflow:mlflow@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000
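Once the server is started, it helps to confirm that it is reachable before pointing any training code at it. The sketch below polls a `/health` route, which the MLflow tracking server is assumed to expose in the version you run; the function name `is_server_up` is hypothetical, and only the standard library is used.

```python
import urllib.request


def is_server_up(base_url: str = "http://localhost:5000", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", is_server_up())
```

If your MLflow version does not serve `/health`, requesting the base URL itself is a reasonable fallback for the same check.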
Setting Up Environment Variables
Create a .env file:
# MLflow Tracking
MLFLOW_TRACKING_URI=http://localhost:5000
# S3/MinIO Configuration
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
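A `.env` file is not read automatically by Python, so the variables have to be exported to the process before MLflow and boto3 look for them. The sketch below is one minimal way to do that with the standard library; `parse_env` and `load_env_file` are hypothetical helpers, and a package such as python-dotenv is a common alternative.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read `path` and export its variables without overwriting existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `load_env_file()` at the top of a training script, before the first MLflow call, makes the tracking URI and S3 settings available without hard-coding them. Using `setdefault` means values already set in the shell take precedence over the file.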
MLflow Tracking: Experiment Tracking
Basic Tracking
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Set experiment
mlflow.set_experiment("iris-classification")
# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
# Start MLflow run
with mlflow.start_run(run_name="random-forest-v1") as run:
    # Log parameters
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "random_state": 42
    }
    mlflow.log_params(params)
    # Train model (unpack the dict into keyword arguments)
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    # Make predictions
    y_pred = model.predict(X_test)
    # Log metrics
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred, average="weighted")
    }
    mlflow.log_metrics(metrics)
    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",