Complete Tutorial on Azure MLflow Integration: Experiment Tracking and Model Management
Azure Machine Learning provides native MLflow integration for experiment tracking, model versioning, and deployment. This tutorial covers using MLflow with Azure ML to manage the full ML lifecycle.
Why MLflow on Azure?
Key Benefits:
- Native integration: seamless connectivity with Azure ML
- Open standard: portable across platforms
- Unified tracking: experiments, models, and artifacts in one place
- Easy deployment: deploy MLflow models directly
- Collaboration: share experiments across teams
MLflow Components:
- Tracking: log experiments and metrics
- Projects: package ML code
- Models: model versioning and deployment
- Registry: centralized model store
Prerequisites
pip install mlflow azureml-mlflow azure-ai-ml azure-identity
# Sign in with the Azure CLI
az login
Setup
1. Connect to Azure ML
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow
# Connect to the workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="my-resource-group",
    workspace_name="my-ml-workspace",
)
# Get the MLflow tracking URI for this workspace
tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
print(f"Tracking URI: {tracking_uri}")
# Point MLflow at the workspace
mlflow.set_tracking_uri(tracking_uri)
2. Configure Authentication
import os
# Set service principal credentials for MLflow via environment variables
os.environ["AZURE_TENANT_ID"] = "your-tenant-id"
os.environ["AZURE_CLIENT_ID"] = "your-client-id"
os.environ["AZURE_CLIENT_SECRET"] = "your-client-secret"
# Or use DefaultAzureCredential, which reads the variables above and
# falls back to other sources such as the az login session
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
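When the service principal route is used, a missing variable usually surfaces only later as an authentication error. A minimal sketch of a pre-flight check, using only the standard library; the helper name `missing_service_principal_vars` is hypothetical and not part of any Azure SDK:

```python
import os

# Environment variables read for service principal authentication.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All service principal variables are set")
```

Running a check like this before creating the client makes a misconfigured environment fail early with a readable message.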
Experiment Tracking
1. Create and Set an Experiment
import mlflow
# Set the active experiment (it is created if it does not exist yet)
mlflow.set_experiment("my-ml-experiment")
# Or create one with tags; create_experiment returns the new experiment's ID
experiment_id = mlflow.create_experiment(
    name="classification-experiment",
    tags={
        "team": "data-science",
        "project": "customer-churn",
    },
)
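`mlflow.create_experiment` raises an error if an experiment with the same name already exists, so scripts that are rerun often look the experiment up first. A minimal sketch of that pattern, assuming the tracking object exposes `get_experiment_by_name` and `create_experiment` (as the `mlflow` module and `MlflowClient` do). The names `get_or_create_experiment` and `FakeTracking` are hypothetical; the in-memory stand-in exists only so the example runs without a workspace:

```python
from types import SimpleNamespace


def get_or_create_experiment(tracking, name, tags=None):
    """Return the ID of the experiment called `name`, creating it if needed."""
    existing = tracking.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id
    return tracking.create_experiment(name, tags=tags)


class FakeTracking:
    """In-memory stand-in for the mlflow module (illustration only)."""

    def __init__(self):
        self._experiments = {}

    def get_experiment_by_name(self, name):
        # Mirrors mlflow's behaviour of returning None for unknown names.
        return self._experiments.get(name)

    def create_experiment(self, name, tags=None):
        if name in self._experiments:
            raise ValueError(f"Experiment {name!r} already exists")
        experiment_id = str(len(self._experiments) + 1)
        self._experiments[name] = SimpleNamespace(
            experiment_id=experiment_id, name=name, tags=tags or {}
        )
        return experiment_id


if __name__ == "__main__":
    tracking = FakeTracking()  # pass the real `mlflow` module here instead
    first = get_or_create_experiment(tracking, "classification-experiment")
    second = get_or_create_experiment(tracking, "classification-experiment")
    print(first == second)
```

Passing the tracking object as an argument keeps the helper testable offline; in a real script the first argument would be `mlflow` itself.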
2. Log Parameters and Metrics
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
# Assumption: X and y are a feature matrix and a binary label vector loaded beforehand
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Start a run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    mlflow.log_param("random_state", 42)
    # Train the model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=42,
    )
    model.fit(X_train, y_train)
    # Predict on the held-out set
    predictions = model.predict(X_test)
    # Log metrics (the default averaging of these scorers expects binary labels)
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
    mlflow.log_metric("f1_score", f1_score(y_test, predictions))
    mlflow.log_metric("precision", precision_score(y_test, predictions))
    mlflow.log_metric("recall", recall_score(y_test, predictions))
    print("Run complete")
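For a binary classifier, the logged metrics all derive from the confusion counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. A minimal sketch in pure Python that computes them into a dictionary; the function name `binary_metrics` is hypothetical, and it assumes labels are 0/1 with 1 as the positive class:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


if __name__ == "__main__":
    # Illustrative labels only, not results from a real model.
    metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
    print(metrics)
    # Inside an active run: mlflow.log_metrics(metrics)
```

A dictionary like this can be passed to `mlflow.log_metrics` inside a run to log all values in one call instead of one `log_metric` call per value.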
3. Log Artifacts
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Reuses the model and the train/test split from the previous step
with mlflow.start_run():
    # Train and predict
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Build the confusion matrix plot
    cm = confusion_matrix(y_test, predictions)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm)
    disp.plot()
    plt.savefig("confusion_matrix.png")
    # Log the saved image as a run artifact
    mlflow.log_artifact("confusion_matrix.png")
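Saving files into the working directory leaves them behind after each run. One possible variation writes an artifact into a temporary directory and hands its path to a logging callback before cleanup. A minimal sketch using only the standard library; `log_json_artifact` is a hypothetical helper, and the `log_artifact` argument is assumed to be a callable that takes a local file path, as `mlflow.log_artifact` does:

```python
import json
import os
import tempfile


def log_json_artifact(log_artifact, data, filename="report.json"):
    """Write `data` as JSON to a temp directory and pass the path to `log_artifact`.

    The file is removed when the function returns, so `log_artifact` must
    copy or upload it before returning.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        log_artifact(path)


if __name__ == "__main__":
    # Stand-in callback and illustrative value; inside a run,
    # pass mlflow.log_artifact as the first argument instead.
    log_json_artifact(lambda p: print("logged", os.path.basename(p)),
                      {"accuracy": 0.9})
```

The same approach works for plots: save the figure to a path inside the temporary directory, then pass that path to the callback.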