Complete Azure MLflow Integration Tutorial: Experiment Tracking and Model Management
Azure Machine Learning provides native MLflow integration for experiment tracking, model versioning, and deployment. This tutorial shows how to use MLflow with Azure ML to track experiments and manage models across the ML lifecycle.
Why MLflow on Azure?
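Key Benefits:
- Native integration: Azure ML workspaces act as MLflow tracking servers, so no separate server is needed
- Open source: MLflow code and models stay portable across platforms
- Unified tracking: Experiments, metrics, models, and artifacts in one place
- Easy deployment: MLflow models can be deployed to Azure ML endpoints without a custom scoring script
- Collaboration: Share experiments across teams through a common workspace
MLflow Components:
- Tracking: Log parameters, metrics, and artifacts for each run
- Projects: Package ML code in a reusable, reproducible format
- Models: Package trained models in a standard format for deployment
- Registry: Centralized model store with versioning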
Prerequisites
pip install mlflow azureml-mlflow azure-ai-ml azure-identity
# Sign in with the Azure CLI
az login
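Before connecting to a workspace, it can help to confirm that the four packages from the pip command are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and not part of any Azure or MLflow API.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above
REQUIRED_PACKAGES = ["mlflow", "azureml-mlflow", "azure-ai-ml", "azure-identity"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the same pip command.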
Setup
1. Connect to Azure ML
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow
# Connect to the workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="my-resource-group",
    workspace_name="my-ml-workspace",
)
# Get the workspace's MLflow tracking URI (an azureml:// URI,
# which the azureml-mlflow plugin lets MLflow understand)
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
print(f"Tracking URI: {mlflow_tracking_uri}")
# Point MLflow at the workspace
mlflow.set_tracking_uri(mlflow_tracking_uri)
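Hardcoding the subscription, resource group, and workspace name makes the connection code awkward to share or run in CI. The following minimal sketch assumes you export those values as environment variables. The variable names AZUREML_SUBSCRIPTION_ID, AZUREML_RESOURCE_GROUP and AZUREML_WORKSPACE_NAME, and the helper workspace_config_from_env, are hypothetical choices for this example. MLClient does not read them by itself.

```python
import os

# Hypothetical environment variable names chosen for this sketch;
# MLClient does not read them on its own.
ENV_VARS = {
    "subscription_id": "AZUREML_SUBSCRIPTION_ID",
    "resource_group_name": "AZUREML_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_config_from_env(environ=None):
    """Return MLClient keyword arguments read from environment variables.

    Raises KeyError naming every variable that is unset or empty.
    """
    environ = os.environ if environ is None else environ
    missing = [var for var in ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in ENV_VARS.items()}


# Usage (requires azure-ai-ml and azure-identity):
# ml_client = MLClient(credential=DefaultAzureCredential(), **workspace_config_from_env())
```

Azure ML also provides MLClient.from_config(), which reads the same three values from a config.json file downloaded from the workspace. That is an alternative if you prefer a file over environment variables.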
2. Configure Authentication
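import os
# Service principal credentials. azure-identity's EnvironmentCredential,
# the first credential DefaultAzureCredential tries, reads these variables.
# Avoid hardcoding real secrets in source code.
os.environ["AZURE_TENANT_ID"] = "your-tenant-id"
os.environ["AZURE_CLIENT_ID"] = "your-client-id"
os.environ["AZURE_CLIENT_SECRET"] = "your-client-secret"
# Or rely on DefaultAzureCredential alone: without these variables it falls
# back to other sources such as a managed identity or an `az login` session
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()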
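When authentication fails, the cause is often that only some of the service principal variables are set. In that case EnvironmentCredential is skipped and DefaultAzureCredential moves on to the next credential in its chain. The following minimal sketch reports which variables are missing. The helper name missing_service_principal_vars is hypothetical. The sketch checks only the client-secret flavor of EnvironmentCredential, because certificate-based authentication uses different variables.

```python
import os

# Variables read by azure-identity's EnvironmentCredential for
# client-secret (service principal) authentication.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(environ=None):
    """Return the service principal variables that are unset or empty, in order."""
    environ = os.environ if environ is None else environ
    return [name for name in SERVICE_PRINCIPAL_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        # DefaultAzureCredential will try other sources instead,
        # such as a managed identity or an Azure CLI login.
        print("Service principal not configured; missing:", ", ".join(missing))
    else:
        print("Service principal variables are set.")
```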
Experiment Tracking
1. Create and Set Experiment
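import mlflow
# Set the active experiment (created automatically if it does not exist)
mlflow.set_experiment("my-ml-experiment")
# Or create an experiment with tags; this returns the new experiment's ID
# and fails if an experiment with that name already exists
experiment_id = mlflow.create_experiment(
    name="classification-experiment",
    tags={
        "team": "data-science",
        "project": "customer-churn",
    },
)
mlflow.set_experiment(experiment_id=experiment_id)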
2. Log Parameters and Metrics
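import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
# Example data: a synthetic binary classification set (replace with your own)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Start a run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log hyperparameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    mlflow.log_param("random_state", 42)
    # Train the model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=42,
    )
    model.fit(X_train, y_train)
    # Predict on the held-out split
    predictions = model.predict(X_test)
    # Log evaluation metrics (F1, precision and recall use binary averaging)
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
    mlflow.log_metric("f1_score", f1_score(y_test, predictions))
    mlflow.log_metric("precision", precision_score(y_test, predictions))
    mlflow.log_metric("recall", recall_score(y_test, predictions))
print("Run completed")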
3. Log Artifacts
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Reuses model, X_train/X_test and y_train/y_test from the previous example
with mlflow.start_run():
    # Train and predict
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Create and save a confusion matrix plot
    cm = confusion_matrix(y_test, predictions)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm)
    disp.plot()
    plt.savefig("confusion_matrix.png")
    plt.close()
    # Log the image file as a run artifact
    mlflow.log_artifact("confusion_matrix.png")
    # Log a directory of artifacts