Complete Azure ML Pipelines Tutorial: CI/CD for Machine Learning
Azure ML Pipelines enable you to build reproducible, reusable machine learning workflows. They automate the ML lifecycle end to end, from data preparation to model deployment, and support version control and team collaboration.
Why Azure ML Pipelines?
Key Benefits:
- Reproducibility: Version-controlled workflows
- Reusability: Modular pipeline components
- Automation: Scheduled and triggered pipelines
- Collaboration: Team-based development
- Integration: Azure DevOps and GitHub Actions
Use Cases:
- Automated model training
- Data preprocessing workflows
- Batch inference pipelines
- MLOps CI/CD
- Feature engineering automation
Prerequisites
pip install azure-ai-ml azure-identity
# Azure CLI
az login
az extension add -n ml
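One way to confirm that the two Python packages from the install step are present is to query their installed versions. The sketch below uses only the standard library; the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    required = ["azure-ai-ml", "azure-identity"]
    for package, version in installed_versions(required).items():
        print(f"{package}: {version or 'NOT INSTALLED'}")
```

A `None` entry means the package still needs to be installed in the active environment.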
Quick Start
1. Connect to Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="my-resource-group",
    workspace_name="my-ml-workspace"
)
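In CI/CD runs it is usually more convenient to read the three workspace identifiers from the environment than to hardcode them. The following is a minimal sketch under that assumption; the environment variable names and the helpers `workspace_settings` and `connect` are hypothetical, not something the SDK defines.

```python
import os

# Hypothetical variable names: pick whatever your CI system provides.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def workspace_settings(env=None):
    """Collect the MLClient keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in REQUIRED_VARS.items()}


def connect(env=None):
    """Build an MLClient from the environment."""
    # Imported lazily so workspace_settings() works without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(credential=DefaultAzureCredential(), **workspace_settings(env))
```

Failing fast with the list of missing variables tends to be easier to debug in a pipeline log than an authentication error raised later.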
2. Simple Pipeline
from azure.ai.ml import dsl, Input, Output
from azure.ai.ml.entities import Pipeline

# Define pipeline
# preprocess_component and train_component must already be defined or loaded
# (see Pipeline Components).
@dsl.pipeline(
    compute="cpu-cluster",
    description="Simple training pipeline"
)
def training_pipeline(training_data):
    # Preprocessing step
    preprocess_step = preprocess_component(
        input_data=training_data
    )
    # Training step
    train_step = train_component(
        training_data=preprocess_step.outputs.output_data
    )
    return {
        "model_output": train_step.outputs.model
    }

# Create pipeline
pipeline = training_pipeline(
    training_data=Input(type="uri_file", path="azureml:my-dataset:1")
)

# Submit pipeline
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="training-pipeline"
)
print(f"Pipeline submitted: {pipeline_job.name}")
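Submitting a pipeline returns immediately, so a CI job typically needs to wait for the result. The sketch below polls the job status through `jobs.get` until it reaches a terminal state. The helper name `wait_for_job`, the polling interval, and the set of terminal status strings are assumptions; check the statuses your SDK version actually reports.

```python
import time

# Assumed terminal states; the service may report additional ones.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(ml_client, job_name, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll a job until it reaches a terminal status and return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = ml_client.jobs.get(job_name).status
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_name} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Call it with the workspace client and the submitted job's name, then fail the build when the returned status is not `"Completed"`. If you only want to follow the logs interactively, the SDK's `jobs.stream(name)` is an alternative.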
Pipeline Components
1. Create Component from Code
from azure.ai.ml import command, Input, Output
from azure.ai.ml.entities import Component

# Data preprocessing component
preprocess_component = command(
    name="preprocess_data",
    display_name="Preprocess Data",
    description="Clean and prepare data for training",
    inputs={
        "input_data": Input(type="uri_file")
    },
    outputs={
        "output_data": Output(type="uri_folder")
    },
    code="./components/preprocess",
    command="python preprocess.py --input ${{inputs.input_data}} --output ${{outputs.output_data}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest"
)

# Register component
# command() returns a job node; register its underlying component definition.
preprocess_component = ml_client.components.create_or_update(preprocess_component.component)
print(f"Component registered: {preprocess_component.name}")
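To make the `${{inputs.…}}` and `${{outputs.…}}` placeholder syntax concrete, here is a toy renderer that substitutes paths into a command template. It only illustrates the idea and is not Azure ML's implementation; at run time the service fills in the locations where the data is made available on the compute target. The function name `render_command` and the example paths are hypothetical.

```python
import re

# Matches ${{inputs.name}} or ${{outputs.name}}, allowing inner whitespace.
_PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Replace each placeholder in the template with its bound path."""
    bindings = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        if name not in bindings[kind]:
            raise KeyError(f"{kind}.{name} is not declared on the component")
        return bindings[kind][name]

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = (
        "python preprocess.py --input ${{inputs.input_data}} "
        "--output ${{outputs.output_data}}"
    )
    print(render_command(
        template,
        inputs={"input_data": "/mnt/in/data.csv"},    # hypothetical path
        outputs={"output_data": "/mnt/out"},          # hypothetical path
    ))
    # prints: python preprocess.py --input /mnt/in/data.csv --output /mnt/out
```

The practical takeaway is that every placeholder name in the command string has to match a key declared under `inputs` or `outputs`; a mismatch is the kind of error the `KeyError` branch stands in for.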
2. Component Script
# components/preprocess/preprocess.py
import argparse
import pandas as pd
import os

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    args = parser.parse_args()

    # Load data
    df = pd.read_csv(args.input)

    # Preprocess
    df = df.dropna()
    df = df.drop_duplicates()

    # Normalize numeric columns (z-score: subtract the mean, divide by the std)
    numeric_cols = df.select_dtypes(include=['number']).columns
    df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

    # Save output
    os.makedirs(args.output, exist_ok=True)
    df.to_csv(os.path.join(args.output, "processed_data.csv"), index=False)
    print(f"Processed {len(df)} rows")

if __name__ == "__main__":