Complete Azure ML Pipelines Tutorial: CI/CD for Machine Learning
Azure ML Pipelines let you build reproducible, reusable machine learning workflows. A pipeline automates the end-to-end ML lifecycle, from data preparation through model deployment, with version control and team collaboration built in.
Why Azure ML Pipelines?
Key Benefits:
- Reproducibility: Version-controlled workflows
- Reusability: Modular pipeline components
- Automation: Scheduled and triggered pipelines
- Collaboration: Team-based development
- Integration: Azure DevOps and GitHub Actions
Use Cases:
- Automated model training
- Data preprocessing workflows
- Batch inference pipelines
- MLOps CI/CD
- Feature engineering automation
Prerequisites
pip install azure-ai-ml azure-identity
# Azure CLI
az login
az extension add -n ml
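Before moving on, it can help to confirm that both SDK packages from the install command are actually visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for this example and is not part of the Azure SDK.

```python
from importlib import metadata

# Distribution names taken from the pip install command above.
REQUIRED = ("azure-ai-ml", "azure-identity")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If either package is reported as missing, the most likely cause is that pip installed into a different virtual environment than the one running the script.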
Quick Start
1. Connect to the Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
# Replace the placeholder values with your own subscription, resource group, and workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="my-resource-group",
    workspace_name="my-ml-workspace"
)
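Hard-coding the subscription and workspace names is fine for a first run, but in CI/CD these values usually come from the environment. The sketch below shows one way to collect them; the environment variable names and the `workspace_settings` helper are assumptions made for this example, not names defined by the Azure SDK.

```python
import os

# Hypothetical variable names chosen for this sketch; rename them to match your CI setup.
ENV_TO_ARG = {
    "AZURE_SUBSCRIPTION_ID": "subscription_id",
    "AZURE_RESOURCE_GROUP": "resource_group_name",
    "AZURE_ML_WORKSPACE": "workspace_name",
}


def workspace_settings(env=None):
    """Build the workspace keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a confusing auth error later.
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[name] for name, arg in ENV_TO_ARG.items()}
```

The returned dict can be unpacked into the client constructor, for example `MLClient(credential=DefaultAzureCredential(), **workspace_settings())`.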
2. Simple Pipeline
from azure.ai.ml import dsl, Input, Output
from azure.ai.ml.entities import Pipeline
# preprocess_component and train_component are assumed to be defined already
# (see the Pipeline Components section)
# Define the pipeline
@dsl.pipeline(
    compute="cpu-cluster",
    description="Simple training pipeline"
)
def training_pipeline(training_data):
    # Preprocessing step
    preprocess_step = preprocess_component(
        input_data=training_data
    )
    # Training step: consumes the output of the preprocessing step
    train_step = train_component(
        training_data=preprocess_step.outputs.output_data
    )
    return {
        "model_output": train_step.outputs.model
    }
# Build the pipeline
pipeline = training_pipeline(
    training_data=Input(type="uri_file", path="azureml:my-dataset:1")
)
# Submit the pipeline
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="training-pipeline"
)
print(f"Pipeline submitted: {pipeline_job.name}")
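Notice that the pipeline function never states an execution order. The order follows from the data dependencies: the training step consumes an output of the preprocessing step, so it has to run second. The sketch below illustrates that idea with the standard library's `graphlib`; it is a conceptual model only, not how Azure ML schedules jobs internally, and the `dependencies` dict is written by hand for this example.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
# Step names follow the pipeline above; the dict itself is illustrative only.
dependencies = {
    "preprocess_step": set(),            # reads the pipeline input directly
    "train_step": {"preprocess_step"},   # consumes the preprocessing output
}


def execution_order(deps):
    """Return one valid run order in which every step follows its producers."""
    return list(TopologicalSorter(deps).static_order())


print(execution_order(dependencies))  # ['preprocess_step', 'train_step']
```

Steps that do not depend on each other have no fixed relative order in this model, which is what allows independent branches of a pipeline to run in parallel.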
Pipeline Components
1. Create a Component from Code
from azure.ai.ml import command, Input, Output
from azure.ai.ml.entities import Component
# Data preprocessing component
preprocess_component = command(
    name="preprocess_data",
    display_name="Preprocess Data",
    description="Clean and prepare data for training",
    inputs={
        "input_data": Input(type="uri_file")
    },
    outputs={
        "output_data": Output(type="uri_folder")
    },
    # Folder containing preprocess.py; it is uploaded together with the component
    code="./components/preprocess",
    # ${{...}} placeholders are replaced with concrete paths when the job runs
    command="python preprocess.py --input ${{inputs.input_data}} --output ${{outputs.output_data}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest"
)
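# Register the component in the workspace
# command() returns a node object; its .component attribute holds the component definition to register
preprocess_component = ml_client.components.create_or_update(preprocess_component.component)
print(f"Component registered: {preprocess_component.name}")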
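The command string in the component uses `${{inputs.input_data}}` and `${{outputs.output_data}}` placeholders, which are replaced with concrete paths when the job runs. To make that substitution easier to picture, here is a small stand-alone sketch. It is not the Azure ML implementation: the `render_command` helper and the mount paths are invented for this example.

```python
import re

# Matches ${{inputs.name}} and ${{outputs.name}}, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Replace ${{inputs.x}} / ${{outputs.y}} placeholders with the given paths."""
    bindings = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        try:
            return bindings[kind][name]
        except KeyError:
            raise KeyError(f"No binding for {kind}.{name}") from None

    return PLACEHOLDER.sub(substitute, template)


template = "python preprocess.py --input ${{inputs.input_data}} --output ${{outputs.output_data}}"
print(render_command(
    template,
    inputs={"input_data": "/mnt/in/data.csv"},      # hypothetical mount path
    outputs={"output_data": "/mnt/out/processed"},  # hypothetical mount path
))
# python preprocess.py --input /mnt/in/data.csv --output /mnt/out/processed
```

The practical takeaway is that the script only ever sees ordinary file system paths on its command line, so it can be developed and tested locally with plain arguments.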
2. Script Component
# components/preprocess/preprocess.py
import argparse
import pandas as pd
import os
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    args = parser.parse_args()
    # Load the data
    df = pd.read_csv(args.input)
    # Preprocess: drop rows with missing values, then drop duplicate rows
    df = df.dropna()
    df = df.drop_duplicates()
    # Normalize numeric columns to zero mean and unit standard deviation
    # (a constant column has a standard deviation of 0 and would become NaN)
    numeric_cols = df.select_dtypes(include=['number']).columns
    df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()
    # Save the output
    os.makedirs(args.output, exist_ok=True)
    df.to_csv(os.path.join(args.output, "processed_data.csv"), index=False)
    print(f"Processed {len(df)} rows")