AWS SageMaker Pipelines Tutorial: ML Pipeline Automation

By Ruby Abdullah · tutorial
AWS · SageMaker · Pipelines · MLOps · Automation · CI/CD

Complete AWS SageMaker Pipelines Tutorial: Automating ML Workflows

SageMaker Pipelines is a CI/CD service built specifically for machine learning that helps you automate and manage ML workflows. It lets you build reproducible, production-ready ML pipelines with minimal code.

Why SageMaker Pipelines?

Key Benefits:
  • Automation: Automate ML workflows end to end
  • Reproducibility: Track and reproduce experiments
  • Integration: Native integration with SageMaker services
  • Visualization: View the pipeline DAG in Studio
  • Version control: Pipeline versioning and lineage tracking

Pipeline Components:
  • Processing Steps
  • Training Steps
  • Transform Steps
  • Model Steps
  • Condition Steps
  • Callback Steps

Prerequisites

pip install sagemaker boto3 pandas scikit-learn

# Make sure the SageMaker SDK is version 2.0 or later
python -c "import sagemaker; print(sagemaker.__version__)"
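If you prefer to check the version requirement from a script rather than by eye, something like the following could work. It is a minimal sketch: the helper names (`parse_version`, `meets_minimum`, `installed_sagemaker_version`) are made up for this example, and the parsing only handles simple dotted version strings.

```python
import re
from importlib import metadata


def parse_version(text, width=3):
    """Turn '2.1.0rc1' into (2, 1, 0), padding short versions with zeros."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum="2.0"):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_sagemaker_version():
    """Return the installed sagemaker version string, or None if it is missing."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


version = installed_sagemaker_version()
if version is None:
    print("sagemaker is not installed")
else:
    print(version, "OK" if meets_minimum(version) else "older than 2.0")
```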

Quick Start

1. Setup

import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.parameters import ParameterString, ParameterInteger

# Session, default S3 bucket, execution role, and region for this account
session = sagemaker.Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
region = session.boto_region_name

pipeline_name = "iris-ml-pipeline"

2. Define Parameters

from sagemaker.workflow.parameters import (
    ParameterString,
    ParameterInteger,
    ParameterFloat,
)

# Pipeline parameters
input_data = ParameterString(
    name="InputData",
    default_value=f"s3://{bucket}/iris/raw/data.csv",
)

training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.m5.xlarge",
)

training_instance_count = ParameterInteger(
    name="TrainingInstanceCount",
    default_value=1,
)

model_approval_status = ParameterString(
    name="ModelApprovalStatus",
    default_value="PendingManualApproval",
)
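Each parameter carries a default that can be overridden per run; in the SDK, overrides are passed as a dictionary keyed by parameter name when an execution starts (for example `pipeline.start(parameters={...})`). The sketch below imitates that merge in plain Python so the behavior is easy to see. It is an illustration only: `DEFAULTS` and `resolve_parameters` are hypothetical names, the validation rules are an assumption rather than the service's exact behavior, and `InputData` is left out because its default depends on your bucket.

```python
# Defaults mirror the parameters defined above: name -> (type, default value).
DEFAULTS = {
    "TrainingInstanceType": (str, "ml.m5.xlarge"),
    "TrainingInstanceCount": (int, 1),
    "ModelApprovalStatus": (str, "PendingManualApproval"),
}


def resolve_parameters(overrides=None):
    """Merge per-run overrides into the defaults, rejecting unknown names and wrong types."""
    resolved = {name: default for name, (_, default) in DEFAULTS.items()}
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            raise KeyError(f"Unknown pipeline parameter: {name}")
        expected_type = DEFAULTS[name][0]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise TypeError(f"{name} expects {expected_type.__name__}")
        resolved[name] = value
    return resolved


# A run that scales training out to two instances and keeps the other defaults.
run_parameters = resolve_parameters({"TrainingInstanceCount": 2})
print(run_parameters)
```

Keeping values such as instance type and count as parameters means the same pipeline definition can be reused for a small test run and a full-size run.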

Processing Steps

1. Preprocessing Data

# preprocess.py
import argparse
import os

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", type=str)
    parser.add_argument("--test-size", type=float, default=0.2)
    args = parser.parse_args()

    # Read the data
    input_path = os.path.join("/opt/ml/processing/input", "data.csv")
    df = pd.read_csv(input_path)

    # Separate features and target
    X = df.drop("target", axis=1)
    y = df["target"]

    # Scale the features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=args.test_size, random_state=42
    )

    # Make sure the output directories exist (harmless if already created)
    for output_dir in ("/opt/ml/processing/train", "/opt/ml/processing/test"):
        os.makedirs(output_dir, exist_ok=True)

    # Save the outputs (features first, target last, no header row)
    train_df = pd.DataFrame(X_train)
    train_df["target"] = y_train.values
    train_df.to_csv("/opt/ml/processing/train/train.csv", index=False, header=False)

    test_df = pd.DataFrame(X_test)
    test_df["target"] = y_test.values
    test_df.to_csv("/opt/ml/processing/test/test.csv", index=False, header=False)

    print(f"Train size: {len(train_df)}, Test size: {len(test_df)}")
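The script reads from and writes to fixed folders under /opt/ml/processing (input, train, test). To get a feel for that contract without launching a job, you could mimic the layout in a temporary directory, as in the rough sketch below. It uses only the standard library, so the shuffle-based split is a stand-in for `train_test_split`, not the same algorithm, and the sample rows are invented purely to exercise the file layout. All helper names are hypothetical.

```python
import csv
import random
import tempfile
from pathlib import Path


def make_layout(base):
    """Create input/train/test folders under base, mirroring /opt/ml/processing."""
    base = Path(base)
    for name in ("input", "train", "test"):
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def run_locally(base, rows, test_size=0.2):
    """Write rows to input/data.csv, split them, and write train/test CSVs."""
    base = make_layout(base)
    with open(base / "input" / "data.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(base / "input" / "data.csv", newline="") as f:
        loaded = list(csv.reader(f))
    train, test = split_rows(loaded, test_size)
    for name, part in (("train", train), ("test", test)):
        with open(base / name / f"{name}.csv", "w", newline="") as f:
            csv.writer(f).writerows(part)
    return len(train), len(test)


# Made-up rows (four features plus a target), used only to exercise the layout.
sample = [[i, i + 1, i + 2, i + 3, i % 3] for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    sizes = run_locally(tmp, sample)
print(f"Train size: {sizes[0]}, Test size: {sizes[1]}")
```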

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

# Create the processor
sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    sagemaker_session=session,
)

# Define the processing step
step_process = ProcessingStep(
    name="PreprocessData",
    processor=sklearn_processor,
    inputs=[
