Complete AWS SageMaker Tutorial: End-to-End ML Pipeline
Amazon SageMaker is a fully managed machine learning service that lets data scientists and developers build, train, and deploy ML models at scale. This tutorial covers the complete ML lifecycle on AWS.
Why AWS SageMaker?
SageMaker advantages:
- Fully managed: no infrastructure to manage
- End-to-end: supports the complete ML lifecycle
- Scalable: train at any scale on managed infrastructure
- Integrated: native integration with AWS services
- Cost-effective: pay only for what you use
Key components:
- SageMaker Studio (IDE)
- SageMaker Training
- SageMaker Inference
- SageMaker Pipelines
- SageMaker Feature Store
- SageMaker Model Monitor
Prerequisites
# Install the AWS CLI, the AWS SDK, and supporting libraries
# (s3fs lets pandas read and write s3:// paths)
pip install awscli boto3 sagemaker pandas scikit-learn s3fs

# Configure AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (e.g. us-east-1)
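`aws configure` stores what you enter in plain INI files: the keys go in `~/.aws/credentials` and the region in `~/.aws/config`. As a quick sanity check before touching SageMaker, the sketch below lists the profiles found in a credentials file using only the standard library. The helper name `list_profiles` is hypothetical, and the default path assumes the standard location with no environment override.

```python
import configparser
from pathlib import Path


def list_profiles(credentials_path=None):
    """Return the profile names found in an AWS shared credentials file.

    Assumption: the file lives at ~/.aws/credentials unless a path is given.
    """
    if credentials_path:
        path = Path(credentials_path)
    else:
        path = Path.home() / ".aws" / "credentials"
    if not path.exists():
        return []
    parser = configparser.ConfigParser()
    parser.read(path)
    # Each [section] in the file is one named profile, e.g. [default]
    return parser.sections()


if __name__ == "__main__":
    profiles = list_profiles()
    print(profiles or "No profiles found; run `aws configure` first")
```

An empty result usually means `aws configure` has not been run for the current user.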
Quick Start
1. Set Up a SageMaker Session
import boto3
import sagemaker
from sagemaker import get_execution_role

# Create a session
session = sagemaker.Session()
bucket = session.default_bucket()
role = get_execution_role()  # Or specify an IAM role ARN

print(f"Bucket: {bucket}")
print(f"Role: {role}")
print(f"Region: {session.boto_region_name}")
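`get_execution_role()` is designed for SageMaker notebook environments; on a local machine it can raise an error, which is why the comment above suggests supplying a role ARN directly. The following is a minimal sketch of that fallback, not part of the SageMaker SDK: the helper `resolve_role` and the environment variable `SAGEMAKER_ROLE_ARN` are hypothetical names chosen for illustration.

```python
import os


def resolve_role(getter=None, env_var="SAGEMAKER_ROLE_ARN"):
    """Return an IAM role ARN, trying `getter` first and an env var second.

    `getter` is any zero-argument callable, e.g. sagemaker.get_execution_role.
    `env_var` is a hypothetical variable name, not one the SDK reads itself.
    """
    if getter is not None:
        try:
            return getter()
        except Exception:
            # Lookup failed (for example, no execution role is attached);
            # fall through to the environment variable.
            pass
    role_arn = os.environ.get(env_var)
    if not role_arn:
        raise RuntimeError(
            f"Set {env_var} to an IAM role ARN that SageMaker can assume"
        )
    return role_arn


# Usage in this tutorial (assumes the sagemaker SDK is installed):
#   from sagemaker import get_execution_role
#   role = resolve_role(get_execution_role)
```

Passing the lookup function in as an argument keeps the helper importable even where the SageMaker SDK is not installed.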
2. Prepare the Training Data
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load sample data
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Split data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Save to S3 (pandas needs the s3fs package to write to s3:// paths)
train_path = f"s3://{bucket}/iris/train/train.csv"
test_path = f"s3://{bucket}/iris/test/test.csv"

# SageMaker's built-in XGBoost expects header-less CSV with the label in the first column
columns = ['target'] + list(iris.feature_names)
train_df[columns].to_csv(train_path, index=False, header=False)
test_df[columns].to_csv(test_path, index=False, header=False)

print(f"Training data: {train_path}")
print(f"Test data: {test_path}")
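The paths above follow the `s3://<bucket>/<key>` layout. If you upload with boto3 rather than letting pandas write to S3 directly, the bucket and key are passed separately. A small sketch of splitting such a URI with the standard library follows; `split_s3_uri` is a hypothetical helper name and `example-bucket` is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path keeps its leading slash, which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri("s3://example-bucket/iris/train/train.csv")
print(bucket_name, key)  # example-bucket iris/train/train.csv
```

The two values map onto the `Bucket` and `Key` arguments of the boto3 S3 client's `upload_file` method.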
Built-in Algorithms
1. XGBoost Training
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Get the XGBoost container
container = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=session.boto_region_name,
    version="1.5-1"
)

# Create the estimator
xgb_estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/iris/output",
    sagemaker_session=session,
    hyperparameters={
        "objective": "multi:softmax",
        "num_class": 3,
        "num_round": 100,
        "max_depth": 5,
        "eta": 0.2
    }
)

# Define the training input
train_input = TrainingInput(
    s3_data=train_path,
    content_type="csv"
)

# Train the model
xgb_estimator.fit({"train": train_input})
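For CSV input, SageMaker's built-in XGBoost reads each row as a label followed by the features, with no header row. The sketch below shows that layout using only the standard library; `to_label_first_csv` is a hypothetical helper, and the two sample rows are made-up values in the style of the iris data.

```python
import csv
import io


def to_label_first_csv(features, labels):
    """Serialize rows as header-less CSV with the label in the first column."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row, label in zip(features, labels):
        # Label first, then the feature values, and no header line
        writer.writerow([label, *row])
    return buffer.getvalue()


print(to_label_first_csv([[5.1, 3.5], [6.2, 2.9]], [0, 1]))
```

Each printed line starts with the class label, which is the column order the training channel should contain.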
2. Linear Learner
from sagemaker import LinearLearner

# Create the Linear Learner estimator
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    predictor_type="multiclass_classifier",
    num_classes=3,
    output_path=f"s3://{bucket}/linear/output"
)

# Prepare the data in RecordIO format
train_records = linear.record_set(
    train_df.drop('target', axis=1).values.astype('float32'),
    train_df['target'].values.astype('float32'),
    channel='train'
)

# Train
linear.fit(train_records)
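With `predictor_type="multiclass_classifier"`, Linear Learner expects labels that are whole numbers from 0 to `num_classes - 1`, even though the array is cast to `float32`. A cheap local check before starting a paid training job might look like the sketch below; `check_multiclass_labels` is a hypothetical helper, not part of the SageMaker SDK.

```python
def check_multiclass_labels(labels, num_classes):
    """Raise ValueError unless every label is a whole number in [0, num_classes)."""
    bad = sorted({
        float(label)
        for label in labels
        if float(label) != int(label) or not 0 <= int(label) < num_classes
    })
    if bad:
        raise ValueError(
            f"Labels outside 0..{num_classes - 1} or not whole numbers: {bad}"
        )
    return True


# Labels stored as floats are fine as long as their values are 0.0, 1.0, 2.0
print(check_multiclass_labels([0.0, 1.0, 2.0, 1.0], num_classes=3))  # True
```

In this tutorial the check could be run on `train_df['target']` with `num_classes=3` before calling `fit`.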
Custom Training Scripts
1. Scikit-learn Training
# train_sklearn.py
import argparse
import joblib
import os
import pandas as pd
from sklearn.ensemble import RandomForestClassifier