Complete AWS SageMaker Model Monitor Tutorial: Monitoring ML Models in Production
Amazon SageMaker Model Monitor automatically detects data quality issues, model quality degradation, bias drift, and feature attribution drift in ML models deployed to production. The service helps maintain model performance over time.
Why Model Monitor?
Key benefits:
- Automated monitoring: continuous oversight of deployed models
- Drift detection: alerts on data quality and model quality drift
- Bias detection: monitoring of fairness metrics
- Explainability: tracking of feature attribution
- Integration: native integration with SageMaker
Monitor types:
- Data Quality Monitor
- Model Quality Monitor
- Bias Drift Monitor
- Feature Attribution Drift Monitor
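Each of the four monitor types corresponds to a class in the `sagemaker.model_monitor` module of the SageMaker Python SDK. The lookup below is a small sketch of that correspondence; the dictionary keys and the `monitor_class_name` helper are hypothetical labels chosen for illustration, not SDK identifiers.

```python
# Monitor type -> class name in sagemaker.model_monitor.
# The keys are illustrative labels, not names defined by the SDK.
MONITOR_CLASSES = {
    "data_quality": "DefaultModelMonitor",
    "model_quality": "ModelQualityMonitor",
    "bias_drift": "ModelBiasMonitor",
    "feature_attribution_drift": "ModelExplainabilityMonitor",
}


def monitor_class_name(monitor_type: str) -> str:
    """Return the SDK class name for a monitor type, or raise ValueError."""
    try:
        return MONITOR_CLASSES[monitor_type]
    except KeyError:
        raise ValueError(f"Unknown monitor type: {monitor_type!r}") from None


print(monitor_class_name("data_quality"))
```

This tutorial section uses only the first entry, `DefaultModelMonitor`, for data quality monitoring.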
Prerequisites
pip install sagemaker boto3 pandas numpy
# Requires SageMaker SDK >= 2.0
python -c "import sagemaker; print(sagemaker.__version__)"
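The version requirement can also be checked from Python before running the rest of the tutorial. The helper below is a minimal sketch that compares only the leading numeric components of a version string; `meets_minimum` is a hypothetical name, and the version strings passed to it here are made-up examples.

```python
import re


def meets_minimum(version: str, minimum: tuple = (2, 0)) -> bool:
    """Return True if the leading numeric parts of `version` are >= `minimum`."""
    parts = [int(n) for n in re.findall(r"\d+", version)[: len(minimum)]]
    # Pad short versions such as "2" so they compare as (2, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum


# Example version strings (illustrative only).
print(meets_minimum("2.100.0"))
print(meets_minimum("1.72.0"))
```

In a real environment the argument would be `sagemaker.__version__`.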
Quick Start
1. Setup
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.model_monitor import (
    DefaultModelMonitor,
    DataCaptureConfig,
    CronExpressionGenerator,
)

session = sagemaker.Session()
bucket = session.default_bucket()
role = get_execution_role()
region = session.boto_region_name

# Monitor output location
monitor_output = f"s3://{bucket}/model-monitor"
2. Deploy the Model with Data Capture
from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Create the model.
# xgboost_image (container image URI) and model_data_uri (S3 path to the
# model artifact) are not defined in this tutorial and must be set beforehand.
model = Model(
    image_uri=xgboost_image,
    model_data=model_data_uri,
    role=role,
    predictor_cls=Predictor,  # without this, deploy() returns None
)

# Configure data capture
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,  # Capture every request
    destination_s3_uri=f"s3://{bucket}/data-capture",
    capture_options=["REQUEST", "RESPONSE"],  # documented values for input and output capture
    csv_content_types=["text/csv"],
    json_content_types=["application/json"],
)

# Deploy with data capture enabled
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="monitored-endpoint",
    data_capture_config=data_capture_config,
)
print(f"Endpoint deployed: {predictor.endpoint_name}")
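Once capture is enabled, the endpoint writes sampled requests and responses to the destination S3 prefix as JSON Lines files, one record per line. The sketch below parses a single such line. The field names follow the capture record layout as commonly documented (`captureData`, `endpointInput`, `endpointOutput`, `eventMetadata`), but they should be verified against files from your own endpoint; `SAMPLE_LINE` and `parse_capture_line` are hypothetical, and the payload values are invented.

```python
import json

# Invented sample record that mimics one line of a capture file.
SAMPLE_LINE = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "5.1,3.5,1.4,0.2",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv",
            "mode": "OUTPUT",
            "data": "0.93",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "example-event-id",
        "inferenceTime": "2024-01-01T00:00:00Z",
    },
    "eventVersion": "0",
})


def parse_capture_line(line: str) -> dict:
    """Extract the request payload, response payload, and timestamp from one record."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        # Either side may be absent if only one capture option is enabled.
        "input": capture.get("endpointInput", {}).get("data"),
        "output": capture.get("endpointOutput", {}).get("data"),
        "inference_time": record.get("eventMetadata", {}).get("inferenceTime"),
    }


parsed = parse_capture_line(SAMPLE_LINE)
print(parsed["input"], "->", parsed["output"])
```

Inspecting a few captured records this way is a quick check that the endpoint is logging what the monitor will later analyze.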
Data Quality Monitor
1. Create a Baseline
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Create the monitor
data_quality_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Create the baseline from the training data
data_quality_monitor.suggest_baseline(
    baseline_dataset=f"s3://{bucket}/training-data/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=f"{monitor_output}/data-quality/baseline",
    wait=True,
)
print("Baseline created!")
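Conceptually, a data quality baseline summarizes each column of the training data so that later traffic can be compared against it. The sketch below computes two such summaries, the mean and the fraction of non-missing values, with the standard library only. It is an illustration of the idea under simplifying assumptions (every column is numeric, an empty cell means missing), not the algorithm Model Monitor runs; `column_stats` and the sample CSV are hypothetical.

```python
import csv
import io
import statistics


def column_stats(csv_text: str) -> dict:
    """Compute mean and completeness per column for CSV text with a header row."""
    columns = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    summary = {}
    for name, values in columns.items():
        # Treat empty cells as missing; assume everything else is numeric.
        present = [float(v) for v in values if v != ""]
        summary[name] = {
            "mean": statistics.fmean(present) if present else None,
            "completeness": len(present) / len(values),
        }
    return summary


# Invented sample data with one missing income value.
SAMPLE_CSV = "age,income\n30,1000\n40,\n50,3000\n"
stats = column_stats(SAMPLE_CSV)
print(stats)
```

A drop in completeness or a large shift in the mean for live traffic, relative to values like these, is the kind of change a data quality monitor is meant to flag.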
2. View Baseline Statistics
import json

# Locate the baseline statistics and constraints
baseline_job = data_quality_monitor.latest_baselining_job
statistics_path = f"{monitor_output}/data-quality/baseline/statistics.json"
constraints_path = f"{monitor_output}/data-quality/baseline/constraints.json"

# Download and inspect the statistics
s3 = boto3.client("s3")

# Split an S3 URI into bucket name and object key
def parse_s3_uri(uri):
    parts = uri.replace("s3://", "").split("/", 1)
    return parts[0], parts[1]

bucket_name, key = parse_s3_uri(statistics_path)
response = s3.get_object(Bucket=bucket_name, Key=key)
statistics = json.loads(response["Body"].read())

print("Baseline statistics:")
for feature in statistics["features"]:
    print(f"  {feature['name']}: mean={feature.get('numerical_statistics', {}).get('mean', 'N/A')}")
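The URI parsing and the per-feature loop above can be factored into two small functions that are easy to test without AWS access. This is a sketch under stated assumptions: `split_s3_uri` and `feature_means` are hypothetical helper names, the bucket name is a placeholder, and `SAMPLE_STATISTICS` is an invented, heavily trimmed stand-in for a real `statistics.json`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key' into (bucket, key), rejecting malformed URIs."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def feature_means(statistics: dict) -> dict:
    """Map each feature name to its mean, or None for non-numerical features."""
    return {
        feature["name"]: feature.get("numerical_statistics", {}).get("mean")
        for feature in statistics.get("features", [])
    }


# Invented, trimmed example of the statistics structure.
SAMPLE_STATISTICS = {
    "features": [
        {"name": "age", "numerical_statistics": {"mean": 40.0}},
        {"name": "city", "string_statistics": {}},
    ]
}

print(split_s3_uri("s3://example-bucket/model-monitor/data-quality/baseline/statistics.json"))
print(feature_means(SAMPLE_STATISTICS))
```

Unlike the string-splitting version, `split_s3_uri` raises a clear error for input that is not an S3 URI and returns an empty key for a bare `s3://bucket` instead of failing with an `IndexError`.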