Complete AWS SageMaker Feature Store Tutorial: Feature Management for ML
Amazon SageMaker Feature Store is a fully managed repository for storing, sharing, and managing ML features. It provides centralized storage for features that can be used across both training and inference, which keeps them consistent and reusable.
Why Feature Store?
Key benefits:
- Consistency: the same features for training and inference
- Reusability: share features across teams and models
- Versioning: track feature changes over time
- Low latency: real-time feature serving
- Offline storage: historical data for training
Core components:
- Feature Groups
- Online Store (real-time)
- Offline Store (batch/training)
- Feature Definitions
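To make the split between the two stores concrete, here is a small pandas-only sketch. It does not call SageMaker; it only mimics the usual semantics, assuming the offline store keeps every record appended over time while the online store serves only the most recent record per record identifier. The sample values and the helper `latest_per_id` are hypothetical.

```python
import pandas as pd

# Hypothetical record history, as an offline store would accumulate it.
offline = pd.DataFrame({
    "customerid": ["C0001", "C0001", "C0002"],
    "creditscore": [610, 655, 720],
    "eventtime": [
        "2024-01-01T00:00:00Z",
        "2024-02-01T00:00:00Z",
        "2024-01-15T00:00:00Z",
    ],
})


def latest_per_id(history, id_col="customerid", time_col="eventtime"):
    """Return the most recent row per identifier (what an online lookup would see)."""
    # ISO-8601 strings in one fixed format sort chronologically as plain text.
    ordered = history.sort_values(time_col)
    return ordered.groupby(id_col, as_index=False).last()


online = latest_per_id(offline)
print(online)
```

The offline frame still holds all three rows, which is what makes point-in-time training sets possible, while the online view holds one row per customer.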
Prerequisites
pip install sagemaker boto3 pandas
# Requires SageMaker Python SDK >= 2.0
python -c "import sagemaker; print(sagemaker.__version__)"
Quick Start
1. Setup
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup
import pandas as pd
import time
# Session, default S3 bucket, IAM execution role, and region used throughout
session = sagemaker.Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
region = session.boto_region_name
# Session object passed to the FeatureGroup
featurestoresession = sagemaker.Session()
2. Prepare the Data
import pandas as pd
import numpy as np
from datetime import datetime, timezone
# Create sample customer data
customerdata = pd.DataFrame({
    "customerid": [f"C{i:04d}" for i in range(1, 101)],
    "age": np.random.randint(18, 70, 100),
    "income": np.random.randint(30000, 150000, 100),
    "creditscore": np.random.randint(300, 850, 100),
    "accountbalance": np.random.uniform(0, 50000, 100).round(2),
    "numproducts": np.random.randint(1, 5, 100),
    "isactive": np.random.choice([0, 1], 100)
})
# Add an event time (required by Feature Store); use UTC so the "Z" suffix is accurate
customerdata["eventtime"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(customerdata.head())
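Before ingesting, it can help to sanity-check the two columns Feature Store depends on: the record identifier and the event time. The sketch below is a hypothetical helper (`check_records` is not part of the SageMaker SDK) and assumes event times are stored as ISO-8601 UTC strings of the form `yyyy-MM-ddTHH:mm:ssZ`, optionally with milliseconds.

```python
import pandas as pd

# Matches yyyy-MM-ddTHH:mm:ssZ with optional .SSS milliseconds (assumed format).
ISO_8601_UTC = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z$"


def check_records(df, id_col, time_col):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if df[id_col].isna().any():
        problems.append(f"{id_col} contains missing values")
    # Flag every row whose event time does not match the expected pattern.
    bad = ~df[time_col].astype(str).str.match(ISO_8601_UTC)
    if bad.any():
        problems.append(f"{int(bad.sum())} rows have a malformed {time_col}")
    return problems


# Hypothetical sample: the second event time is missing the "T" and "Z".
sample = pd.DataFrame({
    "customerid": ["C0001", "C0002"],
    "eventtime": ["2024-01-01T12:00:00Z", "2024-01-01 12:00:00"],
})
print(check_records(sample, "customerid", "eventtime"))
```

Running the same check on `customerdata` with `"customerid"` and `"eventtime"` should return an empty list if the frame was built as shown above.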
Creating Feature Groups
1. Define the Feature Group
from sagemaker.feature_store.feature_definition import (
    FeatureDefinition,
    FeatureTypeEnum
)
# Feature group name
featuregroupname = "customer-features"
# Create the feature group object
customerfeaturegroup = FeatureGroup(
    name=featuregroupname,
    sagemaker_session=featurestoresession
)
# Load feature definitions inferred from the DataFrame's column dtypes
customerfeaturegroup.load_feature_definitions(data_frame=customerdata)
# Or define them manually
featuredefinitions = [
    FeatureDefinition(feature_name="customerid", feature_type=FeatureTypeEnum.STRING),
    FeatureDefinition(feature_name="age", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="income", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="creditscore", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="accountbalance", feature_type=FeatureTypeEnum.FRACTIONAL),
    FeatureDefinition(feature_name="numproducts", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="isactive", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="eventtime", feature_type=FeatureTypeEnum.STRING)
]
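The manual list above pairs each column with one of three scalar types: Integral, Fractional, or String. As a rough illustration of how such a mapping can be derived from a DataFrame, here is a pandas-only sketch. The function `infer_feature_type` is hypothetical and only approximates dtype-based inference; the SDK's own rules may differ between versions, so treat the result as a starting point to review.

```python
import pandas as pd
from pandas.api import types as pdt


def infer_feature_type(series):
    """Map a pandas column to a Feature Store type name (illustrative only)."""
    if pdt.is_integer_dtype(series):
        return "Integral"
    if pdt.is_float_dtype(series):
        return "Fractional"
    # Assumption: everything else is treated as a string in this sketch.
    return "String"


# Hypothetical three-column sample mirroring the tutorial's column names.
frame = pd.DataFrame({
    "customerid": ["C0001", "C0002"],
    "age": [34, 51],
    "accountbalance": [1200.50, 87.25],
})
inferred = {col: infer_feature_type(frame[col]) for col in frame.columns}
print(inferred)
```

Comparing a mapping like this against the manual definitions is a quick way to catch a column whose dtype drifted, for example an integer column that became float after missing values were introduced.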
2. Create the Feature Group
# Create the feature group with both online and offline stores
customerfeaturegroup.create(
    s3_uri=f"s3://{bucket}/feature-store/",
    record_identifier_name="customerid",
    event_time_feature_name="eventtime",
    role_arn=role,
    enable_online_store=True,
    description="Customer features for churn prediction"
)
# Wait until the feature group has finished being created
status = customerfeaturegroup.describe().get("FeatureGroupStatus")
while status == "Creating":
    print(f"Status: {status}")
    time.sleep(5)