AWS SageMaker Feature Store Tutorial: Feature Management for ML

By Ruby Abdullah · tutorial
Tags: AWS, SageMaker, Feature Store, Feature Engineering, MLOps, Data Management

Complete AWS SageMaker Feature Store Tutorial: Feature Management for ML

Amazon SageMaker Feature Store is a fully managed repository for storing, sharing, and managing ML features. It provides centralized storage for features that can be used in both training and inference, which keeps them consistent and reusable.

Why a Feature Store?

Key benefits:
  • Consistency: the same features for training and inference
  • Reusability: share features across teams and models
  • Versioning: track feature changes over time
  • Low latency: real-time feature serving
  • Offline storage: historical data for training

Components:
  • Feature Groups
  • Online Store (real-time)
  • Offline Store (batch/training)
  • Feature Definitions
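
To make the split between the two stores concrete, here is a toy in-memory model in plain Python. It only illustrates the idea and says nothing about how the service is implemented. `ToyFeatureGroup` and its methods are hypothetical names. The offline side keeps every record written, while the online side keeps only the record with the latest event time per identifier.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureGroup:
    """Hypothetical in-memory stand-in for a feature group (illustration only)."""

    record_identifier_name: str
    event_time_feature_name: str
    offline: list = field(default_factory=list)  # append-only history
    online: dict = field(default_factory=dict)   # latest record per identifier

    def put_record(self, record: dict) -> None:
        # The offline side keeps every version, which is what training needs.
        self.offline.append(dict(record))
        key = record[self.record_identifier_name]
        current = self.online.get(key)
        # The online side keeps only the newest record per identifier.
        # ISO-8601 UTC strings in the same format compare correctly as text.
        time_name = self.event_time_feature_name
        if current is None or record[time_name] >= current[time_name]:
            self.online[key] = dict(record)

    def get_record(self, identifier: str):
        return self.online.get(identifier)


group = ToyFeatureGroup("customer_id", "event_time")
group.put_record({"customer_id": "C0001", "age": 41, "event_time": "2024-01-01T00:00:00Z"})
group.put_record({"customer_id": "C0001", "age": 42, "event_time": "2024-06-01T00:00:00Z"})

print(group.get_record("C0001")["age"])  # 42
print(len(group.offline))                # 2
```

An inference service would read from the online side by identifier, while a training job would scan the full offline history.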

Prerequisites

pip install sagemaker boto3 pandas numpy

# Requires SageMaker Python SDK >= 2.0
python -c "import sagemaker; print(sagemaker.__version__)"

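Quick Start

1. Setup

import time

import boto3
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

# Session, default S3 bucket, and region
session = sagemaker.Session()
bucket = session.default_bucket()
region = session.boto_region_name

# get_execution_role() resolves the role when running inside SageMaker
# (for example in a notebook); elsewhere, set the role ARN explicitly.
role = sagemaker.get_execution_role()

# Session used by the Feature Store calls below
feature_store_session = sagemaker.Session()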

2. Prepare the Data

import numpy as np
import pandas as pd
from datetime import datetime, timezone

# Create sample customer data
customer_data = pd.DataFrame({
    "customer_id": [f"C{i:04d}" for i in range(1, 101)],
    "age": np.random.randint(18, 70, 100),
    "income": np.random.randint(30000, 150000, 100),
    "credit_score": np.random.randint(300, 850, 100),
    "account_balance": np.random.uniform(0, 50000, 100).round(2),
    "num_products": np.random.randint(1, 5, 100),
    "is_active": np.random.choice([0, 1], 100),
})

# Add an event time (required by Feature Store).
# Take the time in UTC so that the trailing "Z" is accurate.
customer_data["event_time"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(customer_data.head())
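
The event time column is easy to get subtly wrong, for example by stamping a local time with a `Z` suffix. The sketch below assumes that string event times should be ISO-8601 in UTC, in the form `yyyy-MM-ddTHH:mm:ssZ` with optional milliseconds; check that assumption against the current Feature Store documentation. `to_event_time` and `is_valid_event_time` are hypothetical helpers, not part of the SageMaker SDK.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape: ISO-8601 in UTC with optional milliseconds,
# e.g. 2024-01-01T12:00:00Z or 2024-01-01T12:00:00.123Z
EVENT_TIME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$")


def to_event_time(moment: datetime) -> str:
    """Convert a timezone-aware datetime to a UTC string with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_valid_event_time(value: str) -> bool:
    """Check only the shape of the string, not whether the date exists."""
    return EVENT_TIME_PATTERN.match(value) is not None


# 19:00 in a UTC+7 zone is 12:00 in UTC.
utc_plus_7 = timezone(timedelta(hours=7))
local_moment = datetime(2024, 1, 1, 19, 0, 0, tzinfo=utc_plus_7)

event_time = to_event_time(local_moment)
print(event_time)                                  # 2024-01-01T12:00:00Z
print(is_valid_event_time(event_time))             # True
print(is_valid_event_time("2024-01-01 12:00:00"))  # False
```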

Creating Feature Groups

1. Define the Feature Group

from sagemaker.feature_store.feature_definition import (
    FeatureDefinition,
    FeatureTypeEnum,
)

# Feature group name
feature_group_name = "customer-features"

# Create the feature group object
customer_feature_group = FeatureGroup(
    name=feature_group_name,
    sagemaker_session=feature_store_session,
)

# Load feature definitions from the DataFrame.
# Depending on the SDK version, object-dtype columns may first need
# to be cast to the pandas "string" dtype.
customer_feature_group.load_feature_definitions(data_frame=customer_data)

# Or define them manually
feature_definitions = [
    FeatureDefinition(feature_name="customer_id", feature_type=FeatureTypeEnum.STRING),
    FeatureDefinition(feature_name="age", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="income", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="credit_score", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="account_balance", feature_type=FeatureTypeEnum.FRACTIONAL),
    FeatureDefinition(feature_name="num_products", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="is_active", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="event_time", feature_type=FeatureTypeEnum.STRING),
]
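
Loading definitions from a DataFrame amounts to mapping each column's data type to one of the three feature types: String, Integral, or Fractional. The sketch below mimics that idea in plain Python on lists of sample values. `infer_feature_type` is a hypothetical helper written for illustration and is not the SDK's own logic.

```python
def infer_feature_type(values) -> str:
    """Guess a feature type name from sample values (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "Integral"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Fractional"
    return "String"


sample = {
    "customer_id": ["C0001", "C0002"],
    "age": [34, 51],
    "account_balance": [1023.5, 88.0],
    "event_time": ["2024-01-01T00:00:00Z", "2024-01-01T00:00:00Z"],
}

inferred = {name: infer_feature_type(values) for name, values in sample.items()}
for name, feature_type in inferred.items():
    print(f"{name}: {feature_type}")
```

Comparing an inferred mapping like this with a hand-written list is a quick way to spot a numeric column that was accidentally read as text.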

2. Create the Feature Group

# Create the feature group with both online and offline stores
customer_feature_group.create(
    s3_uri=f"s3://{bucket}/feature-store/",
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
    description="Customer features for churn prediction",
)
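
Before calling create, it can help to confirm that the record identifier and the event time both name defined features. The sketch below is a local pre-flight check in plain Python. `check_create_arguments` is a hypothetical helper, and it assumes that the event time feature must be of type String or Fractional, which is worth confirming in the API reference.

```python
def check_create_arguments(feature_types, record_identifier_name, event_time_feature_name):
    """Return a list of problems; an empty list means the checks passed.

    feature_types maps feature name -> "String", "Integral", or "Fractional".
    """
    problems = []
    if record_identifier_name not in feature_types:
        problems.append(f"record identifier '{record_identifier_name}' is not a defined feature")
    if event_time_feature_name not in feature_types:
        problems.append(f"event time '{event_time_feature_name}' is not a defined feature")
    elif feature_types[event_time_feature_name] not in ("String", "Fractional"):
        # Assumption: event time must be String or Fractional.
        problems.append(f"event time '{event_time_feature_name}' must be String or Fractional")
    return problems


feature_types = {
    "customer_id": "String",
    "age": "Integral",
    "event_time": "String",
}

print(check_create_arguments(feature_types, "customer_id", "event_time"))  # []
print(check_create_arguments(feature_types, "customerid", "age"))
```

The second call reports two problems: a misspelled identifier and an event time that points at an Integral feature.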

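# Wait for the feature group to finish being created
status = customer_feature_group.describe().get("FeatureGroupStatus")
while status == "Creating":
    print(f"Status: {status}")
    time.sleep(5)
    # Refresh the status; without this the loop never exits
    status = customer_feature_group.describe().get("FeatureGroupStatus")
print(f"Final status: {status}")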
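
A wait loop like this must re-read the status on every pass, and it is safer with a timeout and a check for failure. The sketch below wraps that pattern in a small helper. `wait_until_created` is a hypothetical name, and the status source is passed in as a callable so the example runs without AWS access.

```python
import time


def wait_until_created(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it stops returning "Creating".

    get_status is any zero-argument callable returning the current status
    string. sleep is injectable so the helper can be tried without waiting.
    """
    waited = 0.0
    status = get_status()
    while status == "Creating":
        if waited >= timeout_seconds:
            raise TimeoutError(f"still Creating after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()  # refresh, otherwise the loop never ends
    if status != "Created":
        raise RuntimeError(f"feature group creation ended with status {status}")
    return status


# Stand-in for describe(): two "Creating" answers, then "Created".
fake_statuses = iter(["Creating", "Creating", "Created"])
result = wait_until_created(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)  # Created
```

With a real feature group, `get_status` would be a lambda that calls `describe()` on the group and reads the `FeatureGroupStatus` key.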
