AWS SageMaker Feature Store Tutorial: Feature Management for ML


By Ruby Abdullah · tutorial
AWS · SageMaker · Feature Store · Feature Engineering · MLOps · Data Management


Amazon SageMaker Feature Store is a fully managed repository for storing, sharing, and managing ML features. It provides a centralized store for features that can be used across training and inference, ensuring consistency and reusability.

Why Feature Store?

Key Benefits:
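  • Consistency: the same feature values serve both training and inference, which avoids training/serving skew
  • Reusability: share features across teams and models instead of recomputing them
  • Versioning: the offline store keeps every ingested record with its event time, so you can track how feature values change over time
  • Low latency: the online store serves the latest feature values for real-time inference
  • Offline storage: historical data in Amazon S3 for building training datasets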

Components:
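  • Feature Groups: a named set of features describing one entity (such as a customer), keyed by a record identifier and an event time feature
  • Online Store (real-time): a low-latency store that holds the latest record for each record identifier
  • Offline Store (batch/training): the full record history in Amazon S3, queryable with Amazon Athena through the AWS Glue Data Catalog
  • Feature Definitions: the name and type (String, Integral, or Fractional) of each feature in a group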
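The split between the two stores is easier to see in a toy model. The sketch below is a minimal in-memory stand-in, not the SageMaker API. `ToyFeatureGroup`, `put_record`, `get_record`, and `history` are hypothetical names.

It assumes the online store keeps the record with the newest event time per identifier, while the offline store appends every write.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


@dataclass
class ToyFeatureGroup:
    """In-memory stand-in for a feature group; hypothetical, not the SageMaker API."""

    record_identifier_name: str
    event_time_feature_name: str
    online: Dict[Any, Record] = field(default_factory=dict)
    offline: List[Record] = field(default_factory=list)

    def put_record(self, record: Record) -> None:
        # Offline store (assumed behavior): append every write so the full history is kept.
        self.offline.append(dict(record))
        # Online store (assumed behavior): keep only the newest record per identifier.
        # ISO-8601 strings in one fixed format sort chronologically as plain text.
        key = record[self.record_identifier_name]
        current = self.online.get(key)
        event_time = record[self.event_time_feature_name]
        if current is None or event_time >= current[self.event_time_feature_name]:
            self.online[key] = dict(record)

    def get_record(self, key: Any) -> Optional[Record]:
        # Online lookup: one latest record, the shape used at inference time.
        return self.online.get(key)

    def history(self, key: Any) -> List[Record]:
        # Offline scan: every version of the record, the shape used for training data.
        return [r for r in self.offline if r[self.record_identifier_name] == key]


fg = ToyFeatureGroup("customerid", "eventtime")
fg.put_record({"customerid": "C0001", "age": 30, "eventtime": "2024-01-01T00:00:00Z"})
fg.put_record({"customerid": "C0001", "age": 31, "eventtime": "2024-06-01T00:00:00Z"})
# A late-arriving, older event lands in history but does not replace the online value.
fg.put_record({"customerid": "C0001", "age": 29, "eventtime": "2023-06-01T00:00:00Z"})

print(fg.get_record("C0001")["age"])  # 31
print(len(fg.history("C0001")))       # 3
```

Training jobs read the history, while inference reads one latest row per key.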

Prerequisites

pip install sagemaker boto3 pandas numpy

Requires SageMaker Python SDK >= 2.0. Check the installed version:

python -c "import sagemaker; print(sagemaker.__version__)"
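If you would rather have a script fail fast than read the printed version by eye, a stdlib-only check like this one works. `parse_version` and `meets_minimum` are hypothetical helpers, and the 2.0 minimum comes from the requirement above.

```python
from importlib import metadata
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'2.199.0' -> (2, 199, 0); stops at the first part with no leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...] = (2, 0)) -> bool:
    """Compare release numbers after padding both tuples with zeros."""
    found = parse_version(version)
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_min = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_min


try:
    installed = metadata.version("sagemaker")
    verdict = "OK" if meets_minimum(installed) else "too old, need >= 2.0"
    print(f"sagemaker {installed}: {verdict}")
except metadata.PackageNotFoundError:
    print("sagemaker is not installed; run pip install sagemaker")
```

This sketch ignores pre-release tags, so 2.0.0rc1 counts as 2.0.0. If the `packaging` library is available, `packaging.version.Version` orders versions by the full PEP 440 rules.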

Quick Start

1. Setup

import time

import boto3
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
bucket = session.default_bucket()
# get_execution_role() works inside SageMaker notebooks and Studio;
# elsewhere, set role to your IAM role ARN instead
role = sagemaker.get_execution_role()
region = session.boto_region_name

featurestoresession = sagemaker.Session()

2. Prepare Data

import numpy as np
import pandas as pd
from datetime import datetime, timezone

# Create sample customer data
customerdata = pd.DataFrame({
    "customerid": [f"C{i:04d}" for i in range(1, 101)],
    "age": np.random.randint(18, 70, 100),
    "income": np.random.randint(30000, 150000, 100),
    "creditscore": np.random.randint(300, 850, 100),
    "accountbalance": np.random.uniform(0, 50000, 100).round(2),
    "numproducts": np.random.randint(1, 5, 100),
    "isactive": np.random.choice([0, 1], 100),
})

# Add event time (required for Feature Store); use UTC so the "Z" suffix is accurate
customerdata["eventtime"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Store text columns with the pandas "string" dtype so that
# load_feature_definitions() can map them to the String feature type
for column in ["customerid", "eventtime"]:
    customerdata[column] = customerdata[column].astype("string")

print(customerdata.head())
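Feature Store accepts the event time feature in two forms: a String in ISO-8601 format, or a Fractional Unix timestamp in seconds. The sketch below produces both from a timezone-aware datetime. The helper names `iso_event_time` and `unix_event_time` are hypothetical.

Converting to UTC first matters because a trailing "Z" means UTC.

```python
from datetime import datetime, timedelta, timezone


def iso_event_time(moment: datetime) -> str:
    """ISO-8601 UTC string ending in 'Z', for a String event time feature."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the 'Z' suffix is truthful")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def unix_event_time(moment: datetime) -> float:
    """Seconds since the Unix epoch, for a Fractional event time feature."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.timestamp()


# 15:30 at UTC+7 is 08:30 UTC.
utc_plus_7 = timezone(timedelta(hours=7))
moment = datetime(2024, 1, 15, 15, 30, tzinfo=utc_plus_7)

print(iso_event_time(moment))   # 2024-01-15T08:30:00Z
print(unix_event_time(moment))  # 1705307400.0
```

To stamp every row with the same ingestion time, assign `iso_event_time(datetime.now(timezone.utc))` to the `eventtime` column.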

Creating Feature Groups

1. Define Feature Group

from sagemaker.feature_store.feature_definition import (
    FeatureDefinition,
    FeatureTypeEnum,
)

# Feature group name
featuregroupname = "customer-features"

# Create the FeatureGroup object (this does not call AWS yet)
customerfeaturegroup = FeatureGroup(
    name=featuregroupname,
    sagemaker_session=featurestoresession,
)

# Load feature definitions from the DataFrame's column dtypes
customerfeaturegroup.load_feature_definitions(data_frame=customerdata)

# Or define them manually and pass the list as
# FeatureGroup(..., feature_definitions=featuredefinitions) instead
featuredefinitions = [
    FeatureDefinition(feature_name="customerid", feature_type=FeatureTypeEnum.STRING),
    FeatureDefinition(feature_name="age", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="income", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="creditscore", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="accountbalance", feature_type=FeatureTypeEnum.FRACTIONAL),
    FeatureDefinition(feature_name="numproducts", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="isactive", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="eventtime", feature_type=FeatureTypeEnum.STRING),
]
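`load_feature_definitions` infers each feature's type from the DataFrame's column dtypes. The sketch below is a simplified, hypothetical stand-in for that inference, not the SDK's actual code. It also runs the checks `create()` depends on:

- The record identifier must be a defined feature.
- The event time feature must be String or Fractional.
- A few names are reserved. The list here is an assumption taken from the CreateFeatureGroup documentation.

```python
from typing import Dict, List, Tuple

# Assumed mapping: integers -> Integral, floats -> Fractional, pandas "string" -> String.
# "object" is deliberately left out, so untyped text columns fail loudly.
INTEGRAL_DTYPES = {"int8", "int16", "int32", "int64",
                   "uint8", "uint16", "uint32", "uint64"}
FRACTIONAL_DTYPES = {"float16", "float32", "float64"}
STRING_DTYPES = {"string"}

# Column names reserved by Feature Store for its own metadata (assumed list).
RESERVED_NAMES = {"is_deleted", "write_time", "api_invocation_time"}


def infer_feature_type(dtype_name: str) -> str:
    if dtype_name in INTEGRAL_DTYPES:
        return "Integral"
    if dtype_name in FRACTIONAL_DTYPES:
        return "Fractional"
    if dtype_name in STRING_DTYPES:
        return "String"
    raise ValueError(f"cannot infer a feature type from dtype {dtype_name!r}")


def check_definitions(
    dtypes: Dict[str, str], record_id: str, event_time: str
) -> List[Tuple[str, str]]:
    """Infer (name, type) pairs and check the constraints create() relies on."""
    definitions = [(name, infer_feature_type(d)) for name, d in dtypes.items()]
    types = dict(definitions)
    clashes = set(types) & RESERVED_NAMES
    if clashes:
        raise ValueError(f"reserved feature names used: {sorted(clashes)}")
    if record_id not in types:
        raise ValueError(f"record identifier {record_id!r} is not a defined feature")
    if types.get(event_time) not in {"String", "Fractional"}:
        raise ValueError("event time feature must be defined as String or Fractional")
    return definitions


sample_dtypes = {
    "customerid": "string",
    "age": "int64",
    "income": "int64",
    "creditscore": "int64",
    "accountbalance": "float64",
    "numproducts": "int64",
    "isactive": "int64",
    "eventtime": "string",
}
for name, feature_type in check_definitions(sample_dtypes, "customerid", "eventtime"):
    print(f"{name}: {feature_type}")
```

To run it against the real DataFrame, build the dtype map with `{col: str(dtype) for col, dtype in customerdata.dtypes.items()}`.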

2. Create Feature Group

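# Create feature group with online and offline stores
customerfeaturegroup.create(
    s3_uri=f"s3://{bucket}/feature-store/",  # offline store location
    record_identifier_name="customerid",
    event_time_feature_name="eventtime",
    role_arn=role,
    enable_online_store=True,
    description="Customer features for churn prediction",
)

# Wait for feature group to be created, re-reading the status on every poll
status = customerfeaturegroup.describe().get("FeatureGroupStatus")
while status == "Creating":
    print(f"Status: {status}")
    time.sleep(5)
    status = customerfeaturegroup.describe().get("FeatureGroupStatus")

print(f"Feature group {featuregroupname} status: {status}")
if status != "Created":
    raise RuntimeError(customerfeaturegroup.describe().get("FailureReason"))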
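A polling loop only terminates if it re-reads the status on each pass, and it is safer with an upper bound on the wait. The small generic helper below, `wait_for_status` (a hypothetical name), does both.

It takes any zero-argument function that returns a status string. With the feature group above, you would pass `lambda: customerfeaturegroup.describe().get("FeatureGroupStatus")`.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    pending: Iterable[str] = ("Creating",),
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll get_status() until it leaves the pending states; raise on timeout."""
    pending_states = set(pending)
    deadline = time.monotonic() + timeout_seconds
    status = get_status()
    while status in pending_states:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_seconds} s")
        print(f"Status: {status}")
        time.sleep(poll_seconds)
        status = get_status()  # re-read, otherwise the loop can never exit
    return status


# Offline demo: a fake status source standing in for describe().
fake_statuses = iter(["Creating", "Creating", "Created"])
final = wait_for_status(lambda: next(fake_statuses), poll_seconds=0.01, timeout_seconds=5)
print(final)  # Created
```

Afterwards, treat any result other than "Created" (for example "CreateFailed") as an error. In that case `describe()` also returns a "FailureReason" field explaining why.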
