Complete AWS SageMaker Feature Store Tutorial: Feature Management for ML
Amazon SageMaker Feature Store is a fully managed repository for storing, sharing, and managing ML features. It provides a centralized store for features that can be used across training and inference, ensuring consistency and reusability.
Why Feature Store?
Key Benefits:
- Consistency: Same features for training and inference
- Reusability: Share features across teams and models
- Versioning: Track feature changes over time
- Low latency: Real-time feature serving
- Offline storage: Historical data for training
Key Components:
- Feature Groups
- Online Store (real-time)
- Offline Store (batch/training)
- Feature Definitions
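To make the online/offline distinction concrete, here is a toy in-memory model. It is not the SageMaker API: the class `ToyFeatureStore` and its methods are hypothetical names chosen for illustration. The sketch assumes the offline store keeps every record that is written, while the online store keeps only the record with the latest event time for each identifier.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureStore:
    """Illustrative in-memory model; not the SageMaker API."""

    online: dict = field(default_factory=dict)   # record id -> latest record
    offline: list = field(default_factory=list)  # append-only history

    def put_record(self, record_id: str, event_time: str, features: dict) -> None:
        record = {"record_id": record_id, "event_time": event_time, **features}
        # Offline store: every write is kept, so training can see history.
        self.offline.append(record)
        # Online store: keep only the record with the newest event time.
        # ISO-8601 strings in one fixed format sort chronologically.
        current = self.online.get(record_id)
        if current is None or event_time >= current["event_time"]:
            self.online[record_id] = record

    def get_record(self, record_id: str):
        return self.online.get(record_id)


store = ToyFeatureStore()
store.put_record("C0001", "2024-01-01T00:00:00Z", {"age": 41})
store.put_record("C0001", "2024-02-01T00:00:00Z", {"age": 42})
# A late-arriving record with an older event time does not replace the online value.
store.put_record("C0001", "2023-12-01T00:00:00Z", {"age": 40})

latest = store.get_record("C0001")
history_length = len(store.offline)
```

Here `latest` holds the February record, which is what inference would read, while `history_length` counts all three writes that remain available for building training sets.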
Prerequisites
pip install sagemaker boto3 pandas
# Requires SageMaker SDK >= 2.0
python -c "import sagemaker; print(sagemaker.__version__)"
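The same version requirement can also be checked from Python. The following is a minimal sketch; `meets_minimum` is a hypothetical helper, and it only compares the leading major.minor part of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(version_string: str, minimum: tuple = (2, 0)) -> bool:
    """Return True if the leading 'major.minor' of version_string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


try:
    # Read the installed version from package metadata without importing the SDK.
    installed = version("sagemaker")
    print(f"sagemaker {installed} meets minimum: {meets_minimum(installed)}")
except PackageNotFoundError:
    print("sagemaker is not installed; run: pip install sagemaker")
```

Running this at the top of a notebook gives an early, readable failure instead of an import error later on.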
Quick Start
1. Setup
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup
import pandas as pd
import time
# Session, default S3 bucket, execution role, and region
session = sagemaker.Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()
region = session.boto_region_name
# Session used for Feature Store operations
featurestoresession = sagemaker.Session()
2. Prepare Data
import pandas as pd
import numpy as np
from datetime import datetime, timezone
# Create sample customer data
customerdata = pd.DataFrame({
    "customerid": [f"C{i:04d}" for i in range(1, 101)],
    "age": np.random.randint(18, 70, 100),
    "income": np.random.randint(30000, 150000, 100),
    "creditscore": np.random.randint(300, 850, 100),
    "accountbalance": np.random.uniform(0, 50000, 100).round(2),
    "numproducts": np.random.randint(1, 5, 100),
    "isactive": np.random.choice([0, 1], 100)
})
# Add event time (required for Feature Store); use UTC so the trailing "Z" is accurate
customerdata["eventtime"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(customerdata.head())
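Because every record needs an event time, it can help to format and validate timestamps in one place. The sketch below assumes the string form of the event time is an ISO-8601 UTC timestamp such as `2024-03-01T12:30:00Z`, optionally with milliseconds. The helper names `to_event_time` and `is_valid_event_time` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches yyyy-MM-ddTHH:mm:ssZ, with optional .SSS milliseconds.
ISO_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$")


def to_event_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC ISO-8601 string ending in Z."""
    if moment.tzinfo is None:
        raise ValueError("Pass a timezone-aware datetime to avoid ambiguity")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_valid_event_time(value: str) -> bool:
    """Check that a string has the assumed event-time layout."""
    return ISO_UTC.match(value) is not None


example = to_event_time(datetime(2024, 3, 1, 12, 30, 0, tzinfo=timezone.utc))
print(example, is_valid_event_time(example))
```

Rejecting naive datetimes is a deliberate choice here: a local time stamped with `Z` would silently shift every record by the machine's UTC offset.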
Creating Feature Groups
1. Define Feature Group
from sagemaker.feature_store.feature_definition import (
    FeatureDefinition,
    FeatureTypeEnum
)
# Feature group name
featuregroupname = "customer-features"
# Create feature group
customerfeaturegroup = FeatureGroup(
    name=featuregroupname,
    sagemaker_session=featurestoresession
)
# Load feature definitions from DataFrame
customerfeaturegroup.load_feature_definitions(data_frame=customerdata)
# Or define manually, then assign the list to customerfeaturegroup.feature_definitions
featuredefinitions = [
    FeatureDefinition(feature_name="customerid", feature_type=FeatureTypeEnum.STRING),
    FeatureDefinition(feature_name="age", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="income", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="creditscore", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="accountbalance", feature_type=FeatureTypeEnum.FRACTIONAL),
    FeatureDefinition(feature_name="numproducts", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="isactive", feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name="eventtime", feature_type=FeatureTypeEnum.STRING)
]
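The idea behind inferring definitions from a DataFrame can be sketched without the SDK: each column's dtype is mapped to one of the three feature types (Integral, Fractional, String). The function `infer_feature_type` below is a hypothetical illustration, not the SDK's internal logic, and the fallback to String for any other dtype is an assumption. The output uses the `FeatureName`/`FeatureType` dictionary shape that the boto3 `create_feature_group` call takes.

```python
def infer_feature_type(dtype_name: str) -> str:
    """Map a pandas dtype name to a feature type (illustrative only)."""
    if dtype_name.startswith(("int", "uint")):
        return "Integral"
    if dtype_name.startswith("float"):
        return "Fractional"
    # Assumption: anything else (object, string, ...) is treated as String.
    return "String"


# dtype names as they would appear in customerdata.dtypes for a few columns
column_dtypes = {
    "customerid": "object",
    "age": "int64",
    "accountbalance": "float64",
    "eventtime": "object",
}

definitions = [
    {"FeatureName": name, "FeatureType": infer_feature_type(dtype)}
    for name, dtype in column_dtypes.items()
]

for definition in definitions:
    print(definition)
```

Defining features manually, as in the list above, is the safer route when a column's dtype does not reflect its intended type, for example an integer ID that should be stored as a string.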
2. Create Feature Group
# Create feature group with online and offline stores
customerfeaturegroup.create(
    s3_uri=f"s3://{bucket}/feature-store/",
    record_identifier_name="customerid",
    event_time_feature_name="eventtime",
    role_arn=role,
    enable_online_store=True,
    description="Customer features for churn prediction"
)
# Wait for feature group to be created
status = customerfeaturegroup.describe().get("FeatureGroupStatus")
while status == "Creating":
    print(f"Status: {status}")
    time.sleep(5)