Complete Vertex AI Feature Store Tutorial: Centralized Feature Management
Vertex AI Feature Store is a centralized repository for organizing, storing, and serving ML features. It enables feature reuse, reduces training-serving skew, and provides consistent feature access across teams.
Why Feature Store?
Key Benefits:
- Centralized features: Single source of truth
- Feature reuse: Share features across models
- Low latency serving: Fast online feature retrieval
- Consistency: Same features for training and serving
- Time travel: Point-in-time feature lookups
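To make the "time travel" benefit concrete, here is a minimal sketch of what a point-in-time lookup means. It is plain Python, not the Feature Store API: the history list and the `value_as_of` helper are hypothetical names used only for illustration. The idea is that a training example labeled at time T should only see feature values written at or before T.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one customer: (timestamp, value),
# sorted by timestamp. A feature store keeps this kind of history for you.
monthly_charges_history = [
    (datetime(2024, 1, 1), 50.0),
    (datetime(2024, 2, 1), 55.0),
    (datetime(2024, 3, 1), 70.0),
]


def value_as_of(history, as_of):
    """Return the latest value recorded at or before `as_of`, or None."""
    timestamps = [ts for ts, _ in history]
    # bisect_right gives the number of entries with timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # nothing had been recorded yet at that time
    return history[idx - 1][1]


# A label observed on 2024-02-15 sees the February value,
# not the March value that was written later.
print(value_as_of(monthly_charges_history, datetime(2024, 2, 15)))  # 55.0
```

Looking up values this way is what prevents future data from leaking into training rows, which is one common source of training-serving skew.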
Prerequisites
pip install google-cloud-aiplatform
gcloud auth login
gcloud config set project your-project-id
Setup
1. Initialize Vertex AI
from google.cloud import aiplatform
aiplatform.init(project="your-project-id", location="us-central1")
2. Create Feature Store
# Create feature store
featurestore = aiplatform.Featurestore.create(
    featurestore_id="myfeaturestore",
    online_store_fixed_node_count=1,
)
print(f"Feature store created: {featurestore.resource_name}")
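Resource creation fails if the ID is malformed, so it can help to check IDs locally before calling the API. The sketch below assumes the rule as I understand it from the SDK docstrings: up to 60 characters from `[a-z0-9_]`, not starting with a digit. Treat that rule as an assumption and confirm it against the current API reference. The helper name `is_valid_resource_id` is hypothetical.

```python
import re

# Assumed rule (verify against the API reference): 1 to 60 characters,
# lowercase letters, digits, and underscores, first character not a digit.
_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]{0,59}")


def is_valid_resource_id(resource_id: str) -> bool:
    """Return True if `resource_id` matches the assumed ID rule."""
    return _ID_PATTERN.fullmatch(resource_id) is not None


for candidate in ["myfeaturestore", "1store", "My-Store"]:
    print(candidate, is_valid_resource_id(candidate))
```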
Entity Types
1. Create Entity Type
# Create customer entity type
customerentity = featurestore.create_entity_type(
    entity_type_id="customer",
    description="Customer entity for churn prediction",
)

# Create product entity type
productentity = featurestore.create_entity_type(
    entity_type_id="product",
    description="Product entity for recommendation",
)
2. List Entity Types
entitytypes = featurestore.list_entity_types()
for et in entitytypes:
    # `name` holds the entity type ID; the description is on the underlying resource
    print(f"{et.name}: {et.gca_resource.description}")
Features
1. Create Features
# Create features for customer entity
customerentity.create_feature(
    feature_id="age",
    value_type="INT64",
    description="Customer age",
)

customerentity.create_feature(
    feature_id="tenuremonths",
    value_type="INT64",
    description="Months as customer",
)

customerentity.create_feature(
    feature_id="monthlycharges",
    value_type="DOUBLE",
    description="Monthly charges",
)

customerentity.create_feature(
    feature_id="totalcharges",
    value_type="DOUBLE",
    description="Total charges to date",
)

customerentity.create_feature(
    feature_id="contracttype",
    value_type="STRING",
    description="Type of contract",
)
2. Batch Create Features
# Create multiple features at once. Use this instead of the individual
# calls above: creating a feature ID that already exists fails.
featuresconfig = {
    "age": {"value_type": "INT64", "description": "Customer age"},
    "tenuremonths": {"value_type": "INT64", "description": "Tenure in months"},
    "monthlycharges": {"value_type": "DOUBLE", "description": "Monthly charges"},
    "totalcharges": {"value_type": "DOUBLE", "description": "Total charges"},
    "contracttype": {"value_type": "STRING", "description": "Contract type"},
}

customerentity.batch_create_features(featuresconfig)
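Writing the config dictionary by hand gets tedious as the feature list grows. One possible approach, sketched below, is to infer the value types from a sample record. The helpers `infer_value_type` and `build_feature_configs` are hypothetical, not part of the SDK, and the mapping only covers the scalar types used in this tutorial plus `BOOL`.

```python
def infer_value_type(value):
    """Map a Python scalar to a Feature Store value type string."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"No value type mapping for {type(value).__name__}")


def build_feature_configs(sample_row, descriptions=None):
    """Build a batch-create style config dict from one sample record."""
    descriptions = descriptions or {}
    configs = {}
    for feature_id, value in sample_row.items():
        config = {"value_type": infer_value_type(value)}
        if feature_id in descriptions:
            config["description"] = descriptions[feature_id]
        configs[feature_id] = config
    return configs


# Hypothetical sample record using this tutorial's feature names.
sample_row = {"age": 25, "monthlycharges": 70.5, "contracttype": "monthly"}
inferred_configs = build_feature_configs(sample_row, {"age": "Customer age"})
print(inferred_configs)
```

Inference from a single record is fragile (for example, a whole-number charge stored as `70` would be typed `INT64`), so review the result before creating the features.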
Ingesting Features
1. Ingest from BigQuery
# Ingest features from BigQuery
customerentity.ingest_from_bq(
    feature_ids=["age", "tenuremonths", "monthlycharges", "totalcharges"],
    feature_time="updatetime",
    bq_source_uri="bq://project.dataset.customerfeatures",
    entity_id_field="customerid",
)
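An ingestion job that references a missing column only fails after it has been submitted, so a quick local check of the source schema can save a round trip. The sketch below is a hypothetical helper, not an SDK call. It assumes `feature_time` is given as a column name (as in the call above) and that source columns are named exactly like the feature IDs.

```python
def missing_ingest_columns(columns, feature_ids, feature_time, entity_id_field):
    """Return the required column names that are absent from `columns`."""
    required = [entity_id_field, feature_time, *feature_ids]
    available = set(columns)
    return [name for name in required if name not in available]


# Hypothetical source schema that lacks the totalcharges column.
source_columns = ["customerid", "updatetime", "age", "tenuremonths", "monthlycharges"]

missing = missing_ingest_columns(
    source_columns,
    feature_ids=["age", "tenuremonths", "monthlycharges", "totalcharges"],
    feature_time="updatetime",
    entity_id_field="customerid",
)
print(missing)  # ['totalcharges']
```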
2. Ingest from DataFrame
import pandas as pd
from datetime import datetime

# Create feature dataframe
df = pd.DataFrame({
    "customerid": ["C001", "C002", "C003"],
    "age": [25, 35, 45],
    "tenuremonths": [12, 24, 36],