Complete Feast Tutorial: Feature Store for Machine Learning
Feast (Feature Store) is an open-source feature store that helps ML teams manage, discover, and serve features for machine learning models. It bridges the gap between data engineering and machine learning by providing a consistent way to define, store, and retrieve features.
Why Feast?
Feast Advantages:
- Feature consistency: the same feature definitions are used for training and serving
- Feature sharing: reuse features across teams
- Point-in-time correctness: prevents data leakage when building training sets
- Low-latency serving: fast online feature retrieval
- Feature discovery: a centralized feature catalog
Common use cases:
- ML model training and serving
- Real-time recommendation systems
- Fraud detection
- Personalization engines
- Feature sharing across teams
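Point-in-time correctness is the least intuitive item above. The sketch below is a simplified, standard-library-only illustration of the idea, not Feast code. For one training row, it returns the newest feature value whose timestamp is at or before the row's timestamp and is no older than the feature view's TTL. The sample data and the `point_in_time_lookup` helper are hypothetical. In Feast, this kind of join happens inside `get_historical_features`.

```python
from datetime import datetime, timedelta

# Hypothetical feature log: (driver_id, event_timestamp, conv_rate)
feature_rows = [
    (1001, datetime(2024, 1, 14, 9, 0), 0.50),   # older than the 1-day TTL at 10:00 on Jan 15
    (1001, datetime(2024, 1, 15, 9, 0), 0.60),   # latest value visible at 10:00
    (1001, datetime(2024, 1, 15, 11, 0), 0.90),  # recorded after 10:00: must not leak
    (1002, datetime(2024, 1, 10, 9, 0), 0.30),   # far older than the TTL
]

def point_in_time_lookup(rows, driver_id, as_of, ttl):
    """Return the newest value for driver_id with as_of - ttl <= ts <= as_of, else None."""
    best_ts, best_val = None, None
    for did, ts, val in rows:
        if did != driver_id:
            continue
        # Only rows already known at as_of and still inside the TTL window qualify
        if ts <= as_of and as_of - ts <= ttl:
            if best_ts is None or ts > best_ts:
                best_ts, best_val = ts, val
    return best_val

as_of = datetime(2024, 1, 15, 10, 0)
ttl = timedelta(days=1)
print(point_in_time_lookup(feature_rows, 1001, as_of, ttl))  # 0.6
print(point_in_time_lookup(feature_rows, 1002, as_of, ttl))  # None
```

A naive join on `driver_id` alone could pick up the 11:00 value for driver 1001, which was recorded after the 10:00 training timestamp. The point-in-time rule exists to prevent exactly that kind of leakage.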
Installation
# Basic installation
pip install feast
# With specific providers (quote the extras so shells such as zsh don't expand the brackets)
pip install 'feast[redis]'      # Redis online store
pip install 'feast[gcp]'        # Google Cloud
pip install 'feast[aws]'        # AWS
pip install 'feast[snowflake]'  # Snowflake
# Verify installation
feast version
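When you are working from Python, for example inside a notebook, you can check the installed version without the CLI. This small sketch uses only the standard library's `importlib.metadata` (Python 3.8+) and is not part of Feast. The `installed_feast_version` helper name is our own.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_feast_version():
    """Return the installed feast package version string, or None if feast is not installed."""
    try:
        return version("feast")
    except PackageNotFoundError:
        return None

print(installed_feast_version())
```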
Quick Start
1. Initialize Project
# Create new Feast project
feast init my_feature_repo
cd my_feature_repo
Project structure:
my_feature_repo/
└── feature_repo/
    ├── __init__.py
    ├── data/
    │   └── driver_stats.parquet
    ├── example_repo.py
    └── feature_store.yaml
2. Feature Store Configuration
# feature_store.yaml
project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
entity_key_serialization_version: 2
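You may want SQLite locally but Redis in production (the `feast[redis]` extra). One option is to generate `feature_store.yaml` per environment. The hypothetical `render_feature_store_yaml` helper below is a plain-Python sketch that emits either variant. The `type: redis` and `connection_string` keys follow Feast's Redis online store configuration as we understand it. The default address `localhost:6379` is an assumption.

```python
# Hypothetical helper, not part of Feast: renders feature_store.yaml text
# for either a local SQLite online store or a Redis online store.
def render_feature_store_yaml(project: str, online: str = "sqlite",
                              redis_conn: str = "localhost:6379") -> str:
    if online == "sqlite":
        online_block = "online_store:\n  type: sqlite\n  path: data/online_store.db"
    elif online == "redis":
        # Assumed Redis address; requires: pip install 'feast[redis]'
        online_block = f'online_store:\n  type: redis\n  connection_string: "{redis_conn}"'
    else:
        raise ValueError(f"unsupported online store: {online!r}")
    lines = [
        f"project: {project}",
        "provider: local",
        "registry: data/registry.db",
        online_block,
        "offline_store:",
        "  type: file",
        "entity_key_serialization_version: 2",
    ]
    return "\n".join(lines) + "\n"

# Usage: print the production (Redis) variant
print(render_feature_store_yaml("my_project", online="redis"))
```

Generating the file keeps the shared keys (project, registry, offline store) identical across environments, so only the online store block differs.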
3. Define Features
# feature_repo/features.py
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64
# Define data source
driver_stats_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
# Define entity (without join_keys, the entity name is also used as the join key)
driver = Entity(
    name="driver_id",
    value_type=ValueType.INT64,
    description="Driver identifier",
)
# Define feature view (schema entries use Field, which replaces the older Feature class)
driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
    online=True,
    tags={"team": "driver_performance"},
)
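The feature view only works if `data/driver_stats.parquet` actually contains the columns named in these definitions. The sketch below is a hypothetical pre-flight check in plain Python. It lists the columns implied by the `FileSource`, the `Entity`, and the `schema` fields above, and it flags rows that lack a column or carry an unexpected type. The Python type chosen for each column is our assumption. This is not how Feast validates data; it is only a quick check you might run on sample rows before `feast apply`.

```python
from datetime import datetime

# Columns implied by the definitions above; the Python types are our assumption
EXPECTED_COLUMNS = {
    "driver_id": int,              # entity join key
    "event_timestamp": datetime,   # FileSource timestamp_field
    "created": datetime,           # FileSource created_timestamp_column
    "conv_rate": float,            # Float32 field
    "acc_rate": float,             # Float32 field
    "avg_daily_trips": int,        # Int64 field
}

def row_problems(row: dict) -> list:
    """Return human-readable problems for one source row; an empty list means it looks OK."""
    problems = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, got {type(row[column]).__name__}"
            )
    return problems

good_row = {
    "driver_id": 1001,
    "event_timestamp": datetime(2024, 1, 15, 9, 0),
    "created": datetime(2024, 1, 15, 9, 5),
    "conv_rate": 0.6,
    "acc_rate": 0.9,
    "avg_daily_trips": 12,
}
bad_row = {"driver_id": "1001", "event_timestamp": datetime(2024, 1, 15, 9, 0)}
print(row_problems(good_row))  # []
print(row_problems(bad_row))
```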
4. Apply and Materialize
# Run Feast CLI commands from the directory that contains feature_store.yaml
cd feature_repo
# Apply feature definitions
feast apply
# Materialize features to the online store
feast materialize-incremental $(date +%Y-%m-%dT%H:%M:%S)
# Or materialize a specific date range
feast materialize 2024-01-01T00:00:00 2024-01-31T00:00:00
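The two commands above differ mainly in which time window gets loaded. The following is a simplified sketch based on our reading of Feast's behavior, not its code. `materialize-incremental` resumes from the end of the previous materialization run and falls back to a TTL-based lookback on a first run. Either way, the online store ends up holding only the newest row per entity key. The helpers `incremental_window` and `latest_per_entity` are hypothetical, and the window boundaries here are our simplification.

```python
from datetime import datetime, timedelta

def incremental_window(last_end, ttl, end):
    """Simplified: resume where the last run ended; on a first run, look back one TTL from end."""
    start = last_end if last_end is not None else end - ttl
    return start, end

def latest_per_entity(rows, start, end):
    """Keep only the newest (event_timestamp, value) per driver_id with start <= ts <= end."""
    latest = {}
    for driver_id, ts, value in rows:
        if start <= ts <= end and (driver_id not in latest or ts > latest[driver_id][0]):
            latest[driver_id] = (ts, value)
    return latest

ttl = timedelta(days=1)
rows = [
    (1001, datetime(2024, 1, 30, 8, 0), 0.55),
    (1001, datetime(2024, 1, 30, 20, 0), 0.61),  # newer value for the same key wins
    (1002, datetime(2024, 1, 20, 8, 0), 0.40),   # outside the first-run window
]
start, end = incremental_window(None, ttl, datetime(2024, 1, 31))
online_store = latest_per_entity(rows, start, end)
print(online_store)
# A second run would resume from the previous end time
print(incremental_window(end, ttl, datetime(2024, 2, 1)))
```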
5. Retrieve Features
from feast import FeatureStore
import pandas as pd
# Initialize feature store (repo_path points at the directory containing feature_store.yaml)
store = FeatureStore(repo_path=".")
# Get historical features (for training)
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2024-01-15 10:00:00",
        "2024-01-15 10:00:00",
        "2024-01-15 10:00:00",
    ])
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats:conv_rate",
        "driver_stats:acc_rate",
        "driver_stats:avg_daily_trips",
    ],
).to_df()
print(training_df)
# Get online features (for serving)
online_features = store.get_online_features(
features=[