Complete Feast Tutorial: A Feature Store for Machine Learning
Feast (Feature Store) is an open-source feature store that helps ML teams manage, discover, and serve features for machine learning models. It bridges the gap between data engineering and machine learning by providing a consistent way to define, store, and retrieve features.
Why Feast?
Advantages of Feast:
- Feature consistency: the same feature definitions are used for training and serving
- Feature sharing: features can be reused across teams
- Point-in-time correctness: prevents data leakage by joining only feature values that were known at each event time
- Low-latency serving: online feature retrieval
- Feature discovery: a centralized feature catalog
Use cases:
- ML model training and serving
- Real-time recommendation systems
- Fraud detection
- Personalization engines
- Feature sharing across teams
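To make point-in-time correctness concrete, here is a minimal sketch of the idea using plain pandas. It is not how Feast implements the join internally; it only illustrates the rule that a training row may see feature values recorded at or before its own timestamp. The data and column values are hypothetical.

```python
import pandas as pd

# Hypothetical feature log: conv_rate for driver 1001 as it changed over time.
feature_log = pd.DataFrame({
    "driver_id": [1001, 1001, 1001],
    "event_timestamp": pd.to_datetime(
        ["2024-01-14 09:00", "2024-01-15 09:00", "2024-01-16 09:00"]
    ),
    "conv_rate": [0.10, 0.20, 0.30],
})

# A training label observed at 2024-01-15 10:00.
labels = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": pd.to_datetime(["2024-01-15 10:00"]),
    "label": [1],
})

# direction="backward" picks the latest feature row at or before each label
# timestamp, so the 2024-01-16 value (from the future) is never used.
joined = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    feature_log.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
)
print(joined[["driver_id", "event_timestamp", "conv_rate", "label"]])
```

The label row is matched with the 2024-01-15 09:00 feature value. A naive join on the latest available value would have leaked the 2024-01-16 value into training.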
Installation
# Basic installation
pip install feast
# With specific providers (quotes keep shells such as zsh from expanding the brackets)
pip install 'feast[redis]'      # Redis online store
pip install 'feast[gcp]'        # Google Cloud
pip install 'feast[aws]'        # AWS
pip install 'feast[snowflake]'  # Snowflake
# Verify installation
feast version
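The same check can be done from Python, which is handy inside notebooks or CI scripts. This is a small sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


feast_version = installed_version("feast")
if feast_version is None:
    print("feast is not installed in this environment")
else:
    print(f"feast {feast_version} is installed")
```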
Quick Start
1. Initialize Project
# Create a new Feast project
feast init my_feature_repo
cd my_feature_repo
Project structure:
my_feature_repo/
├── feature_repo/
│   ├── __init__.py
│   ├── example_repo.py
│   └── feature_store.yaml
└── data/
    └── driver_stats.parquet
2. Feature Store Configuration
# feature_store.yaml
project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
entity_key_serialization_version: 2
3. Define Features
# feature_repo/features.py
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64
# Define the data source: a Parquet file with event and creation timestamps
driver_stats_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
# Define the entity; with no explicit join key, its name is used as the join column
driver = Entity(
    name="driver_id",
    value_type=ValueType.INT64,
    description="Driver identifier",
)
# Define the feature view (schema entries are Field objects with a feast.types dtype)
driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
    online=True,
    tags={"team": "driver_performance"},
)
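If you want to try these definitions without the bundled sample file, you can generate a small dataset with the same columns. This is a rough sketch: the driver IDs, values, and the helper name `build_driver_stats` are invented for illustration, and the column names are assumed to match the source and feature view defined above.

```python
from datetime import datetime, timedelta

import pandas as pd


def build_driver_stats(start: datetime, hours: int = 3) -> pd.DataFrame:
    """Build hourly rows for three hypothetical drivers."""
    rows = []
    for driver_id in (1001, 1002, 1003):
        for h in range(hours):
            ts = start + timedelta(hours=h)
            rows.append({
                "driver_id": driver_id,
                "event_timestamp": ts,  # when the feature values were observed
                "created": ts,          # when the row was written
                "conv_rate": 0.1 * (h + 1),
                "acc_rate": 0.9,
                "avg_daily_trips": 10 + h,
            })
    df = pd.DataFrame(rows)
    # Match the declared dtypes: Float32 for the rates, Int64 for the trip count.
    return df.astype({
        "driver_id": "int64",
        "conv_rate": "float32",
        "acc_rate": "float32",
        "avg_daily_trips": "int64",
    })


sample = build_driver_stats(datetime(2024, 1, 15, 8, 0, 0))
print(sample.dtypes)
# To write the file the FileSource points at (needs pyarrow or fastparquet):
# sample.to_parquet("data/driver_stats.parquet")
```

The write call is left as a comment because it depends on an optional Parquet engine being installed.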
4. Apply and Materialize
# Apply feature definitions
feast apply
# Materialize features into the online store, up to the current time
feast materialize-incremental $(date +%Y-%m-%dT%H:%M:%S)
# Or materialize a specific date range
feast materialize 2024-01-01T00:00:00 2024-01-31T00:00:00
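Materialization can also be triggered from Python, for example from a scheduled job. The sketch below assumes Feast is installed and that `repo_path` points at the directory containing the feature store configuration. The helper names `materialization_window` and `materialize_recent` are made up for this example, while `FeatureStore.materialize` and `FeatureStore.materialize_incremental` are the SDK counterparts of the CLI commands.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def materialization_window(days: int, now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the last `days` days up to `now`."""
    return now - timedelta(days=days), now


def materialize_recent(repo_path: str = ".", days: int = 1) -> None:
    """Load the last `days` days of features into the online store."""
    # Imported lazily so the helper above can be used without Feast installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    start, end = materialization_window(days, datetime.now(timezone.utc))
    # Counterpart of `feast materialize <start> <end>`.
    store.materialize(start_date=start, end_date=end)
    # `store.materialize_incremental(end_date=end)` would instead continue
    # from where the previous materialization run ended.


start, end = materialization_window(1, datetime(2024, 1, 31, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

Calling `materialize_recent(".")` after `feast apply` would populate the online store for the last day of data.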
5. Retrieve Features
from feast import FeatureStore
import pandas as pd
# Initialize the feature store
store = FeatureStore(repo_path=".")
# Get historical features (for training)
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2024-01-15 10:00:00",
        "2024-01-15 10:00:00",
        "2024-01-15 10:00:00",
    ]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats:conv_rate",
        "driver_stats:acc_rate",
        "driver_stats:avg_daily_trips",
    ],
).to_df()
print(training_df)
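For serving, the counterpart of the historical lookup is an online lookup against the materialized store. The following is a sketch under a few assumptions: Feast is installed, the feature view is named `driver_stats` with `driver_id` as the join key, and features have already been materialized. The helper names `feature_refs` and `fetch_online_features` are hypothetical; `get_online_features` is the SDK method.

```python
from typing import Dict, List

FEATURE_VIEW = "driver_stats"
FEATURE_NAMES = ["conv_rate", "acc_rate", "avg_daily_trips"]


def feature_refs(view: str, names: List[str]) -> List[str]:
    """Build "view:feature" references in the format used by Feast."""
    return [f"{view}:{name}" for name in names]


def fetch_online_features(driver_ids: List[int], repo_path: str = ".") -> Dict[str, list]:
    """Read the latest materialized values for the given drivers."""
    # Lazy import: requires Feast and an applied, materialized repository.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    response = store.get_online_features(
        features=feature_refs(FEATURE_VIEW, FEATURE_NAMES),
        entity_rows=[{"driver_id": d} for d in driver_ids],
    )
    return response.to_dict()


print(feature_refs(FEATURE_VIEW, FEATURE_NAMES))
```

Calling `fetch_online_features([1001, 1002])` would return a dictionary keyed by entity and feature name. Drivers with no materialized data typically come back with `None` values.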