Complete BentoML Tutorial: Packaging and Serving ML Models to Production
BentoML is an open-source framework for building, shipping, and scaling AI applications. With BentoML, you can turn an ML model into a production-ready API service, complete with containerization, batching, and monitoring.
Why BentoML?
Challenges in ML deployment:
- Packaging complexity: Bundling the model with its dependencies
- Serving infrastructure: Setting up a web server and API endpoints
- Performance: Batching, caching, GPU utilization
- Scalability: Horizontal scaling, load balancing
- Multi-framework: Supporting a variety of ML frameworks
What BentoML provides:
- Unified API for all ML frameworks
- Auto-generated REST/gRPC APIs
- Built-in adaptive batching
- Docker/Kubernetes-ready deployment
- Model versioning and management
Installation
# Install BentoML
pip install bentoml

# With framework-specific support
# (pip prints a warning and skips any extra that the installed release does not define)
pip install "bentoml[pytorch]"
pip install "bentoml[tensorflow]"
pip install "bentoml[sklearn]"
pip install "bentoml[transformers]"

# Verify the installation
bentoml --version
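The installation can also be checked from Python, which is handy inside a notebook or a CI script. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of BentoML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("bentoml")
    if found is None:
        print("bentoml is not installed; run: pip install bentoml")
    else:
        print(f"bentoml {found} is installed")
```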
Quick Start
1. Save the Model to BentoML
# train_and_save.py
import bentoml
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train the model
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Save the model to the BentoML model store
saved_model = bentoml.sklearn.save_model(
    "irisclassifier",
    model,
    # Mark predict as batchable along dimension 0 (one row per sample)
    signatures={
        "predict": {"batchable": True, "batch_dim": 0}
    },
    labels={"framework": "sklearn", "dataset": "iris"},
    metadata={"accuracy": 0.97},
)
print(f"Model saved: {saved_model}")
# Output: Model(tag="irisclassifier:abc123")
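Marking `predict` as batchable with batch dimension 0 means that inputs from separate requests may be concatenated row-wise, run through the model in a single call, and split back per request. The following pure-Python sketch illustrates that idea only; it is not BentoML's implementation, and `run_batched` and `fake_predict` are hypothetical names.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def run_batched(
    predict: Callable[[List[Row]], List[int]],
    requests: List[List[Row]],
) -> List[List[int]]:
    """Merge per-request rows along dimension 0, predict once, split results."""
    # Concatenate all rows into a single batch (batch dimension 0).
    merged: List[Row] = [row for request in requests for row in request]
    outputs = predict(merged)
    if len(outputs) != len(merged):
        raise ValueError("predict must return one output per input row")

    # Slice the outputs back into chunks matching each request's row count.
    results: List[List[int]] = []
    start = 0
    for request in requests:
        end = start + len(request)
        results.append(list(outputs[start:end]))
        start = end
    return results


def fake_predict(rows: List[Row]) -> List[int]:
    """Stand-in for a model: class 0 if the third feature is < 2.5, else 1."""
    return [0 if row[2] < 2.5 else 1 for row in rows]


# Two requests, with one and two rows respectively, share one predict call.
batches = [
    [[5.1, 3.5, 1.4, 0.2]],
    [[6.7, 3.0, 5.2, 2.3], [4.9, 3.0, 1.4, 0.2]],
]
print(run_batched(fake_predict, batches))  # [[0], [1, 0]]
```

The benefit comes from the single `predict` call: a model that is vectorized over rows usually handles one batch of three rows faster than three calls of one row each.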
2. Create the Service
# service.py
# Note: this uses the runner-based Service API of BentoML 1.0/1.1.
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray, JSON

# Load the model
iris_model = bentoml.sklearn.get("irisclassifier:latest")

# Create the runner
iris_runner = iris_model.to_runner()

# Create the service
svc = bentoml.Service("irisservice", runners=[iris_runner])

# Define the API endpoint
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
    return await iris_runner.predict.async_run(input_array)

# Alternative: JSON input/output
@svc.api(input=JSON(), output=JSON())
async def classify(input_data: dict) -> dict:
    # Reshape the flat feature list into a single-row batch
    features = np.array(input_data["features"]).reshape(1, -1)
    prediction = await iris_runner.predict.async_run(features)
    class_names = ["setosa", "versicolor", "virginica"]
    return {
        "prediction": int(prediction[0]),
        "classname": class_names[int(prediction[0])],
    }
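The `classify` endpoint assumes that the JSON body carries a `features` list of four numbers; a malformed payload would only fail later, inside `reshape` or the model. A minimal sketch of a validation helper that could run before the array is built is shown here. `parse_features` and `N_FEATURES` are hypothetical names, and the count of four is an assumption based on the iris dataset.

```python
from numbers import Real
from typing import Any, List

N_FEATURES = 4  # assumption: iris samples have four measurements


def parse_features(payload: Any) -> List[float]:
    """Extract and validate the "features" list from a decoded JSON body."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError('payload must be an object with a "features" key')
    features = payload["features"]
    if not isinstance(features, list) or len(features) != N_FEATURES:
        raise ValueError(f'"features" must be a list of {N_FEATURES} numbers')
    for value in features:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"non-numeric feature: {value!r}")
    return [float(value) for value in features]
```

Inside the endpoint, the result could feed `np.array(...).reshape(1, -1)` directly, so that clients receive a clear error message instead of a stack trace from NumPy or scikit-learn.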
3. Run the Service Locally
# Development server (restarts on code changes)
bentoml serve service:svc --reload

# Production server
bentoml serve service:svc --production

# Specify the port
bentoml serve service:svc --port 3000
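After starting the server, scripts usually need to wait until it accepts requests. The sketch below polls a health URL with the standard library only. It assumes the server answers HTTP 200 on a readiness path such as `/readyz`; treat that path as an assumption to check against your BentoML version. `wait_until_ready` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet, or is not ready
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage (assumed URL, matching the port used in this tutorial):
# if not wait_until_ready("http://localhost:3000/readyz"):
#     raise SystemExit("service did not become ready in time")
```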
4. Test the Service
# test_service.py
import requests
import numpy as np

# Test with NumpyNdarray
data = np.array([[5.1, 3.5, 1.4, 0.2]])
response = requests.post(
    "http://localhost:3000/predict",
    headers={"content-type": "application/json"},
    json=data.tolist(),
)
print(f"Prediction: {response.json()}")

# Test with JSON
response = requests.post(
    "http://localhost:3000/classify",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
print(f"Classification: {response.json()}")
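Where installing `requests` is not an option, the same call to `/classify` can be made with the standard library. This is a sketch under the assumption that the service runs at the address used in the test script; `build_classify_request` and `classify` are hypothetical helper names. An explicit timeout keeps a hung server from blocking the caller.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:3000"  # assumed address of the local service


def build_classify_request(
    features: List[float], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for the /classify endpoint with a JSON body."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(features: List[float], timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply; raises on HTTP errors."""
    request = build_classify_request(features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the service to be running):
# print(classify([5.1, 3.5, 1.4, 0.2]))
```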
Building Bentos
1. Create bentofile.yaml
# bentofile.yaml
service: "service:svc"
labels:
  owner: ml-team
  project: iris-classifier
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - numpy
docker:
  distro: debian
  python_version: "3.10"
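The `include` field selects which project files are packaged into the Bento. As a rough illustration of how a glob pattern narrows a file list, the sketch below uses the standard library's `fnmatch`. This is an approximation: BentoML documents these fields as gitignore-style patterns, which treat directories and negation differently. `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_files(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the paths that match at least one glob pattern."""
    pattern_list = list(patterns)
    return [
        path
        for path in paths
        if any(fnmatchcase(path, pattern) for pattern in pattern_list)
    ]


# Hypothetical project layout, used only for illustration.
project = ["service.py", "utils.py", "README.md", "data/iris.csv"]
print(select_files(project, ["*.py"]))  # ['service.py', 'utils.py']
```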