Complete BentoML Tutorial: Packaging and Serving ML Models in Production


By Ruby Abdullah · tutorial
Tags: BentoML, MLOps, Model Serving, Docker, Kubernetes, Machine Learning

BentoML is an open-source framework for building, shipping, and scaling AI applications. With BentoML, you can turn ML models into production-ready API services, complete with containerization, batching, and monitoring.

Why BentoML?

Challenges in ML deployment:

  • Packaging complexity: bundling the model with its dependencies
  • Serving infrastructure: setting up a web server and API endpoints
  • Performance: batching, caching, GPU utilization
  • Scalability: horizontal scaling, load balancing
  • Multi-framework: supporting different ML frameworks

BentoML solutions:

  • A unified API for all ML frameworks
  • Auto-generated REST/gRPC APIs
  • Built-in adaptive batching
  • Docker/Kubernetes-ready deployment
  • Model versioning and management
Installation

# Install BentoML
pip install bentoml

# Install the ML framework you use alongside BentoML
# (frameworks are separate packages)
pip install bentoml torch
pip install bentoml tensorflow
pip install bentoml scikit-learn
pip install bentoml transformers

# Verify the installation
bentoml --version
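The same check can be done from Python, which is handy inside a setup script or CI job. The sketch below uses only the standard library; the helper name `installed_version` is an illustration and not part of BentoML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this tutorial relies on.
for name in ("bentoml", "scikit-learn", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A `None` result means the package still needs to be installed in the active environment.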

Quick Start

1. Save the Model to BentoML

# train_and_save.py
import bentoml
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train the model
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Save the model to the BentoML model store
saved_model = bentoml.sklearn.save_model(
    "irisclassifier",
    model,
    # Mark predict as batchable along the first dimension (rows)
    signatures={
        "predict": {"batchable": True, "batch_dim": 0}
    },
    labels={"framework": "sklearn", "dataset": "iris"},
    metadata={"accuracy": 0.97},  # example value
)
print(f"Model saved: {saved_model}")
# Output: Model(tag="irisclassifier:abc123")
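To make the `batchable` and `batch_dim` settings more concrete, here is a rough sketch of what batching along dimension 0 means: inputs from several requests are concatenated row-wise, the model is called once, and the outputs are split back per request. This is only an illustration of the idea, not BentoML's implementation; `run_batched` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for a real model.

```python
from typing import Callable, List, Sequence

Row = List[float]


def run_batched(
    predict: Callable[[List[Row]], List[int]],
    requests: Sequence[List[Row]],
) -> List[List[int]]:
    """Merge per-request rows along dim 0, predict once, split results back."""
    sizes = [len(rows) for rows in requests]
    merged = [row for rows in requests for row in rows]
    outputs = predict(merged)
    if len(outputs) != len(merged):
        raise ValueError("predict must return one output per input row")
    results: List[List[int]] = []
    start = 0
    for size in sizes:
        results.append(outputs[start:start + size])
        start += size
    return results


def toy_predict(rows: List[Row]) -> List[int]:
    """Hypothetical stand-in for model.predict: threshold on petal length."""
    return [0 if row[2] < 2.5 else 1 for row in rows]


# Two requests: one with a single row, one with two rows.
batched = run_batched(
    toy_predict,
    [
        [[5.1, 3.5, 1.4, 0.2]],
        [[6.2, 2.9, 4.3, 1.3], [4.9, 3.0, 1.4, 0.2]],
    ],
)
print(batched)  # prints [[0], [1, 0]]
```

Batching only works if the model treats each row independently, which is why the signature has to declare it explicitly.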

2. Create the Service

# service.py
# Uses the runner-based Service API from BentoML 1.0/1.1.
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray, JSON

# Load the model from the model store
iris_model = bentoml.sklearn.get("irisclassifier:latest")

# Create a runner
iris_runner = iris_model.to_runner()

# Create the service
svc = bentoml.Service("irisservice", runners=[iris_runner])

# Define the API endpoint
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
    return await iris_runner.predict.async_run(input_array)

# Alternative: JSON input/output
@svc.api(input=JSON(), output=JSON())
async def classify(input_data: dict) -> dict:
    # Reshape the flat feature list into a single-row 2D array
    features = np.array(input_data["features"]).reshape(1, -1)
    prediction = await iris_runner.predict.async_run(features)
    class_names = ["setosa", "versicolor", "virginica"]
    return {
        "prediction": int(prediction[0]),
        "class_name": class_names[int(prediction[0])],
    }
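The JSON endpoint above trusts that the payload has a `features` list of the right length. A small validation step before the reshape gives callers a clearer error. The sketch below is one possible way to do that in plain Python; `parse_features` and `N_FEATURES` are hypothetical names introduced for illustration.

```python
from numbers import Real
from typing import Any, List

# The iris dataset has four features per sample:
# sepal length, sepal width, petal length, petal width.
N_FEATURES = 4


def parse_features(input_data: Any) -> List[float]:
    """Validate a classify payload and return its features as floats."""
    if not isinstance(input_data, dict) or "features" not in input_data:
        raise ValueError('expected a JSON object with a "features" key')
    features = input_data["features"]
    if not isinstance(features, list) or len(features) != N_FEATURES:
        raise ValueError(f"features must be a list of {N_FEATURES} numbers")
    for value in features:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"non-numeric feature value: {value!r}")
    return [float(value) for value in features]


print(parse_features({"features": [5.1, 3.5, 1.4, 0.2]}))
```

Inside an endpoint, the returned list could be passed to `np.array(...).reshape(1, -1)` in place of the raw payload.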

3. Run the Service Locally

# Development server (reloads on code changes)
bentoml serve service:svc --reload

# Production server
bentoml serve service:svc --production

# Specify the port
bentoml serve service:svc --port 3000
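When the server is started from a script, it helps to wait until it is ready before sending requests. The sketch below polls a readiness URL with the standard library. It assumes the server exposes a `/readyz` health endpoint on the same port, which should be verified against your BentoML version; `wait_until_ready` is a hypothetical helper name.

```python
import time
import urllib.request


def wait_until_ready(
    base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5
) -> bool:
    """Poll `<base_url>/readyz` until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(
                f"{base_url}/readyz", timeout=interval_s
            ) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or a non-2xx response: retry.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_ready("http://localhost:3000")` returns `False` if nothing answers on that port within the timeout.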

4. Test the Service

# test_service.py
import requests
import numpy as np

# Test the NumpyNdarray endpoint
data = np.array([[5.1, 3.5, 1.4, 0.2]])
response = requests.post(
    "http://localhost:3000/predict",
    headers={"content-type": "application/json"},
    json=data.tolist(),
)
print(f"Prediction: {response.json()}")

# Test the JSON endpoint
response = requests.post(
    "http://localhost:3000/classify",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
print(f"Classification: {response.json()}")
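If `requests` is not available, the same calls can be made with the standard library. The following is a minimal sketch with an explicit timeout; `build_request` and `post_json` are hypothetical helper names, and the URL is the local one used in this tutorial.

```python
import json
import urllib.request
from typing import Any


def build_request(url: str, payload: Any) -> urllib.request.Request:
    """Build a POST request carrying `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url: str, payload: Any, timeout_s: float = 5.0) -> Any:
    """Send the request and decode the JSON response.

    Raises urllib.error.HTTPError on non-2xx responses and
    urllib.error.URLError if the server cannot be reached.
    """
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request without sending it.
req = build_request(
    "http://localhost:3000/classify", {"features": [5.1, 3.5, 1.4, 0.2]}
)
print(req.get_method(), req.full_url)
```

With the service running, `post_json("http://localhost:3000/classify", {"features": [5.1, 3.5, 1.4, 0.2]})` sends the same payload as the `requests` example.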

Building Bentos

1. Create bentofile.yaml

# bentofile.yaml
service: "service:svc"
labels:
  owner: ml-team
  project: iris-classifier
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - numpy
docker:
  distro: debian
  python_version: "3.10"
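The `service` field uses a `module:attribute` import string: `service:svc` refers to the `svc` object defined in `service.py`. The sketch below shows how such a string can be resolved in plain Python. It is an illustration of the convention, not BentoML's own loader, and `resolve_import_string` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_import_string(target: str) -> Any:
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, sep, attr_name = target.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library module so it runs anywhere.
dumps = resolve_import_string("json:dumps")
print(dumps({"ok": True}))  # prints {"ok": true}
```

Running `resolve_import_string("service:svc")` from the project directory is a quick way to catch a typo in the `service` field before building.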
