Complete Ray Serve Tutorial: Scalable ML Model Serving

Ray Serve is a scalable model-serving library built on top of Ray. It lets you serve ML models with automatic scaling, request batching, and multi-model composition, which makes it well suited to production ML deployments.

Why Ray Serve?

Advantages of Ray Serve:

Framework agnostic: Works with any ML framework
Scalable: Scales automatically based on load
Composable: Combines multiple models easily
Batching: Built-in request batching (opt-in through the serve.batch decorator)
Native Python: A simple, Python-first API
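
To make the batching idea concrete, here is a minimal sketch in plain asyncio. It is not Ray Serve's implementation (Ray Serve exposes batching through its serve.batch decorator); the MicroBatcher class, its max_batch_size and wait_timeout_s parameters, and the handler callback are hypothetical names chosen for illustration.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: groups single requests into one list for `handler`."""

    def __init__(self, handler, max_batch_size=4, wait_timeout_s=0.01):
        self._handler = handler  # sync function: list of inputs -> list of outputs
        self._max_batch_size = max_batch_size
        self._wait_timeout_s = wait_timeout_s
        self._queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        """Queue one request and wait for its individual result."""
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until the first request arrives, then keep filling the
            # batch until it is full or the timeout expires.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._wait_timeout_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One handler call for the whole batch, then fan the results back out.
            results = self._handler([item for item, _ in batch])
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)
```

Create the batcher inside a running event loop and call `await batcher.submit(x)` from concurrent tasks. A model that is faster on batches then sees one call per group of requests instead of one call per request.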

Use Cases:

Large-scale model serving
Multi-model pipelines
A/B testing
Real-time inference
Batch inference

Installation

pip install "ray[serve]"

# Verify the installation
python -c "import ray; from ray import serve; print(ray.__version__)"

Quick Start

1. Basic Deployment

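from ray import serve
import ray

ray.init()

@serve.deployment
class ModelDeployment:
    def __init__(self):
        self.model = "simplemodel"

    def __call__(self, request):
        return {"message": f"Processed by {self.model}"}

# Deploy. serve.run starts Serve if it is not running yet; the older
# ModelDeployment.deploy() call is no longer available in current Ray releases.
serve.run(ModelDeployment.bind(), route_prefix="/ModelDeployment")

# Test
import requests
response = requests.get("http://localhost:8000/ModelDeployment")
print(response.json())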

2. With FastAPI

from ray import serve
from fastapi import FastAPI
import ray

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class MLService:
    def __init__(self):
        self.model = self._load_model()

    def _load_model(self):
        # Placeholder: load the real model here
        return "mymodel"

    @app.get("/predict")
    def predict(self, text: str):
        return {"prediction": f"Result for: {text}"}

    @app.get("/health")
    def health(self):
        return {"status": "healthy"}

ray.init()
serve.run(MLService.bind())
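
Once the service is running, the /predict route can be called like any HTTP endpoint. The sketch below uses only the standard library and assumes the application is reachable at http://localhost:8000 with the default root route prefix; build_predict_url and fetch_prediction are hypothetical helper names, not part of Ray Serve or FastAPI.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_predict_url(text, base_url="http://localhost:8000"):
    """Build the GET URL for /predict, URL-encoding the `text` query parameter."""
    return f"{base_url}/predict?{urlencode({'text': text})}"


def fetch_prediction(text, base_url="http://localhost:8000"):
    """Call the running service and decode its JSON response."""
    with urllib.request.urlopen(build_predict_url(text, base_url), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Encoding the query string through urlencode keeps spaces and special characters in the text from breaking the URL.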

3. Serving an ML Model

from ray import serve
import ray
import pickle
import numpy as np

@serve.deployment
class SklearnModel:
    def __init__(self, model_path: str):
        # Load the pickled model once, when the replica starts
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    async def __call__(self, request):
        data = await request.json()
        # scikit-learn expects a 2D array: one row per sample
        features = np.array(data["features"]).reshape(1, -1)
        prediction = self.model.predict(features)
        return {"prediction": prediction.tolist()}

ray.init()
serve.run(SklearnModel.bind(model_path="model.pkl"))
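
The deployment above reads a JSON body with a "features" key, so a client has to send exactly that shape. A minimal standard-library sketch follows, assuming the service listens on http://localhost:8000/ (the default root route); build_request and predict are hypothetical helper names.

```python
import json
import urllib.request


def build_request(features, url="http://localhost:8000/"):
    """Wrap a flat list of feature values in the JSON body the deployment expects."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url="http://localhost:8000/"):
    """Send one sample to the running service and return the decoded response."""
    with urllib.request.urlopen(build_request(features, url), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The list passed in is one sample; the server reshapes it to a single row before calling the model.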

Deployment Configuration

1. Resource Allocation

from ray import serve

@serve.deployment(
    num_replicas=3,
    ray_actor_options={
        # Resources reserved for each replica
        "num_cpus": 2,
        "num_gpus": 1,
        "memory": 4 * 1024 * 1024 * 1024  # 4 GiB, in bytes
    }
)
class GPUModel:
    def __init__(self):
        import torch
        self.device = torch.device("cuda")
        self.model = self._load_model()

    def _load_model(self):
        import torch
        model = torch.nn.Linear(10, 2)
        return model.to(self.device)

    async def __call__(self, request):
        import torch
        data = await request.json()
        # Force float32 so integer inputs do not break the linear layer
        tensor = torch.tensor(data["input"], dtype=torch.float32).to(self.device)
        with torch.no_grad():  # inference only, no gradients needed
            output = self.model(tensor)
        return {"output": output.cpu().tolist()}
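
The resource options are reserved per replica, so the cluster needs room for the replica count times each amount. The arithmetic can be sketched as follows; ReplicaResources and cluster_demand are hypothetical names used only for this illustration and are not part of Ray.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class ReplicaResources:
    """Resources reserved for a single replica (mirrors the actor options)."""
    num_cpus: float
    num_gpus: float
    memory_bytes: int


def cluster_demand(per_replica: ReplicaResources, num_replicas: int) -> ReplicaResources:
    """Total resources the cluster must provide to place every replica."""
    return ReplicaResources(
        num_cpus=per_replica.num_cpus * num_replicas,
        num_gpus=per_replica.num_gpus * num_replicas,
        memory_bytes=per_replica.memory_bytes * num_replicas,
    )


# The configuration above: 3 replicas, each with 2 CPUs, 1 GPU and 4 GiB
total = cluster_demand(ReplicaResources(2, 1, 4 * GIB), num_replicas=3)
print(total.num_cpus, total.num_gpus, total.memory_bytes // GIB)
```

For the configuration shown, that comes to 6 CPUs, 3 GPUs, and 12 GiB of memory across the cluster.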

2. Autoscaling

from ray import serve
from ray.serve.config import AutoscalingConfig

@serve.deployment(
    autoscaling_config=AutoscalingConfig(
        min_replicas=1,
        max_replicas=10,
        # Newer Ray releases name this field target_ongoing_requests
        target_num_ongoing_requests_per_replica=5,
        upscale_delay_s=10,
        downscale_delay_s=30
    )
)
class AutoscaledModel:
    def __init__(self):
        self.model = "autoscaledmodel"
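
As a rough mental model, the autoscaler tries to keep the number of in-flight requests per replica close to the configured target, within the minimum and maximum replica counts. The sketch below is a simplification under that assumption: it ignores the upscale and downscale delays and any smoothing the real policy applies, and desired_replicas is a hypothetical helper, not a Ray API.

```python
import math


def desired_replicas(total_ongoing_requests, target_per_replica, min_replicas, max_replicas):
    """Replica count that brings per-replica load down to the target, clamped to the limits."""
    raw = math.ceil(total_ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


# With a target of 5 and limits of 1..10, as in the configuration above:
for load in (0, 23, 500):
    print(load, "->", desired_replicas(load, 5, 1, 10))
```

With 23 requests in flight the sketch asks for 5 replicas; an idle deployment stays at the minimum of 1, and a burst of 500 requests is capped at the maximum of 10.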
