Complete Ray Serve Tutorial: Scalable ML Model Serving


By Ruby Abdullah · tutorial
Tags: Ray Serve, Model Serving, MLOps, Distributed Computing, Python, Machine Learning

Ray Serve is a scalable model serving library built on top of Ray. It lets you serve ML models with autoscaling, request batching, and multi-model composition, which makes it well suited to production ML deployments.

Why Ray Serve?

Ray Serve advantages:
  • Framework agnostic: Works with any ML framework
  • Scalable: Autoscaling based on load
  • Composable: Combine multiple models easily
  • Batching: Dynamic request batching (opt-in through the @serve.batch decorator)
  • Native Python: A simple, Python-first API

Use Cases:
  • Large-scale model serving
  • Multi-model pipelines
  • A/B testing
  • Real-time inference
  • Batch inference

Installation

    pip install "ray[serve]"

    # Verify the installation
    python -c "import ray; from ray import serve; print(ray.__version__)"
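
If you would rather check the installation from inside a script, the following is a minimal sketch that uses only the standard library. The helper name installed_version is hypothetical, and the sketch assumes the distribution is published under the name "ray", as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ray_version = installed_version("ray")
    if ray_version is None:
        print('Ray is not installed; run: pip install "ray[serve]"')
    else:
        print(f"Ray {ray_version} is installed")
```

Unlike the one-liner, this check does not import Ray itself, so it stays fast and cannot fail on a broken import.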

Quick Start

1. Basic Deployment

    import ray
    import requests
    from ray import serve

    ray.init()


    @serve.deployment
    class ModelDeployment:
        def __init__(self):
            self.model = "simplemodel"

        def __call__(self, request):
            return {"message": f"Processed by {self.model}"}


    # Deploy. The older ModelDeployment.deploy() call was removed in Ray 2.x;
    # serve.run starts Serve if needed and mounts the application at "/" by default.
    serve.run(ModelDeployment.bind())

    # Test
    response = requests.get("http://localhost:8000/")
    print(response.json())

2. With FastAPI

    import ray
    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()


    @serve.deployment
    @serve.ingress(app)
    class MLService:
        def __init__(self):
            self.model = self._load_model()

        def _load_model(self):
            # Placeholder: load the real model here.
            return "mymodel"

        @app.get("/predict")
        def predict(self, text: str):
            return {"prediction": f"Result for: {text}"}

        @app.get("/health")
        def health(self):
            return {"status": "healthy"}


    ray.init()
    serve.run(MLService.bind())
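
Because predict declares text as a plain str parameter, FastAPI reads it from the query string. The sketch below shows one way to build a correctly encoded request URL with the standard library. The helper name build_predict_url is hypothetical, and the default base URL assumes the service listens on localhost port 8000, as in the earlier test request.

```python
from urllib.parse import urlencode


def build_predict_url(text: str, base_url: str = "http://localhost:8000") -> str:
    """Build the GET URL for the /predict route with `text` URL-encoded."""
    query = urlencode({"text": text})
    return f"{base_url.rstrip('/')}/predict?{query}"


if __name__ == "__main__":
    url = build_predict_url("hello world")
    print(url)
    # With the service running, the URL can be fetched with, for example:
    #   import json, urllib.request
    #   with urllib.request.urlopen(url) as resp:
    #       print(json.load(resp))
```

Encoding the parameter matters as soon as the input contains spaces, ampersands, or non-ASCII characters, which is common for free-text model inputs.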

3. Serving an ML Model

    import pickle

    import numpy as np
    import ray
    from ray import serve


    @serve.deployment
    class SklearnModel:
        def __init__(self, model_path: str):
            # Only unpickle files you trust: pickle can execute arbitrary code.
            with open(model_path, "rb") as f:
                self.model = pickle.load(f)

        async def __call__(self, request):
            data = await request.json()
            # A single sample must be a 2D array of shape (1, n_features).
            features = np.array(data["features"]).reshape(1, -1)
            prediction = self.model.predict(features)
            return {"prediction": prediction.tolist()}


    ray.init()
    serve.run(SklearnModel.bind(model_path="model.pkl"))
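
The handler above indexes data["features"] directly, so a malformed body surfaces as an unhandled exception. One possible approach is to validate the payload first. The sketch below does this in plain Python; parse_features and the expected feature count are hypothetical and are not part of Ray Serve or scikit-learn.

```python
from typing import Any, List


def parse_features(payload: Any, n_features: int) -> List[List[float]]:
    """Validate a JSON body and return a single-row 2D feature list.

    The result has the same (1, n_features) layout that
    np.array(...).reshape(1, -1) produces for a flat list.
    """
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError('body must be a JSON object with a "features" key')
    features = payload["features"]
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"expected a list of {n_features} numbers")
    row = []
    for value in features:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric feature: {value!r}")
        row.append(float(value))
    return [row]


if __name__ == "__main__":
    print(parse_features({"features": [5.1, 3.5, 1.4, 0.2]}, n_features=4))
```

Inside a deployment, the ValueError could be caught and reported back to the caller as a client error rather than a server failure.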

Deployment Configuration

1. Resource Allocation

    from ray import serve


    @serve.deployment(
        num_replicas=3,
        ray_actor_options={
            "num_cpus": 2,
            "num_gpus": 1,
            "memory": 4 * 1024 * 1024 * 1024,  # 4 GiB, in bytes
        },
    )
    class GPUModel:
        def __init__(self):
            import torch

            self.device = torch.device("cuda")
            self.model = self._load_model()

        def _load_model(self):
            import torch

            # Placeholder model: replace with real weights.
            model = torch.nn.Linear(10, 2)
            return model.to(self.device)

        async def __call__(self, request):
            import torch

            data = await request.json()
            # Linear layers expect float input, so set the dtype explicitly.
            tensor = torch.tensor(data["input"], dtype=torch.float32).to(self.device)
            with torch.no_grad():  # inference only, no gradients needed
                output = self.model(tensor)
            return {"output": output.cpu().tolist()}
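
Each replica reserves the resources listed in its actor options for itself, so the cluster has to supply the per-replica request multiplied by the number of replicas. The sketch below works through that arithmetic with the values from the configuration above; total_resources is a hypothetical helper, not a Ray API.

```python
from typing import Dict

GIB = 1024 ** 3  # bytes in one gibibyte


def total_resources(num_replicas: int, per_replica: Dict[str, float]) -> Dict[str, float]:
    """Multiply each per-replica resource request by the replica count."""
    return {name: amount * num_replicas for name, amount in per_replica.items()}


if __name__ == "__main__":
    per_replica = {"num_cpus": 2, "num_gpus": 1, "memory": 4 * GIB}
    totals = total_resources(3, per_replica)
    print(totals["num_cpus"], "CPUs,", totals["num_gpus"], "GPUs,",
          totals["memory"] // GIB, "GiB")
```

For three replicas this comes to 6 CPUs, 3 GPUs, and 12 GiB of memory. If the cluster offers less, some replicas are likely to stay pending until resources become available.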

2. Autoscaling

    from ray import serve
    from ray.serve.config import AutoscalingConfig


    @serve.deployment(
        autoscaling_config=AutoscalingConfig(
            min_replicas=1,
            max_replicas=10,
            # Renamed to target_ongoing_requests in newer Ray releases.
            target_num_ongoing_requests_per_replica=5,
            upscale_delay_s=10,
            downscale_delay_s=30,
        )
    )
    class AutoscaledModel:
        def __init__(self):
            self.model = "autoscaledmodel"
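
To build intuition for these settings, here is a simplified sketch of how a target-based autoscaler picks a replica count: divide the total number of in-flight requests by the per-replica target, round up, and clamp the result to the configured bounds. This is an illustration only. desired_replicas is a hypothetical function, and Ray Serve's actual policy additionally waits for the upscale and downscale delays before acting.

```python
import math


def desired_replicas(total_ongoing_requests: int, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Return the replica count that keeps each replica near its target load."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(total_ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Settings from the configuration above: min 1, max 10, target 5.
    for load in (0, 12, 23, 200):
        print(load, "->", desired_replicas(load, 5, 1, 10))
```

With these settings, 23 in-flight requests map to 5 replicas, an idle service stays at the minimum of 1, and a burst of 200 requests is capped at the maximum of 10.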
