Complete Google Cloud Run for ML Tutorial: Serverless ML Deployment

Google Cloud Run provides a serverless platform for deploying containerized ML models. It offers auto-scaling, pay-per-use pricing, and seamless integration with Google Cloud services.

Why Cloud Run for ML?

Key Benefits:

Serverless: No infrastructure management
Auto-scaling: Scales down to zero when idle and back up with traffic
Cost-effective: Pay only for actual usage
Container-based: Deploy any framework
Fast deployment: A single gcloud run deploy command rolls out a new revision

Prerequisites

pip install google-cloud-run flask gunicorn
gcloud auth login
gcloud config set project your-project-id

Quick Start

1. Create ML Service

# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model on startup
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()

    features = np.array(data["features"]).reshape(1, -1)
    prediction = model.predict(features)
    probability = model.predict_proba(features)

    return jsonify({
        "prediction": int(prediction[0]),
        "probability": probability[0].tolist()
    })

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "healthy"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
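
The handler above indexes data["features"] directly, so a missing key or a non-numeric value surfaces as an unhandled server error rather than a clear client error. A minimal sketch of a validation step that could run before the NumPy reshape is shown here. The names `parse_features` and `expected_len` are hypothetical and not part of Flask or the original service.

```python
from numbers import Real
from typing import Any, List, Optional


def parse_features(payload: Any, expected_len: Optional[int] = None) -> List[float]:
    """Return the feature vector as a list of floats, or raise ValueError."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError('request body must be a JSON object with a "features" key')

    features = payload["features"]
    if not isinstance(features, list) or not features:
        raise ValueError('"features" must be a non-empty list')

    for value in features:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError('"features" must contain only numbers')

    if expected_len is not None and len(features) != expected_len:
        raise ValueError(f"expected {expected_len} features, got {len(features)}")

    return [float(v) for v in features]
```

Inside predict(), the result of `request.get_json(silent=True)` could be passed to this helper, with a `ValueError` turned into a 400 response.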

2. Create Dockerfile

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.joblib .
COPY app.py .

ENV PORT=8080
EXPOSE 8080

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
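
The bind, workers, and threads flags on the CMD line can also live in a Gunicorn configuration file, which is plain Python. A sketch follows, assuming a file named `gunicorn.conf.py` copied into /app. The `timeout = 0` line is an addition that is not in the original command; it disables Gunicorn's worker timeout so that Cloud Run's own request timeout governs.

```python
# gunicorn.conf.py (hypothetical file; settings mirror the Dockerfile CMD)
import os

# Cloud Run passes the listening port in the PORT environment variable.
bind = f":{os.environ.get('PORT', '8080')}"

# One worker process with eight threads, as in the original command line.
workers = 1
threads = 8

# Assumption: not in the original CMD. 0 disables Gunicorn's worker timeout.
timeout = 0
```

With this file in the image, the command could become `CMD exec gunicorn --config gunicorn.conf.py app:app`.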

3. requirements.txt

flask==2.3.0
gunicorn==21.2.0
joblib==1.3.0
scikit-learn==1.3.0
numpy==1.24.0

4. Deploy to Cloud Run

# Build container
gcloud builds submit --tag gcr.io/your-project/ml-service

# Deploy
gcloud run deploy ml-service \
  --image gcr.io/your-project/ml-service \
  --platform managed \
  --region us-central1 \
  --memory 2Gi \
  --cpu 2 \
  --min-instances 0 \
  --max-instances 10 \
  --allow-unauthenticated
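
Once the deploy command prints a service URL, the endpoint can be exercised from any HTTP client. The sketch below uses only the standard library. `SERVICE_URL` is a placeholder for the URL that gcloud reports, and the helper names are hypothetical. The generous timeout is an assumption that allows for a cold start, since with --min-instances 0 the first request has to wait for a container to start and load the model.

```python
import json
import urllib.request

# Placeholder: replace with the URL printed by `gcloud run deploy`.
SERVICE_URL = "https://YOUR-SERVICE-URL"


def build_predict_request(base_url: str, features: list) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(base_url: str, features: list, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (the feature values are arbitrary and must match the model):
# result = call_predict(SERVICE_URL, [5.1, 3.5, 1.4, 0.2])
```

The response should carry the same "prediction" and "probability" keys that the service returns from /predict.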

FastAPI Service

1. FastAPI Application

# main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Prediction API")

# Load model
model = joblib.load("model.joblib")

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: list[float]

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = model.predict(features)
        probability = model.predict_proba(features)

        return PredictionResponse(
            prediction=int(prediction[0]),
            probability=probability[0].tolist()
        )
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))


@app.get("/health")
async def health():
    return {"status": "healthy"}
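
The predict endpoint above is declared with async def but calls the model synchronously, so a slow prediction holds up the event loop and every other request on that worker. One option is to declare the endpoint with plain def, which FastAPI runs in a thread pool. Another is to hand the blocking call to a thread with `asyncio.to_thread`, sketched below with a stand-in function. `slow_predict` is a hypothetical placeholder for model.predict and is not part of the original service.

```python
import asyncio
import time


def slow_predict(features: list) -> int:
    """Stand-in for a blocking model.predict call (hypothetical)."""
    time.sleep(0.05)  # simulate blocking work
    return int(sum(features) > 0)


async def predict_without_blocking(features: list) -> int:
    # asyncio.to_thread (Python 3.9+) runs the function in a worker thread
    # and lets the event loop keep serving other coroutines meanwhile.
    return await asyncio.to_thread(slow_predict, features)


async def main() -> list:
    # The two predictions overlap instead of running back to back.
    return await asyncio.gather(
        predict_without_blocking([1.0, 2.0]),
        predict_without_blocking([-3.0, 1.0]),
    )


results = asyncio.run(main())
```

In the service itself, the equivalent line would be an awaited `asyncio.to_thread(model.predict, features)` inside the endpoint.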

2. FastAPI Dockerfile

FROM python:3.9-slim

WORKDIR /app

# requirements.txt for this image must also list fastapi and uvicorn
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.joblib .
COPY main.py .

ENV PORT=8080
EXPOSE 8080

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

PyTorch Service

1. PyTorch Inference

# servepytorch.py
from flask import Flask, request, jsonify
import torch
import torch.nn as nn
import numpy as np

app = Flask(__name__)

class SimpleNN(nn.Module):
