Complete Kubeflow Tutorial: MLOps on Kubernetes

Kubeflow is an open-source platform for deploying, managing, and scaling machine learning workflows on Kubernetes. It provides a complete MLOps solution with pipelines, model serving, notebooks, and experiment tracking.

Why Kubeflow?

Kubeflow's advantages:

Kubernetes native: Builds on the scalability and reliability of K8s
End-to-end MLOps: From experimentation to production
Portable: Runs on any Kubernetes cluster
Composable: Use only the components you need
Open source: Active community and a broad ecosystem

Use Cases:

ML pipeline orchestration
Distributed training
Large-scale model serving
Experiment tracking
Feature engineering

Installation

1. Prerequisites

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# Install kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/
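
Before moving on, it can help to confirm that both tools actually ended up on PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of any Kubeflow tooling.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["kubectl", "kustomize"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("kubectl and kustomize are both on PATH")
```

If either name is reported as missing, repeat the corresponding install step above before installing Kubeflow.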

2. Install Kubeflow

# Clone manifests
git clone https://github.com/kubeflow/manifests.git
cd manifests

# Install with kustomize (retry until every resource applies cleanly)
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying..."
  sleep 10
done

# Check the installation
kubectl get pods -n kubeflow
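
Scanning the pod list by eye gets tedious on a full Kubeflow install. As a rough sketch, the check can be automated by parsing the JSON form of the same kubectl command. The function names here are hypothetical, and the readiness rule (every container reports `ready`, or the pod has `Succeeded`) is a simplifying assumption rather than an official health check.

```python
import json
import subprocess


def pods_not_ready(pod_list):
    """Return names of pods that are neither Succeeded nor fully ready.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # completed one-off pods count as healthy
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            not_ready.append(pod["metadata"]["name"])
    return not_ready


def fetch_pods(namespace="kubeflow"):
    """Run kubectl and return the parsed pod list for `namespace`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Usage on a machine with cluster access:
#   print(pods_not_ready(fetch_pods()))
```

An empty list means every pod in the namespace passed this check; any names returned are the pods to inspect with `kubectl describe pod`.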

3. Access the Dashboard

# Port forward
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

# Access at http://localhost:8080
# Default credentials: user@example.com / 12341234
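
If the browser cannot reach the dashboard, a quick first check is whether anything is listening on the forwarded port at all. This is a small sketch with the standard `socket` module; the helper name `is_port_open` is hypothetical, and a successful TCP connection only shows that the port-forward is up, not that Kubeflow itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Usage while the port-forward is running:
#   print(is_port_open("localhost", 8080))
```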

Kubeflow Pipelines

1. Basic Pipeline

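from kfp import dsl
from kfp import compiler

# Note: every component runs in its own container, so a file written under
# /tmp is not visible to later steps. The paths below keep the example short;
# on a real cluster, pass data between steps as KFP artifacts
# (dsl.Output / dsl.Input) or through shared storage.


@dsl.component(packages_to_install=["pandas"])
def preprocess_data(data_path: str) -> str:
    """Drop rows with missing values and save the cleaned CSV."""
    import pandas as pd

    df = pd.read_csv(data_path)
    df = df.dropna()

    output_path = "/tmp/preprocessed.csv"
    df.to_csv(output_path, index=False)
    return output_path


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def train_model(data_path: str, epochs: int) -> str:
    """Train a random forest on the CSV and pickle the fitted model."""
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(data_path)
    X = df.drop("target", axis=1)
    y = df["target"]

    # Note: `epochs` is accepted but not used by RandomForestClassifier.
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)

    model_path = "/tmp/model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    return model_path


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def evaluate_model(model_path: str, test_data: str) -> float:
    """Load the pickled model and return its accuracy on the given CSV."""
    import pickle

    import pandas as pd
    from sklearn.metrics import accuracy_score

    with open(model_path, "rb") as f:
        model = pickle.load(f)

    df = pd.read_csv(test_data)
    X = df.drop("target", axis=1)
    y = df["target"]

    predictions = model.predict(X)
    accuracy = accuracy_score(y, predictions)

    # Convert the NumPy scalar to a plain float for the component output
    return float(accuracy)


@dsl.pipeline(name="ML Training Pipeline")
def ml_pipeline(data_path: str, epochs: int = 10):
    preprocess_task = preprocess_data(data_path=data_path)

    train_task = train_model(
        data_path=preprocess_task.output,
        epochs=epochs,
    )

    # Evaluation here reuses the preprocessed training data
    evaluate_task = evaluate_model(
        model_path=train_task.output,
        test_data=preprocess_task.output,
    )


# Compile pipeline
compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")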
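
The pipeline function never states an execution order explicitly. The order follows from which task consumes which output. The sketch below illustrates that idea with the standard library's `graphlib` (Python 3.9+). It is not how KFP is implemented, and the step labels are illustrative stand-ins for the three components.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes:
# training reads the preprocessed data, and evaluation reads both
# the trained model and the preprocessed data.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}


def execution_order(deps):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


print(execution_order(dependencies))
# → ['preprocess', 'train', 'evaluate']
```

Because evaluation depends on training, which in turn depends on preprocessing, this graph has exactly one valid order. Steps with no dependency between them could be scheduled in parallel.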

2. Run the Pipeline

from kfp.client import Client

# Connect to Kubeflow
client = Client(host="http://localhost:8080/pipeline")

# Create an experiment
experiment = client.create_experiment("my-experiment")

# Run the pipeline
run = client.run_pipeline(
    experiment_id=experiment.experiment_id,  # KFP SDK v2; v1 exposed this as experiment.id
    job_name="training-run-1",
    pipeline_package_path="pipeline.yaml",
    params={"data_path": "gs://bucket/data.csv", "epochs": 20},
)
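
A mistyped or wrongly typed entry in `params` usually surfaces only after the run has been submitted. As a hedged sketch, the dictionary can be checked locally against a plain function that has the same signature as the pipeline. The helper `check_arguments` and the stand-in `pipeline_signature` are hypothetical, and the parameter names assume the snake_case spelling `data_path`.

```python
import inspect


def check_arguments(func, arguments):
    """Return a list of problems found when matching `arguments` to `func`.

    Checks for unknown names, missing required parameters, and values whose
    type differs from a plain-class annotation such as str or int.
    """
    problems = []
    params = inspect.signature(func).parameters
    for name in arguments:
        if name not in params:
            problems.append(f"unknown parameter: {name}")
    for name, param in params.items():
        if name not in arguments:
            if param.default is inspect.Parameter.empty:
                problems.append(f"missing required parameter: {name}")
            continue
        expected = param.annotation
        if (
            expected is not inspect.Parameter.empty
            and isinstance(expected, type)
            and not isinstance(arguments[name], expected)
        ):
            problems.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(arguments[name]).__name__}"
            )
    return problems


# Stand-in with the same signature as the pipeline function (an assumption;
# the decorated pipeline object may not expose its signature this way).
def pipeline_signature(data_path: str, epochs: int = 10):
    pass


print(check_arguments(
    pipeline_signature,
    {"data_path": "gs://bucket/data.csv", "epochs": "20"},
))
# → ['epochs: expected int, got str']
```

An empty list means the arguments line up with the signature. The check says nothing about whether the data path exists or is readable from inside the cluster.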
