Complete Kubeflow Tutorial: MLOps on Kubernetes

By Ruby Abdullah · tutorial
Tags: Kubeflow, Kubernetes, MLOps, ML Pipeline, Distributed Training, Model Serving

Kubeflow is an open-source platform for deploying, managing, and scaling machine learning workflows on Kubernetes. It provides a complete MLOps solution with pipelines, model serving, notebooks, and experiment tracking.

Why Kubeflow?

Kubeflow Advantages:
  • Kubernetes native: Leverage K8s scalability and reliability
  • End-to-end MLOps: From experimentation to production
  • Portable: Run on any Kubernetes cluster
  • Composable: Use only the components you need
  • Open source: Active community and ecosystem

Use Cases:
  • ML pipeline orchestration
  • Distributed training
  • Model serving at scale
  • Experiment tracking
  • Feature engineering

Installation

1. Prerequisites

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# Install kustomize (the script downloads the binary into the current directory)
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/
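
Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch in Python using only the standard library; the helper name `missing_tools` is hypothetical and not part of Kubeflow or kubectl.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("kubectl", "kustomize"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of required CLI tools that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Report anything that still needs to be installed.
missing = missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.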

2. Install Kubeflow

# Clone manifests
git clone https://github.com/kubeflow/manifests.git
cd manifests

# Install with kustomize; the loop retries until every resource applies,
# because CRDs must be registered before the resources that use them
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying..."
  sleep 10
done

# Check installation
kubectl get pods -n kubeflow
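
Scanning the pod list by eye gets tedious on a full Kubeflow install. As a rough sketch, the function below takes the parsed JSON that `kubectl get pods -n kubeflow -o json` produces and lists pods that are not yet ready. The function name `pods_not_ready` and the sample data are illustrative assumptions, not real cluster output.

```python
import json
from typing import Any, Dict, List


def pods_not_ready(pod_list: Dict[str, Any]) -> List[str]:
    """Return names of pods that are neither completed nor fully ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed job pods need no further attention
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            not_ready.append(name)
    return not_ready


# Illustrative sample in the shape of `kubectl get pods -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "pod-a"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}},
    {"metadata": {"name": "pod-b"},
     "status": {"phase": "Pending"}}
  ]
}
""")
print(pods_not_ready(sample))  # ['pod-b']
```

To run it against a real cluster, pass it the result of `json.loads` applied to the stdout of `subprocess.run(["kubectl", "get", "pods", "-n", "kubeflow", "-o", "json"], capture_output=True, text=True, check=True)`.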

3. Access Dashboard

# Port forward
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

# Access at http://localhost:8080
# Default credentials: user@example.com / 12341234
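
If the browser cannot load the dashboard, a quick first check is whether anything is listening on the forwarded port at all. This is a small standard-library sketch; `port_is_open` is a hypothetical helper, and the defaults simply mirror the host and port used in the port-forward command.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print("Dashboard port reachable:", port_is_open())
```

A result of `False` usually means the `kubectl port-forward` process is not running; `True` only confirms the TCP port is open, not that the dashboard itself is healthy.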

Kubeflow Pipelines

1. Basic Pipeline

from kfp import compiler, dsl
from kfp.dsl import Dataset, Input, Model, Output


# Each component runs in its own container, so files are passed between
# steps as artifacts (Input/Output) rather than as local /tmp paths.
@dsl.component(packages_to_install=["pandas"])
def preprocess_data(data_path: str, output_data: Output[Dataset]):
    import pandas as pd

    # Reading gs:// or s3:// paths additionally requires gcsfs or s3fs.
    df = pd.read_csv(data_path)
    df = df.dropna()
    df.to_csv(output_data.path, index=False)


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def train_model(data: Input[Dataset], epochs: int, model_out: Output[Model]):
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(data.path)
    X = df.drop("target", axis=1)
    y = df["target"]

    # epochs is accepted but unused: a random forest has no notion of epochs.
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)

    with open(model_out.path, "wb") as f:
        pickle.dump(model, f)


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def evaluate_model(model: Input[Model], test_data: Input[Dataset]) -> float:
    import pickle

    import pandas as pd
    from sklearn.metrics import accuracy_score

    with open(model.path, "rb") as f:
        clf = pickle.load(f)

    df = pd.read_csv(test_data.path)
    X = df.drop("target", axis=1)
    y = df["target"]

    predictions = clf.predict(X)
    return float(accuracy_score(y, predictions))


@dsl.pipeline(name="ML Training Pipeline")
def ml_pipeline(data_path: str, epochs: int = 10):
    preprocess_task = preprocess_data(data_path=data_path)
    train_task = train_model(
        data=preprocess_task.outputs["output_data"],
        epochs=epochs,
    )
    # Note: this evaluates on the same data the model was trained on.
    evaluate_task = evaluate_model(
        model=train_task.outputs["model_out"],
        test_data=preprocess_task.outputs["output_data"],
    )


# Compile pipeline
compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")
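
Nothing in the pipeline function states an execution order explicitly; Kubeflow Pipelines derives it from which task outputs feed which downstream inputs. The sketch below illustrates that idea in plain Python with the standard-library `graphlib` module (Python 3.9+). It does not use KFP itself, and the task names and the `deps` mapping are illustrative stand-ins for the three steps of this pipeline.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks whose outputs it consumes.
deps: Dict[str, Set[str]] = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order in which every task follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(deps))  # ['preprocess', 'train', 'evaluate']
```

Because evaluation depends on both earlier steps, there is only one valid order here. In a wider pipeline, tasks with no dependency between them are free to run in parallel.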

2. Run Pipeline

from kfp.client import Client

# Connect to Kubeflow
client = Client(host="http://localhost:8080/pipeline")

# Create experiment
experiment = client.create_experiment("my-experiment")

# Run pipeline (KFP SDK v2 exposes experiment_id and run_id; v1 used .id)
run = client.run_pipeline(
    experiment_id=experiment.experiment_id,
    job_name="training-run-1",
    pipeline_package_path="pipeline.yaml",
    params={"data_path": "gs://bucket/data.csv", "epochs": 20},
)

print(f"Run ID: {run.run_id}")
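
Submitting a run returns immediately, so a script that needs the result has to wait for the run to finish. Below is a generic polling sketch under stated assumptions: `wait_for_terminal_state` is a hypothetical helper, `get_state` is any callable you supply that returns the current state as a string, and the state names in `TERMINAL_STATES` are assumed to match what your KFP backend reports. The demo uses a fake sequence of states rather than a real cluster.

```python
import time
from typing import Callable

# Assumed terminal state names; adjust to what your KFP version reports.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "SKIPPED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s}s")
        sleep(poll_s)


# Demo with a fake state sequence instead of a live run.
fake_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(fake_states), poll_s=0.0))  # SUCCEEDED
```

Against a live deployment, `get_state` would be a small function that fetches the run through the client and returns its state. The KFP client also provides its own `wait_for_run_completion` method, which is likely the simpler choice when no custom handling is needed.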
