Complete Kubeflow Tutorial: MLOps on Kubernetes

Kubeflow is an open-source platform for deploying, managing, and scaling machine learning workflows on Kubernetes. It provides a complete MLOps solution with pipelines, model serving, notebooks, and experiment tracking.

Why Kubeflow?

Kubeflow Advantages:

Kubernetes native: Leverage K8s scalability and reliability
End-to-end MLOps: From experimentation to production
Portable: Run on any Kubernetes cluster
Composable: Use only the components you need
Open source: Active community and ecosystem

Use Cases:

ML pipeline orchestration
Distributed training
Model serving at scale
Experiment tracking
Feature engineering

Installation

1. Prerequisites

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# Install kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/
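
Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of any Kubeflow tooling.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["kubectl", "kustomize"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("kubectl and kustomize are both installed")
```

An empty result means both binaries were found; anything else names the tool to reinstall.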

2. Install Kubeflow

# Clone manifests
git clone https://github.com/kubeflow/manifests.git
cd manifests

# Install with kustomize
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying..."
  sleep 10
done

# Check installation
kubectl get pods -n kubeflow
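
Applying the manifests successfully does not mean every pod is ready yet; images still have to be pulled and containers started. As a rough sketch, the check can be scripted by parsing the JSON form of the same `kubectl get pods` command. The function names `pods_not_ready` and `check_namespace` are hypothetical; the fields read here (`status.phase` and `status.containerStatuses[].ready`) are standard Pod status fields.

```python
import json
import subprocess


def pods_not_ready(pod_list):
    """Return names of pods that are neither completed nor fully ready.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    pending = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed Job pods need no further attention
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            pending.append(name)
    return pending


def check_namespace(namespace="kubeflow"):
    """Call kubectl and list pods that still need time (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return pods_not_ready(json.loads(out))
```

Calling `check_namespace()` repeatedly until it returns an empty list is one way to wait for the installation to settle.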

3. Access Dashboard

# Port forward
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

# Access at http://localhost:8080
# Default credentials: user@example.com / 12341234
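
If the browser shows nothing, a quick first question is whether the port-forward is answering at all. Below is a small sketch with the standard library; `dashboard_reachable` is a hypothetical helper, and it only tests that something responds over HTTP on the forwarded port, not that login works.

```python
import urllib.error
import urllib.request


def dashboard_reachable(url="http://localhost:8080", timeout=3.0):
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    # Skip any configured proxy: the port-forward listens on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the port-forward is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("Dashboard reachable:", dashboard_reachable())
```

A `False` result usually points at the `kubectl port-forward` process having exited or never started.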

Kubeflow Pipelines

1. Basic Pipeline

from kfp import compiler, dsl
from kfp.dsl import Dataset, Input, Model, Output


# Each component runs in its own container, so files are passed between steps
# as artifacts (Input/Output) rather than as local /tmp paths.
# gcsfs is only needed when data_path is a gs:// URL.
@dsl.component(packages_to_install=["pandas", "gcsfs"])
def preprocess_data(data_path: str, output_data: Output[Dataset]):
    import pandas as pd

    df = pd.read_csv(data_path)
    df = df.dropna()
    df.to_csv(output_data.path, index=False)


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def train_model(data: Input[Dataset], epochs: int, model: Output[Model]):
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(data.path)
    X = df.drop("target", axis=1)
    y = df["target"]

    # epochs is accepted but unused: a random forest is not trained in epochs.
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X, y)

    with open(model.path, "wb") as f:
        pickle.dump(clf, f)


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def evaluate_model(model: Input[Model], test_data: Input[Dataset]) -> float:
    import pickle

    import pandas as pd
    from sklearn.metrics import accuracy_score

    with open(model.path, "rb") as f:
        clf = pickle.load(f)

    df = pd.read_csv(test_data.path)
    X = df.drop("target", axis=1)
    y = df["target"]

    predictions = clf.predict(X)
    return float(accuracy_score(y, predictions))


@dsl.pipeline(name="ML Training Pipeline")
def ml_pipeline(data_path: str, epochs: int = 10):
    preprocess_task = preprocess_data(data_path=data_path)
    train_task = train_model(
        data=preprocess_task.outputs["output_data"],
        epochs=epochs,
    )
    # Note: this evaluates on the same data the model was trained on.
    evaluate_task = evaluate_model(
        model=train_task.outputs["model"],
        test_data=preprocess_task.outputs["output_data"],
    )


# Compile pipeline
compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")
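
The pipeline function never states an execution order. Kubeflow Pipelines derives it from which task consumes which output. The sketch below illustrates that idea with Python's standard `graphlib` module rather than KFP itself; the dictionary is a hand-written stand-in for the three steps above, with shortened task names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks whose outputs it consumes.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
}

# A topological sort yields an order in which every task runs after its inputs exist.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
# → ['preprocess', 'train', 'evaluate']
```

Tasks with no dependency between them would have no fixed relative order, which is what allows an orchestrator to run them in parallel.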

2. Run Pipeline

from kfp.client import Client

# Connect to Kubeflow
client = Client(host="http://localhost:8080/pipeline")

# Create experiment
experiment = client.create_experiment("my-experiment")

# Run pipeline (field names are from KFP SDK v2; v1 used experiment.id and run.id)
run = client.run_pipeline(
    experiment_id=experiment.experiment_id,
    job_name="training-run-1",
    pipeline_package_path="pipeline.yaml",
    params={"data_path": "gs://bucket/data.csv", "epochs": 20},
)

print(f"Run ID: {run.run_id}")
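
A misspelled key in the `params` dictionary is easy to miss before a run is submitted. One possible pre-flight check, sketched with the standard `inspect` module: `check_params` and the stand-in function `example_pipeline` are hypothetical, and the check assumes a plain Python function with the same parameters as the pipeline is available.

```python
import inspect


def check_params(func, params):
    """Compare a params dict against a function's signature.

    Returns (unknown, missing): keys in `params` the function does not accept,
    and required parameters (no default) that `params` does not supply.
    """
    sig = inspect.signature(func)
    unknown = sorted(set(params) - set(sig.parameters))
    missing = sorted(
        name
        for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty and name not in params
    )
    return unknown, missing


# Hypothetical stand-in with the same parameters as the pipeline function.
def example_pipeline(data_path: str, epochs: int = 10):
    pass


# "epoch" is a deliberate typo for "epochs".
print(check_params(example_pipeline, {"data_path": "gs://bucket/data.csv", "epoch": 20}))
# → (['epoch'], [])
```

Both lists being empty means every key matches a parameter and no required parameter was left out.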

Related Articles

KServe Tutorial: Serverless Model Serving on Kubernetes

Complete BentoML Tutorial: Packaging and Serving ML Models to Production

Ray Train & Ray Tune Tutorial: Distributed Training and Hyperparameter Tuning

Triton Inference Server Tutorial: High-Performance Model Serving
