Complete Vertex AI Pipelines Tutorial: Orchestrating ML Workflows

Vertex AI Pipelines enables you to orchestrate ML workflows as directed acyclic graphs (DAGs). Built on Kubeflow Pipelines, it provides serverless execution with Google Cloud integration.
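To make the DAG idea concrete, here is a minimal sketch using only the Python standard library (graphlib, Python 3.9+). It is a conceptual illustration, not how Vertex AI schedules work internally, and the step names are hypothetical. Each step lists the steps it depends on, and a topological sort yields an order in which every step runs after its dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical step names. Each key maps to the set of steps it depends on.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "validate_data": {"preprocess"},
    "deploy": {"train", "validate_data"},
}


def execution_order(dependencies):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """Return True if the dependency graph has no cycles."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True


order = execution_order(steps)
print(order)
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Steps with no dependency between them (train and validate_data here) can appear in either order, which is what lets an orchestrator run them in parallel.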

Why Vertex AI Pipelines?

Key Benefits:

Serverless: No infrastructure to manage
Reproducible: Version-controlled workflows
Scalable: Handles large-scale ML jobs
Integration: Native Google Cloud services
Reusable: Modular pipeline components

Use Cases:

Automated ML training
Data preprocessing workflows
Model deployment pipelines
Feature engineering
MLOps automation

Prerequisites

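pip install google-cloud-aiplatform kfp

# Authenticate
gcloud auth login
# The Python SDK reads Application Default Credentials, which this command sets up
gcloud auth application-default login
gcloud config set project your-project-id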
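After installing, it can help to confirm that both packages are visible in the active Python environment. The following is a small sketch using only the standard library. The helper name installed_versions is hypothetical and not part of either package.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["google-cloud-aiplatform", "kfp"]).items():
    print(f"{name}: {version or 'not installed'}")
```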

Quick Start

1. Simple Pipeline

from kfp import dsl
from kfp.dsl import component
from google.cloud import aiplatform

# Define components
# gcsfs lets pandas read and write gs:// URIs directly
@component(packages_to_install=["pandas", "gcsfs"])
def preprocess_data(input_path: str, output_path: str) -> str:
    import pandas as pd

    df = pd.read_csv(input_path)
    df = df.dropna()  # drop rows with missing values
    df.to_csv(output_path, index=False)

    return output_path

@component(packages_to_install=["pandas", "scikit-learn", "gcsfs"])
def train_model(data_path: str, model_path: str) -> float:
    import os

    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv(data_path)
    X = df.drop("target", axis=1)
    y = df["target"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    accuracy = model.score(X_test, y_test)

    # joblib writes through the local filesystem, so map the gs:// URI to the
    # Cloud Storage FUSE mount that Vertex AI exposes under /gcs/.
    local_path = model_path.replace("gs://", "/gcs/", 1)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    joblib.dump(model, local_path)

    return float(accuracy)

# Define pipeline
@dsl.pipeline(
    name="simple-ml-pipeline",
    description="A simple ML training pipeline"
)
def ml_pipeline(input_data: str, model_output: str):
    preprocess_task = preprocess_data(
        input_path=input_data,
        output_path="gs://bucket/processed/data.csv"
    )

    # Consuming preprocess_task.output makes train_model run after preprocess_data
    train_task = train_model(
        data_path=preprocess_task.output,
        model_path=model_output
    )

# Compile and run
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=ml_pipeline,
    package_path="pipeline.json"
)

# Submit pipeline
aiplatform.init(project="your-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="ml-pipeline-run",
    template_path="pipeline.json",
    parameter_values={
        "input_data": "gs://bucket/raw/data.csv",
        "model_output": "gs://bucket/models/model.joblib"
    }
)

job.run()
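The keys of the parameter dictionary passed to PipelineJob are expected to match the argument names of the pipeline function, and a typo there typically only surfaces at submission time. A sketch of a local pre-check follows. The helper check_parameter_values is hypothetical, and it is shown against an undecorated stand-in function because the object returned by @dsl.pipeline may not expose the same signature.

```python
import inspect


def check_parameter_values(pipeline_func, parameter_values):
    """Raise ValueError if the keys do not line up with the function's arguments."""
    params = inspect.signature(pipeline_func).parameters
    required = {
        name for name, p in params.items()
        if p.default is inspect.Parameter.empty
    }
    unknown = set(parameter_values) - set(params)
    missing = required - set(parameter_values)
    if unknown or missing:
        raise ValueError(
            f"unknown parameters: {sorted(unknown)}, "
            f"missing parameters: {sorted(missing)}"
        )


# Plain-Python stand-in with the same argument names as the pipeline function.
def ml_pipeline(input_data: str, model_output: str):
    pass


check_parameter_values(
    ml_pipeline,
    {
        "input_data": "gs://bucket/raw/data.csv",
        "model_output": "gs://bucket/models/model.joblib",
    },
)
print("parameter values match the pipeline signature")
```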

Pipeline Components

1. Python Function Components

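from kfp.dsl import component, Input, Output, Dataset, Model, Metrics

@component(
    base_image="python:3.9",
    packages_to_install=["pandas", "scikit-learn"]
)
def train_sklearn_model(
    training_data: Input[Dataset],
    model: Output[Model],
    metrics: Output[Metrics],
    n_estimators: int = 100,
    max_depth: int = 10
):
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, f1_score
    import joblib

    # Load data
    df = pd.read_csv(training_data.path)
    X = df.drop("target", axis=1)
    y = df["target"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Train
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    clf.fit(X_train, y_train)

    # Evaluate
    predictions = clf.predict(X_test)

    # NOTE: the source is cut off after the line above. The remaining lines are an
    # assumed completion inferred from the imports and the declared outputs.
    metrics.log_metric("accuracy", float(accuracy_score(y_test, predictions)))
    # average="weighted" is an assumption so that multiclass targets also work
    metrics.log_metric("f1_score", float(f1_score(y_test, predictions, average="weighted")))

    # Save the trained model to the output artifact location
    joblib.dump(clf, model.path)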
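The component above reads its input through the artifact's .path attribute rather than a gs:// URI. As far as I understand the KFP SDK's artifact types, .path is derived from the artifact's .uri by swapping the gs:// scheme for a local /gcs/ mount prefix, which is why file-based libraries such as pandas and joblib can use it directly. A standalone sketch of that mapping (the helper name uri_to_local_path is hypothetical, not a KFP function):

```python
def uri_to_local_path(uri: str) -> str:
    """Map a gs:// URI to the /gcs/ path where the bucket is assumed to be mounted."""
    prefix = "gs://"
    if uri.startswith(prefix):
        return "/gcs/" + uri[len(prefix):]
    return uri  # leave paths that are not gs:// URIs untouched


print(uri_to_local_path("gs://bucket/processed/data.csv"))
# prints /gcs/bucket/processed/data.csv
```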
