MLOps End-to-End Project Tutorial: From Data to Production


By Ruby Abdullah · tutorial
MLOps · End-to-End · CI/CD · DVC · MLflow · Production

Tutorial 20: MLOps End-to-End Project

Table of Contents

  • Introduction
  • Prerequisites
  • Project Overview
  • Data Versioning with DVC
  • Experiment Tracking with MLflow
  • Building the Training Pipeline
  • Model Registry
  • CI/CD with GitHub Actions
  • Containerization with Docker
  • Deployment
  • Monitoring with Evidently
  • Alerting
  • Best Practices
  • Conclusion
    Introduction

    MLOps is the discipline of deploying and maintaining machine learning models in production reliably and efficiently. While building an accurate model is important, the real challenge lies in everything around it: versioning data and code together, tracking experiments reproducibly, automating training and deployment pipelines, monitoring model performance in production, and responding to data drift.

    This tutorial walks through a complete MLOps project from raw data to production monitoring. We will build a customer churn prediction system using industry-standard tools: DVC for data versioning, MLflow for experiment tracking and model registry, GitHub Actions for CI/CD, Docker for containerization, and Evidently for production monitoring.

    Prerequisites

    • Python 3.9+
    • Git and a GitHub account
    • Docker and Docker Compose
    • AWS CLI or equivalent cloud CLI (for deployment)
    • Basic understanding of ML model training

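    # Install all required packages (quoting "dvc[s3]" stops shells such as zsh from expanding the brackets)
    pip install "dvc[s3]" mlflow scikit-learn pandas evidently
    pip install fastapi uvicorn docker boto3

    # Then, from Python, confirm the environment is usable
    import os
    print("MLOps E2E Tutorial - Environment Setup")
    print("Working directory:", os.getcwd())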

    Project Overview

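    Our project structure follows MLOps best practices with a clear separation of concerns. Data, source code, serving, tests, configuration, and CI workflows each live in their own directory.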

    churn-prediction/
    ├── .github/
    │   └── workflows/
    │       ├── train.yml
    │       ├── test.yml
    │       └── deploy.yml
    ├── data/
    │   ├── raw/
    │   │   └── customers.csv.dvc
    │   └── processed/
    │       └── features.csv.dvc
    ├── src/
    │   ├── data/
    │   │   ├── __init__.py
    │   │   ├── prepare.py
    │   │   └── validate.py
    │   ├── features/
    │   │   ├── __init__.py
    │   │   └── buildfeatures.py
    │   ├── models/
    │   │   ├── __init__.py
    │   │   ├── train.py
    │   │   └── predict.py
    │   └── monitoring/
    │       ├── __init__.py
    │       └── driftdetection.py
    ├── serving/
    │   ├── app.py
    │   ├── Dockerfile
    │   └── requirements.txt
    ├── tests/
    │   ├── test_data.py
    │   ├── test_model.py
    │   └── test_api.py
    ├── configs/
    │   └── config.yaml
    ├── dvc.yaml
    ├── dvc.lock
    ├── params.yaml
    ├── docker-compose.yml
    └── requirements.txt
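If you are starting from an empty directory, a small helper like the one below can create this skeleton with `pathlib`. The `scaffold` function is hypothetical. It names package markers `__init__.py` and test modules `test_*.py`, which is the pattern pytest collects by default. It skips the `*.dvc` pointer files and `dvc.lock` because DVC generates those.

```python
from pathlib import Path

# Hypothetical scaffold of the layout above. *.dvc pointer files and dvc.lock are
# left out because `dvc add` and `dvc repro` generate them.
DIRS = ("data/raw", "data/processed")
FILES = (
    ".github/workflows/train.yml",
    ".github/workflows/test.yml",
    ".github/workflows/deploy.yml",
    "src/data/__init__.py",
    "src/data/prepare.py",
    "src/data/validate.py",
    "src/features/__init__.py",
    "src/features/buildfeatures.py",
    "src/models/__init__.py",
    "src/models/train.py",
    "src/models/predict.py",
    "src/monitoring/__init__.py",
    "src/monitoring/driftdetection.py",
    "serving/app.py",
    "serving/Dockerfile",
    "serving/requirements.txt",
    "tests/test_data.py",
    "tests/test_model.py",
    "tests/test_api.py",
    "configs/config.yaml",
    "dvc.yaml",
    "params.yaml",
    "docker-compose.yml",
    "requirements.txt",
)


def scaffold(root):
    """Create the directory tree with empty placeholder files; existing files keep their contents."""
    root = Path(root)
    for rel in DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # create intermediate folders
        path.touch(exist_ok=True)  # empty file, or mtime update if it already exists
    return root
```

Calling `scaffold("churn-prediction")` creates the tree inside a `churn-prediction/` folder. After that, run `git init` and `dvc init` there.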

    Data Versioning with DVC

    DVC (Data Version Control) tracks large datasets and model files alongside your Git repository without storing them in Git itself.
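Conceptually, DVC keeps data out of Git by storing each tracked file in a content-addressed cache. Git then receives only a small `.dvc` pointer file that records the file's hash. The sketch below is a simplified, hypothetical model of that idea, not DVC's own code. It hashes a file with MD5 (DVC's default hash) and copies it into a cache directory sharded by the first two hex characters of the hash. It then returns roughly the `md5`, `size`, and `path` fields a pointer file holds.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path, cache_dir):
    """Copy a file into a content-addressed cache and return pointer-style metadata."""
    path = Path(path)
    md5 = file_md5(path)
    # Shard by the first two hex characters: <cache>/ab/cdef...
    target = Path(cache_dir) / md5[:2] / md5[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    # Only a small record like this one would be committed to Git.
    return {"md5": md5, "size": path.stat().st_size, "path": path.name}
```

The cache key is the content hash, so two identical files share one cache object. Changing a single byte produces a new object, and the old version stays retrievable.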

    Setting Up DVC

    # Initialize DVC in your Git repository (dvc init stages its own config files)
    cd churn-prediction
    dvc init
    git commit -m "Initialize DVC"

    # Configure remote storage (S3 in this example) and commit the updated config
    dvc remote add -d myremote s3://my-ml-bucket/dvc-store
    dvc remote modify myremote region us-east-1
    git add .dvc/config
    git commit -m "Configure DVC remote storage"

    # Track data files
    dvc add data/raw/customers.csv
    git add data/raw/customers.csv.dvc data/raw/.gitignore
    git commit -m "Track raw customer data with DVC"

    # Push data to remote storage
    dvc push
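`dvc push` uploads objects from the local cache to the default remote set with `-d`. Objects are named by their hash, so only objects the remote lacks need to be transferred. The hypothetical sketch below models that with a local directory standing in for the S3 bucket. DVC's real implementation talks to the storage backend and differs in detail.

```python
import shutil
from pathlib import Path


def list_objects(store):
    """Return relative object paths (e.g. 'ab/cdef...') in a cache-style store."""
    store = Path(store)
    if not store.exists():
        return set()
    return {p.relative_to(store).as_posix() for p in store.rglob("*") if p.is_file()}


def push(cache_dir, remote_dir):
    """Copy only the objects the remote is missing; return what was transferred."""
    missing = sorted(list_objects(cache_dir) - list_objects(remote_dir))
    for rel in missing:
        dst = Path(remote_dir) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(cache_dir) / rel, dst)
    return missing
```

A second push with an unchanged cache transfers nothing. This is why pushing after every `dvc add` stays cheap.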

    DVC Pipeline Definition

    # dvc.yaml - Defines the reproducible ML pipeline
    stages:
      prepare:
        cmd: python src/data/prepare.py
        deps:
          - src/data/prepare.py
          - data/raw/customers.csv
        params:
          - prepare.testsize
          - prepare.randomseed
        outs:
          - data/processed/train.csv
          - data/processed/test.csv

      featurize:
        cmd: python src/features/buildfeatures.py
        deps:
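The `prepare` stage runs `src/data/prepare.py`, depends on the `prepare.testsize` and `prepare.randomseed` parameters, and must produce the two declared `outs`. Below is a hypothetical, dependency-free sketch of such a script using only the `csv` and `random` modules. The tutorial's own version may well use pandas and scikit-learn instead. Seeding the shuffle is what makes the stage reproducible, because the same raw data and parameters always yield the same split.

```python
import csv
import random
from pathlib import Path


def split_rows(rows, test_size, seed):
    """Shuffle rows deterministically and split them into (train, test) lists."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order on every run
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def prepare(raw_path, out_dir, test_size, seed):
    """Read the raw CSV and write train.csv and test.csv into out_dir."""
    with open(raw_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, test = split_rows(rows, test_size, seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, part in (("train.csv", train), ("test.csv", test)):
        with open(out / name, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
    return len(train), len(test)
```

In the stage itself, you would load the two values from `params.yaml` (for example with a YAML parser). You would then call `prepare("data/raw/customers.csv", "data/processed", test_size, seed)`. Changing either parameter makes `dvc repro` rerun the stage.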
