Tutorial 20: MLOps End-to-End Project

Table of Contents

Introduction
Prerequisites
Project Overview
Data Versioning with DVC
Experiment Tracking with MLflow
Building the Training Pipeline
Model Registry
CI/CD with GitHub Actions
Containerization with Docker
Deployment
Monitoring with Evidently
Alerting
Best Practices
Conclusion

Introduction

MLOps is the discipline of deploying and maintaining machine learning models in production reliably and efficiently. While building an accurate model is important, the real challenge lies in everything around it: versioning data and code together, tracking experiments reproducibly, automating training and deployment pipelines, monitoring model performance in production, and responding to data drift.

This tutorial walks through a complete MLOps project from raw data to production monitoring. We will build a customer churn prediction system using industry-standard tools: DVC for data versioning, MLflow for experiment tracking and model registry, GitHub Actions for CI/CD, Docker for containerization, and Evidently for production monitoring.

Prerequisites

Python 3.9+
Git and GitHub account
Docker and Docker Compose
AWS CLI or equivalent cloud CLI (for deployment)
Basic understanding of ML model training

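# Install all required packages
# (the quotes keep shells such as zsh from expanding the square brackets)
pip install "dvc[s3]" mlflow scikit-learn pandas evidently
pip install fastapi uvicorn docker boto3

# Python: confirm the interpreter meets the 3.9+ prerequisite
import sys
assert sys.version_info >= (3, 9), "Python 3.9+ is required"
print("MLOps E2E Tutorial - Environment Setup")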
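
Before moving on, it can help to confirm that the packages installed above are actually importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative and not part of the tutorial's code. Note that import names can differ from pip names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages installed above (assumed list, adjust as needed).
REQUIRED_MODULES = [
    "dvc", "mlflow", "sklearn", "pandas", "evidently",
    "fastapi", "uvicorn", "docker", "boto3",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Anything the script reports as missing can be installed again with the pip commands above.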

Project Overview

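The project keeps each concern in its own directory: data preparation, feature engineering, model code, and monitoring live under src/, the API and its Dockerfile under serving/, and tests under tests/. CI workflows, the DVC pipeline definition, and parameter files sit at the repository root.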

churn-prediction/
├── .github/
│   └── workflows/
│       ├── train.yml
│       ├── test.yml
│       └── deploy.yml
├── data/
│   ├── raw/
│   │   └── customers.csv.dvc
│   └── processed/
│       └── features.csv.dvc
├── src/
│   ├── data/
│   │   ├── __init__.py
│   │   ├── prepare.py
│   │   └── validate.py
│   ├── features/
│   │   ├── __init__.py
│   │   └── buildfeatures.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py
│   │   └── predict.py
│   └── monitoring/
│       ├── __init__.py
│       └── driftdetection.py
├── serving/
│   ├── app.py
│   ├── Dockerfile
│   └── requirements.txt
├── tests/
│   ├── testdata.py
│   ├── testmodel.py
│   └── testapi.py
├── configs/
│   └── config.yaml
├── dvc.yaml
├── dvc.lock
├── params.yaml
├── docker-compose.yml
└── requirements.txt
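
If you are starting from an empty repository, the directory skeleton above can be created with a short script. This is only a sketch: the `scaffold` function and the two directory lists are hypothetical helpers, and the script creates directories and Python package markers (`__init__.py`) only, not the source files, workflows, or config files themselves.

```python
import tempfile
from pathlib import Path

# Directories under src/ are Python packages, so each gets an __init__.py marker.
PACKAGE_DIRS = ["src/data", "src/features", "src/models", "src/monitoring"]
# The remaining directories hold data, workflows, serving code, tests, and configs.
PLAIN_DIRS = [".github/workflows", "data/raw", "data/processed",
              "serving", "tests", "configs"]


def scaffold(root: Path) -> None:
    """Create the directory skeleton under root; safe to run more than once."""
    for rel in PACKAGE_DIRS:
        package = root / rel
        package.mkdir(parents=True, exist_ok=True)
        (package / "__init__.py").touch()
    for rel in PLAIN_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; pass Path("churn-prediction") for real use.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "churn-prediction"
        scaffold(project)
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```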

Data Versioning with DVC

DVC (Data Version Control) tracks large datasets and model files alongside your Git repository without storing them in Git itself.

Setting Up DVC

# Initialize DVC in your Git repository
cd churn-prediction
dvc init

# Configure remote storage (S3 in this example)
dvc remote add -d myremote s3://my-ml-bucket/dvc-store
dvc remote modify myremote region us-east-1

# Track data files
dvc add data/raw/customers.csv
git add data/raw/customers.csv.dvc data/raw/.gitignore
git commit -m "Track raw customer data with DVC"

# Push data to remote storage
dvc push
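
To see why only a small customers.csv.dvc file ends up in Git, it helps to look at what `dvc add` records: a content hash of the data file plus a little metadata, while the data itself goes to the DVC cache and remote. The sketch below mimics that idea with the standard library. It is an illustration of content-addressed pointers, not DVC's actual implementation, and the `pointer_for` helper and its output fields are simplified assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5(usedforsecurity=False)  # checksum use only (Python 3.9+)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_for(path: Path) -> dict:
    """Build a small dict resembling the metadata a .dvc pointer file carries."""
    return {"md5": md5_of_file(path), "size": path.stat().st_size, "path": path.name}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "customers.csv"
        sample.write_text("customer_id,churned\n1,0\n2,1\n")
        print(pointer_for(sample))
```

Because the pointer changes whenever the file's contents change, committing the pointer alongside your code ties each Git commit to one exact version of the data.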

DVC Pipeline Definition

# dvc.yaml - Defines the reproducible ML pipeline
stages:
  prepare:
    cmd: python src/data/prepare.py
    deps:
      - src/data/prepare.py
      - data/raw/customers.csv
    params:
      - prepare.testsize
      - prepare.randomseed
    outs:
      - data/processed/train.csv
      - data/processed/test.csv

  featurize:
    cmd: python src/features/buildfeatures.py
    deps:
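
The prepare stage declares two parameters (a test-set fraction and a random seed) and two outputs (train.csv and test.csv). DVC reruns a stage only when one of its dependencies or listed parameters changes, so the script behind it should be deterministic for a given seed. The tutorial's actual src/data/prepare.py is not shown in this section. What follows is a standard-library stand-in that illustrates the contract, with the parameters passed as function arguments instead of being read from params.yaml. A real implementation would more likely use pandas and scikit-learn, and the function names here are hypothetical.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_rows(rows, test_size, seed):
    """Shuffle rows deterministically with the seed and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def prepare(raw_path: Path, out_dir: Path, test_size: float, seed: int):
    """Read the raw CSV, split it, and write train.csv and test.csv to out_dir."""
    with raw_path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)
    train, test = split_rows(rows, test_size, seed)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in (("train.csv", train), ("test.csv", test)):
        with (out_dir / name).open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)  # every output keeps the original header
            writer.writerows(part)
    return len(train), len(test)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "customers.csv"
        raw.write_text("id,churned\n" + "".join(f"{i},{i % 2}\n" for i in range(10)))
        print(prepare(raw, Path(tmp) / "processed", test_size=0.2, seed=42))
```

Running it twice with the same seed produces identical output files, which is what lets DVC treat an unchanged stage as already up to date.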
