Complete MLflow Tutorial: From Setup to Production


By Ruby Abdullah · · tutorial
MLflow · MLOps · Machine Learning · Model Versioning · Python

Introduction

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Developed by Databricks, MLflow helps data scientists and ML engineers track experiments, package code, manage models, and deploy to production.

Why MLflow?
  • Reproducibility: Record the parameters, metrics, and artifacts of every experiment run
  • Collaboration: Share results with your team through a central tracking server
  • Model versioning: Manage multiple versions of a model in one place
  • Deployment ready: Deploy models to a variety of serving platforms
  • Framework agnostic: Works with TensorFlow, PyTorch, scikit-learn, and other libraries

MLflow Main Components

MLflow consists of four main components:

  • MLflow Tracking: Log and query experiments (parameters, metrics, and artifacts)
  • MLflow Projects: Package ML code in a reusable, reproducible format
  • MLflow Models: Package models in a standard format that can be deployed to various platforms
  • MLflow Model Registry: Centralized model store for versioning and lifecycle management

Installation and Setup

Basic Installation

```bash
# Install MLflow
pip install mlflow

# Install with extras for various backends (quoted so shells like zsh don't expand the brackets)
pip install "mlflow[extras]"

# Verify installation
mlflow --version
```
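You can also run the installation check from Python. The check below also covers the optional backend drivers used later in this guide: `psycopg2-binary`, which is imported as `psycopg2`, and `boto3`. It is a minimal sketch that uses only the standard library. `check_packages` and `installed_version` are hypothetical helper names, not part of MLflow.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def check_packages(import_names):
    """Return {import_name: bool}: True if the top-level module can be found.

    Hypothetical helper for this tutorial, not an MLflow API.
    """
    return {name: importlib.util.find_spec(name) is not None for name in import_names}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # psycopg2-binary installs the importable module "psycopg2"
    for name, ok in check_packages(["mlflow", "psycopg2", "boto3"]).items():
        print(f"{name:10s} {'OK' if ok else 'missing'}")
    print("mlflow version:", installed_version("mlflow"))
```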

Set Up a Database Backend (PostgreSQL)

For production, use a database backend instead of local files:

```bash
# Install the PostgreSQL driver for Python
pip install psycopg2-binary

# Start PostgreSQL (example using Docker)
docker run -d \
  --name mlflow-db \
  -e POSTGRES_USER=mlflow \
  -e POSTGRES_PASSWORD=mlflow \
  -e POSTGRES_DB=mlflow \
  -p 5432:5432 \
  postgres:13
```
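The server's `--backend-store-uri` must match the credentials passed to this container. The sketch below builds that URI from its parts and percent-encodes the credentials, so characters such as `@` or `/` in a real password cannot break parsing. It also adds a quick TCP check that the container is listening on the published port. `build_postgres_uri` and `is_port_open` are hypothetical helpers, and their defaults simply mirror the Docker example above.

```python
import socket
from urllib.parse import quote


def build_postgres_uri(user="mlflow", password="mlflow", host="localhost",
                       port=5432, database="mlflow"):
    """Build a postgresql:// URI for `mlflow server --backend-store-uri`.

    Hypothetical helper; the defaults mirror the docker run example.
    """
    # Percent-encode credentials so characters such as @ : / cannot break the URI
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


def is_port_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(build_postgres_uri())  # postgresql://mlflow:mlflow@localhost:5432/mlflow
```

An open port only shows that the container accepts connections. It does not prove that the credentials are valid.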

Set Up an Artifact Store (MinIO/S3)

```bash
# Install boto3 for S3 compatibility
pip install boto3

# Start MinIO (S3-compatible storage); the web console is served on port 9001
docker run -d \
  --name mlflow-minio \
  -p 9000:9000 \
  -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"
```
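The production server command in the next step uses `s3://mlflow-artifacts` as its artifact root. MinIO does not create that bucket by itself, so it must exist first. The sketch below creates it with boto3 by pointing the S3 client at the MinIO endpoint, and assumes the port and `minioadmin` credentials from the `docker run` command above. `make_minio_client` and `ensure_bucket` are hypothetical helper names. You can also create the bucket by hand in the MinIO console on port 9001.

```python
def make_minio_client(endpoint_url="http://localhost:9000",
                      access_key="minioadmin", secret_key="minioadmin"):
    """Create a boto3 S3 client that talks to MinIO instead of AWS (hypothetical helper)."""
    import boto3  # imported here so ensure_bucket() can be reused with any S3-like client

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def ensure_bucket(client, bucket="mlflow-artifacts"):
    """Create `bucket` if it is missing. Returns True if it was created, False if it existed."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if bucket in existing:
        return False
    client.create_bucket(Bucket=bucket)
    return True
```

With MinIO running, call `ensure_bucket(make_minio_client())` once before starting the server. The function is idempotent, so running it again does nothing.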

Run the MLflow Server

```bash
# Development mode (local storage in the current directory)
mlflow server --host 0.0.0.0 --port 5000

# Production mode (PostgreSQL backend store, S3/MinIO artifact store)
mlflow server \
  --backend-store-uri postgresql://mlflow:mlflow@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000
```
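Scripts that start right after the server can fail if it is still booting, so it helps to wait until it answers. The MLflow tracking server exposes a `/health` endpoint that returns `OK`. The sketch below polls that endpoint using only the standard library, and the default URL assumes the `--host`/`--port` values above. `is_healthy` and `wait_for_server` are hypothetical helper names.

```python
import http.client
import time
import urllib.request


def is_healthy(base_url="http://localhost:5000", timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are OSError subclasses
        return False


def wait_for_server(base_url="http://localhost:5000", retries=10, delay=1.0):
    """Poll the health endpoint until it succeeds or `retries` attempts are used up."""
    for attempt in range(retries):
        if is_healthy(base_url):
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

For example, `wait_for_server()` returns `True` once the server started above responds. It returns `False` after about ten seconds if the server never comes up.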

Set Up Environment Variables

Create a .env file:

```bash
# MLflow Tracking
MLFLOW_TRACKING_URI=http://localhost:5000

# S3/MinIO Configuration
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
```
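MLflow reads `MLFLOW_TRACKING_URI` and `MLFLOW_S3_ENDPOINT_URL` from the process environment, and boto3 reads the `AWS_*` credentials from it. A .env file therefore only needs to be loaded into `os.environ` before the first MLflow or boto3 call. Packages such as python-dotenv can do this. The sketch below uses only the standard library and handles the simple `KEY=value` lines and `#` comments shown above. It is not a full .env parser. `parse_env` and `load_env_file` are hypothetical helper names.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines; skip blanks and '#' comments, strip matching quotes."""
    parsed = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching single or double quotes around the value
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
    return parsed


def load_env_file(path=".env", override=False):
    """Copy variables from `path` into os.environ; existing ones win unless override=True."""
    parsed = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in parsed.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Call `load_env_file()` at the top of a training script. An explicit `mlflow.set_tracking_uri(...)` call in code still takes precedence over `MLFLOW_TRACKING_URI`.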

MLflow Tracking: Experiment Tracking

Basic Tracking

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")

# Set experiment (created automatically if it does not exist)
mlflow.set_experiment("iris-classification")

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Start MLflow run
with mlflow.start_run(run_name="random-forest-v1") as run:
    # Log parameters
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "random_state": 42,
    }
    mlflow.log_params(params)

    # Train model (unpack the dict into keyword arguments)
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Log metrics
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred, average="weighted"),
    }
    mlflow.log_metrics(metrics)

    # Log model and register it as a new version of "iris-classifier"
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="iris-classifier",
    )
```
