Complete MLflow Tutorial: From Setup to Production


By Ruby Abdullah · tutorial
Tags: MLflow, MLOps, Machine Learning, Model Versioning, Python

Introduction

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Developed by Databricks, it helps data scientists and ML engineers track experiments, package code, manage models, and deploy them to production.

Why MLflow?
  • Reproducibility: record the parameters, metrics, and artifacts of every experiment
  • Collaboration: share results with your team
  • Model versioning: manage multiple versions of a model
  • Deployment ready: deploy models easily to a variety of platforms
  • Framework agnostic: works with TensorFlow, PyTorch, scikit-learn, and others

Main Components of MLflow

MLflow consists of four main components:

  • MLflow Tracking: record and query experiments
  • MLflow Projects: package ML code for reproducibility
  • MLflow Models: package models in a standard format for deployment to a variety of platforms
  • MLflow Model Registry: a centralized model store for versioning
Installation and Setup

Basic Installation

    # Install MLflow
    pip install mlflow

    # Install with extras for additional backends
    pip install "mlflow[extras]"

    # Verify the installation
    mlflow --version
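The same check can be run from Python, which is convenient inside a notebook or a CI step. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is illustrative and not part of MLflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("mlflow")
    print(f"mlflow: {found or 'not installed'}")
```

Returning `None` instead of raising lets a setup script decide for itself whether a missing package is an error.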

Set Up a Database Backend (PostgreSQL)

For production, use a database backend:

    # Install dependencies
    pip install psycopg2-binary

    # Set up PostgreSQL (example using Docker)
    docker run -d \
      --name mlflow-db \
      -e POSTGRES_USER=mlflow \
      -e POSTGRES_PASSWORD=mlflow \
      -e POSTGRES_DB=mlflow \
      -p 5432:5432 \
      postgres:13
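The user, password, and database name given to the container are the same values the tracking server later needs in its PostgreSQL connection URI. The sketch below assembles that URI from its parts; `build_backend_store_uri` is a hypothetical helper, and percent-encoding the credentials is an added precaution for passwords that contain characters such as `@` or `/`.

```python
from urllib.parse import quote


def build_backend_store_uri(user, password, host, port, database):
    """Assemble a postgresql:// URI with percent-encoded credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Values taken from the Docker example in this tutorial.
    print(build_backend_store_uri("mlflow", "mlflow", "localhost", 5432, "mlflow"))
```

With the tutorial's values this yields `postgresql://mlflow:mlflow@localhost:5432/mlflow`, the form expected by `--backend-store-uri`.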

Set Up an Artifact Store (MinIO/S3)

    # Install boto3 for S3 compatibility
    pip install boto3

    # Set up MinIO (S3-compatible storage)
    docker run -d \
      --name mlflow-minio \
      -p 9000:9000 \
      -p 9001:9001 \
      -e MINIO_ROOT_USER=minioadmin \
      -e MINIO_ROOT_PASSWORD=minioadmin \
      minio/minio server /data --console-address ":9001"
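A freshly started MinIO container has no buckets, so the bucket used as the artifact root (`mlflow-artifacts` in this tutorial) most likely has to be created before the first run is logged. The following is a sketch under that assumption; `minio_client_kwargs` and `ensure_bucket` are hypothetical helper names, the `us-east-1` region is a placeholder that MinIO accepts by default, and the credentials match the Docker example.

```python
def minio_client_kwargs(endpoint_url="http://localhost:9000",
                        access_key="minioadmin",
                        secret_key="minioadmin"):
    """Keyword arguments for boto3.client("s3", ...) that point at MinIO."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": "us-east-1",  # assumed placeholder region
    }


def ensure_bucket(bucket, **client_kwargs):
    """Create the bucket only if a HEAD request reports it as missing."""
    import boto3  # imported lazily so the helper above works without boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3", **client_kwargs)
    try:
        s3.head_bucket(Bucket=bucket)
    except ClientError as err:
        # Re-raise anything that is not "bucket does not exist".
        if err.response["Error"]["Code"] not in ("404", "NoSuchBucket"):
            raise
        s3.create_bucket(Bucket=bucket)
```

Calling `ensure_bucket("mlflow-artifacts", **minio_client_kwargs())` once, while the MinIO container is running, is enough; repeated calls leave an existing bucket untouched.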

Run the MLflow Server

    # Development mode (local file store)
    mlflow server --host 0.0.0.0 --port 5000

    # Production mode (with a database and S3)
    mlflow server \
      --backend-store-uri postgresql://mlflow:mlflow@localhost:5432/mlflow \
      --default-artifact-root s3://mlflow-artifacts \
      --host 0.0.0.0 \
      --port 5000
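Before pointing any training code at the server, it helps to confirm that it is actually answering. The sketch below polls the server's `/health` endpoint with the standard library only; `tracking_server_is_up` is an illustrative name, and the default URL assumes the host and port used above.

```python
import urllib.request


def tracking_server_is_up(base_url="http://localhost:5000", timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, DNS failures and HTTP
        # errors (URLError and HTTPError are OSError subclasses).
        return False
```

A deployment script could call this in a short retry loop and only start training once it returns `True`.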

Set Up Environment Variables

Create a .env file:

    # MLflow Tracking
    MLFLOW_TRACKING_URI=http://localhost:5000

    # S3/MinIO configuration
    AWS_ACCESS_KEY_ID=minioadmin
    AWS_SECRET_ACCESS_KEY=minioadmin
    MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
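A `.env` file has no effect until its values are present in the process environment. The sketch below is a deliberately small loader built on the standard library; `load_env_file` is a hypothetical helper that handles only plain `KEY=VALUE` lines and `#` comments, not the full syntax that dedicated dotenv tools support.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a file into os.environ and return them."""
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values may contain "="
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

Calling `load_env_file()` at the top of a training script, before the first MLflow call, makes the tracking URI and the MinIO credentials visible to the client.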

MLflow Tracking: Experiment Tracking

Basic Tracking

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split

    # Set the tracking URI
    mlflow.set_tracking_uri("http://localhost:5000")

    # Set the experiment
    mlflow.set_experiment("iris-classification")

    # Load data
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )

    # Start an MLflow run
    with mlflow.start_run(run_name="random-forest-v1") as run:
        # Log parameters
        params = {
            "n_estimators": 100,
            "max_depth": 5,
            "random_state": 42,
        }
        mlflow.log_params(params)

        # Train the model (the dict is unpacked into keyword arguments)
        model = RandomForestClassifier(**params)
        model.fit(X_train, y_train)

        # Make predictions
        y_pred = model.predict(X_test)

        # Log metrics
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1_score": f1_score(y_test, y_pred, average="weighted"),
        }
        mlflow.log_metrics(metrics)

        # Log the model
        mlflow.sklearn.log_model(
            model,
            "model",
        )
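A logged model can later be loaded back by run ID through a `runs:/` URI. The sketch below assumes the model was logged under the artifact path `"model"`, as in the run above; `run_model_uri` and `load_and_predict` are hypothetical helper names, and the tracking URI must already be configured before loading.

```python
def run_model_uri(run_id, artifact_path="model"):
    """Build the runs:/ URI that addresses a model logged in a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_and_predict(run_id, features):
    """Load the scikit-learn model logged by a run and predict on features."""
    import mlflow.sklearn  # lazy import: only needed when actually loading

    model = mlflow.sklearn.load_model(run_model_uri(run_id))
    return model.predict(features)
```

The run ID is available as `run.info.run_id` on the object returned by the `with` statement, so a call such as `load_and_predict(run.info.run_id, iris.data[:5])` would return predictions for the first five samples.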
