MinIO: Self-Hosted Object Storage for ML Pipelines

By Ruby Abdullah · tutorial
MinIO · Object Storage · S3 · Data Lake · Python

MinIO is an open-source object storage server that is compatible with the Amazon S3 API. In the machine learning ecosystem, it has become a popular choice for storing datasets, model artifacts, and experiment results because it performs well, is easy to deploy, and works with ML tools that can read from S3-compatible storage, such as MLflow, PyTorch, and TensorFlow.

This article covers how to use MinIO as a storage foundation for ML pipelines, from installation to building a model registry and dataset versioning system.

Why MinIO for ML?

In machine learning workflows, we frequently deal with large files: image datasets, model weights, training checkpoints, and experiment logs. Cloud object storage like Amazon S3 is the standard solution, but costs can escalate quickly as data grows, and building around a single provider creates vendor lock-in.

MinIO provides a compelling alternative:

  • S3-compatible: Tools that support the S3 API generally work with MinIO once they are pointed at its endpoint
  • Self-hosted: Data stays on your own infrastructure, which helps with data sovereignty requirements
  • High performance: Optimized for high throughput on modern hardware
  • Free and open-source: The community edition is released under the GNU AGPLv3, with no licensing fees
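In practice, S3 compatibility usually means that an existing S3 client only needs a different endpoint and credentials. The following is a minimal sketch, assuming the boto3 package is installed and MinIO listens on http://localhost:9000 with the example credentials used in this article. The helper names minio_client_kwargs and make_minio_client are hypothetical and only serve the illustration.

```python
from typing import Any, Dict


def minio_client_kwargs(
    endpoint_url: str = "http://localhost:9000",  # assumed local MinIO endpoint
    access_key: str = "minioadmin",               # example credentials from this article
    secret_key: str = "minioadmin123",
) -> Dict[str, Any]:
    """Build keyword arguments that point a boto3 S3 client at MinIO instead of AWS."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # botocore needs a region for request signing; us-east-1 is a common choice.
        "region_name": "us-east-1",
    }


def make_minio_client(**overrides: str):
    """Create the client. boto3 is imported lazily so the helper above stays dependency-free."""
    import boto3  # third-party: pip install boto3

    return boto3.client(**minio_client_kwargs(**overrides))
```

Calling make_minio_client() returns a regular boto3 S3 client, so code written against Amazon S3 should need no other change than this endpoint override.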

Installing MinIO

Installation with Docker

The easiest way to get started with MinIO is using Docker:

docker run -d \
  --name minio \
  -p 9000:9000 \
  -p 9001:9001 \
  -v minio-data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin123" \
  quay.io/minio/minio server /data --console-address ":9001"

For a more structured setup, use Docker Compose:

version: '3.8'

services:
  minio:
    image: quay.io/minio/minio
    container_name: minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin123
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 30s
      timeout: 20s
      retries: 3

volumes:
  minio-data:

Run it with:

docker-compose up -d
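Before pointing an ML pipeline at the new server, it can help to confirm that MinIO is actually accepting requests. The sketch below polls MinIO's liveness endpoint (/minio/health/live) using only the Python standard library. It assumes the API port 9000 from the examples above; the function names is_minio_live and wait_for_minio are hypothetical.

```python
import time
import urllib.request


def is_minio_live(base_url: str = "http://localhost:9000", timeout: float = 3.0) -> bool:
    """Return True if MinIO's liveness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def wait_for_minio(
    base_url: str = "http://localhost:9000",
    attempts: int = 10,
    delay: float = 2.0,
) -> bool:
    """Poll the liveness endpoint until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_minio_live(base_url):
            return True
        time.sleep(delay)
    return False
```

A training script could call wait_for_minio() once at startup and fail early with a clear message if it returns False.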

Binary Installation

For direct installation on a Linux server:

wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
sudo mv minio /usr/local/bin/

# Create data directory and let the current user write to it
sudo mkdir -p /data/minio
sudo chown "$USER" /data/minio

# Run MinIO
MINIO_ROOT_USER=minioadmin MINIO_ROOT_PASSWORD=minioadmin123 \
  minio server /data/minio --console-address ":9001"

To run MinIO as a systemd service:

[Unit]
Description=MinIO Object Storage
After=network.target

[Service]
Type=simple
User=minio-user
Group=minio-user
Environment="MINIO_ROOT_USER=minioadmin"
Environment="MINIO_ROOT_PASSWORD=minioadmin123"
ExecStart=/usr/local/bin/minio server /data/minio --console-address ":9001"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

The unit assumes that the minio-user account exists and owns /data/minio. Save it as minio.service in a systemd unit directory such as /etc/systemd/system/, then reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable minio
sudo systemctl start minio
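The credentials in these examples are placeholders. One way to avoid hardcoding them in the unit file is to generate a random password and keep it in a separate environment file that the unit loads through systemd's EnvironmentFile= directive. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the /etc/default/minio path mentioned in the comment is an assumption rather than part of the setup above.

```python
import secrets


def generate_root_credentials(user: str = "minioadmin") -> dict:
    """Return MinIO root credentials with a randomly generated password."""
    # 24 random bytes encode to a 32-character URL-safe string.
    password = secrets.token_urlsafe(24)
    return {"MINIO_ROOT_USER": user, "MINIO_ROOT_PASSWORD": password}


def render_env_file(credentials: dict) -> str:
    """Format credentials as KEY=value lines, the layout systemd expects in an environment file."""
    return "".join(f"{key}={value}\n" for key, value in credentials.items())


# Example: write the result to a root-only file (for instance /etc/default/minio,
# an assumed location) and reference it with EnvironmentFile= in the [Service] section.
```

With this approach the two Environment= lines in the unit would be replaced by a single EnvironmentFile= line, and the file itself can be restricted to the service account.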

MinIO Console UI

Once MinIO is running, access the Console UI at http://localhost:9001. The Console provides a visual interface for:

  • Creating and managing buckets
  • Uploading and downloading objects
  • Setting access policies
  • Monitoring performance and storage usage
  • Configuring event notifications
  • Managing users and groups

Log in with the credentials you configured (minioadmin / minioadmin123 in the examples above).

The Console UI is particularly useful for data science team members who are less comfortable with the command line: they can browse datasets, preview files, and manage buckets without additional tools.
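Bucket creation can also be scripted, which is convenient when the same layout has to be reproduced across environments. The sketch below works with any boto3-style S3 client (one that offers list_buckets and create_bucket). The bucket names are hypothetical examples of an ML layout, and ensure_buckets is an illustrative helper, not a MinIO API.

```python
from typing import Any, List, Sequence

# Hypothetical bucket names for a typical ML layout; adjust to your project.
DEFAULT_BUCKETS = ("datasets", "models", "experiments")


def ensure_buckets(s3_client: Any, names: Sequence[str] = DEFAULT_BUCKETS) -> List[str]:
    """Create every bucket in `names` that does not exist yet and return the ones created."""
    response = s3_client.list_buckets()
    existing = {bucket["Name"] for bucket in response.get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            s3_client.create_bucket(Bucket=name)
            created.append(name)
    return created


# Usage (requires boto3 and a running MinIO server):
#   import boto3
#   s3 = boto3.client(
#       "s3",
#       endpoint_url="http://localhost:9000",
#       aws_access_key_id="minioadmin",
#       aws_secret_access_key="minioadmin123",
#   )
#   ensure_buckets(s3)
```

Because the helper only creates what is missing, it can be run repeatedly, for example at the start of each pipeline run.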

MinIO Client (mc)

MinIO provides the mc command-line tool for administration:

# Install mc
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/

# Configure alias
mc alias set local http://localhost:9000 minioadmin minioadmin123
