MinIO: Self-Hosted Object Storage for ML Pipelines
MinIO is an open-source object storage server that implements the Amazon S3 API. In the machine learning ecosystem, it has become a popular choice for storing datasets, model artifacts, and experiment results because it is fast, easy to deploy, and works with S3-aware ML tools such as MLflow, PyTorch, and TensorFlow.
This article covers how to use MinIO as a storage foundation for ML pipelines, from installation to building a model registry and dataset versioning system.
Why MinIO for ML?
In machine learning workflows, we frequently deal with large files: image datasets, model weights, training checkpoints, and experiment logs. Cloud object storage like Amazon S3 is the standard solution, but costs can escalate quickly as data volumes grow, and relying on a single provider creates vendor lock-in.
MinIO provides a compelling alternative:
- S3-compatible: Tools that speak the S3 API generally work with MinIO once they are pointed at its endpoint
- Self-hosted: Data stays on your own infrastructure, suitable for data sovereignty regulations
- High performance: Optimized for high throughput on modern hardware
- Free and open-source: The community edition is licensed under AGPLv3 and has no licensing costs
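In practice, "S3-compatible" mostly means overriding the endpoint that an S3 client talks to. The following is a minimal sketch, assuming the endpoint and credentials used in the installation examples in this article (http://localhost:9000, minioadmin / minioadmin123). The helper name `minio_env` is hypothetical. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS SDK variables and `MLFLOW_S3_ENDPOINT_URL` is read by MLflow; `AWS_ENDPOINT_URL` is only honored by newer AWS SDK releases, so check the versions you run.

```python
import os


def minio_env(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Return environment variables that point S3 clients at a MinIO server.

    Hypothetical helper for illustration; it only builds a dictionary and
    does not contact the server.
    """
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        # Assumption: only newer AWS SDK versions read this variable.
        "AWS_ENDPOINT_URL": endpoint,
        # Read by MLflow when it stores artifacts in S3-compatible storage.
        "MLFLOW_S3_ENDPOINT_URL": endpoint,
    }


if __name__ == "__main__":
    # Values taken from the examples in this article; replace them in real use.
    env = minio_env("http://localhost:9000", "minioadmin", "minioadmin123")
    os.environ.update(env)
    for name in sorted(env):
        print(f"{name} is set")
```

Setting the variables in the process environment keeps the endpoint out of the training code itself, so the same script can run against MinIO or AWS.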
Installing MinIO
Installation with Docker
The easiest way to get started with MinIO is using Docker:
docker run -d \
  --name minio \
  -p 9000:9000 \
  -p 9001:9001 \
  -v minio-data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin123" \
  quay.io/minio/minio server /data --console-address ":9001"
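The credentials in the command above are placeholders and should not be used outside a local test. Below is a minimal sketch for generating a random root password and sanity-checking a credential pair. It assumes MinIO's length minimums (at least 3 characters for the root user, at least 8 for the root password); the function names are hypothetical.

```python
import secrets
import string

MIN_USER_LEN = 3      # assumed minimum length for MINIO_ROOT_USER
MIN_PASSWORD_LEN = 8  # assumed minimum length for MINIO_ROOT_PASSWORD


def generate_password(length: int = 32) -> str:
    """Return a random alphanumeric password of the requested length."""
    if length < MIN_PASSWORD_LEN:
        raise ValueError(f"password must be at least {MIN_PASSWORD_LEN} characters")
    # Letters and digits only, so the value needs no shell or URL escaping.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def check_credentials(user: str, password: str) -> list:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    if len(user) < MIN_USER_LEN:
        problems.append(f"user shorter than {MIN_USER_LEN} characters")
    if len(password) < MIN_PASSWORD_LEN:
        problems.append(f"password shorter than {MIN_PASSWORD_LEN} characters")
    return problems


if __name__ == "__main__":
    password = generate_password()
    assert not check_credentials("minioadmin", password)
    # Paste the generated value into the -e flag of the docker run command.
    print(f'-e "MINIO_ROOT_PASSWORD={password}"')
```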
For a more structured setup, use Docker Compose:
version: '3.8'
services:
  minio:
    image: quay.io/minio/minio
    container_name: minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin123
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 30s
      timeout: 20s
      retries: 3
volumes:
  minio-data:
Run it with:
docker-compose up -d
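The healthcheck in the Compose file runs inside the container. From the host, a similar readiness check can be sketched against MinIO's liveness endpoint, `/minio/health/live`, for example before a pipeline step starts uploading data. The function names, retry counts, and timeouts below are illustrative assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def health_url(endpoint: str) -> str:
    """Build the liveness URL for a MinIO endpoint such as http://localhost:9000."""
    return endpoint.rstrip("/") + "/minio/health/live"


def probe(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not live".
        return False


def wait_until_live(endpoint: str, attempts: int = 10, delay: float = 1.0,
                    check=probe) -> bool:
    """Call check(endpoint) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check(endpoint):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Port 9000 is the API port published in the Compose file.
    if wait_until_live("http://localhost:9000"):
        print("MinIO is live")
    else:
        print("MinIO did not become live in time")
```

The `check` parameter exists so the retry loop can be exercised without a running server.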
Binary Installation
For direct installation on a Linux server:
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
sudo mv minio /usr/local/bin/
# Create data directory
sudo mkdir -p /data/minio
# Run MinIO
MINIO_ROOT_USER=minioadmin MINIO_ROOT_PASSWORD=minioadmin123 \
  minio server /data/minio --console-address ":9001"
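To run MinIO as a systemd service, create a unit file (for example /etc/systemd/system/minio.service). The unit below assumes that a minio-user account exists and owns /data/minio: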
[Unit]
Description=MinIO Object Storage
After=network.target
[Service]
Type=simple
User=minio-user
Group=minio-user
Environment="MINIO_ROOT_USER=minioadmin"
Environment="MINIO_ROOT_PASSWORD=minioadmin123"
ExecStart=/usr/local/bin/minio server /data/minio --console-address ":9001"
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Then reload systemd, enable the service, and start it:
sudo systemctl daemon-reload
sudo systemctl enable minio
sudo systemctl start minio
MinIO Console UI
Once MinIO is running, access the Console UI at http://localhost:9001. The Console provides a visual interface for:
- Creating and managing buckets
- Uploading and downloading objects
- Setting access policies
- Monitoring performance and storage usage
- Configuring event notifications
- Managing users and groups
Log in with the root credentials you configured (minioadmin / minioadmin123 in the examples above).
The Console UI is particularly useful for data science teams who are not comfortable with the command line. They can directly browse datasets, preview files, and manage buckets without additional tools.
MinIO Client (mc)
MinIO provides the mc command-line tool for administration:
# Install mc
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
# Configure alias
mc alias set local http://localhost:9000 minioadmin minioadmin123
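As an alternative to a stored alias, `mc` can also read an alias from an environment variable named `MC_HOST_<alias>`, which can be convenient in CI jobs and containers. The sketch below builds that value from the endpoint and credentials used in this article. `mc_host_value` is a hypothetical helper, and it deliberately rejects credentials containing `:` or `@` rather than guessing how `mc` would parse them.

```python
from urllib.parse import urlsplit


def mc_host_value(endpoint: str, access_key: str, secret_key: str) -> str:
    """Build the value for an MC_HOST_<alias> environment variable.

    Hypothetical helper: the result has the form
    scheme://ACCESS_KEY:SECRET_KEY@host:port
    """
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) endpoint: {endpoint!r}")
    for value in (access_key, secret_key):
        # Assumption: special-character handling is out of scope for this sketch.
        if ":" in value or "@" in value:
            raise ValueError("credentials containing ':' or '@' are not supported here")
    return f"{parts.scheme}://{access_key}:{secret_key}@{parts.netloc}"


if __name__ == "__main__":
    value = mc_host_value("http://localhost:9000", "minioadmin", "minioadmin123")
    # Export this in the shell that runs mc, then use "local" as the alias name.
    print(f"export MC_HOST_local={value}")
```

Because the value embeds the secret key, treat it like any other credential and keep it out of logs and version control.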