MLflow vs Neptune.ai: Complete Guide to Experiment Tracking for MLOps
Experiment tracking is a core MLOps practice: it lets data science teams record, compare, and reproduce machine learning experiments. In this tutorial, we'll compare two popular platforms, MLflow (open source) and Neptune.ai (managed service), and show how to use each.
Why is Experiment Tracking Important?
Without proper experiment tracking, ML teams often face:
- Reproducibility crisis: Unable to reproduce previous experiment results
- Lost experiments: No record of the configuration that produced the best model
- Collaboration issues: Difficult to share results across teams
- Technical debt: Spreadsheets and manual notes that don't scale
Overview: MLflow vs Neptune.ai
| Aspect | MLflow | Neptune.ai |
|--------|--------|------------|
| Type | Open-source | Managed SaaS |
| Hosting | Self-hosted, or managed (e.g., Databricks) | Cloud-hosted |
| Pricing | Free (you pay for infrastructure) | Free tier + paid plans |
| Setup | Manual setup | Instant |
| UI | Basic | Advanced |
| Collaboration | Limited | Built-in |
| Integrations | 15+ frameworks | 25+ frameworks |
| Model Registry | Yes | Yes |
| Best For | Full control, on-prem | Quick start, teams |
Part 1: MLflow
1.1 Installing MLflow
```bash
# Install MLflow
pip install mlflow

# Optional: extra dependencies such as cloud storage clients
# (not required for the SQLite setup below)
pip install "mlflow[extras]"

# Start the tracking UI locally
mlflow ui --port 5000

# Or run a full tracking server with a SQLite backend store
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 0.0.0.0 \
  --port 5000
```
1.2 Basic Experiment Tracking
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Set tracking URI (optional; points at the server started in 1.1.
# If unset, MLflow logs to a local store in the working directory)
mlflow.set_tracking_uri("http://localhost:5000")

# Set experiment name (created if it does not exist)
mlflow.set_experiment("iris-classification")

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Start run
with mlflow.start_run(run_name="random-forest-v1"):
    # Log parameters
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "random_state": 42,
    }
    mlflow.log_params(params)

    # Train model (unpack the dict as keyword arguments)
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average="weighted")

    # Log metrics
    mlflow.log_metrics({
        "accuracy": accuracy,
        "f1_score": f1,
    })

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts (additional files)
    with open("feature_importance.txt", "w") as f:
        for name, importance in zip(load_iris().feature_names, model.feature_importances_):
            f.write(f"{name}: {importance:.4f}\n")
    mlflow.log_artifact("feature_importance.txt")

    # active_run() is only set inside the `with` block
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {accuracy:.4f}")
```
1.3 Hyperparameter Tuning with MLflow
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
import itertools

mlflow.set_experiment("iris-hyperparameter-tuning")

X, y = load_iris(return_X_y=True)

# Hyperparameter grid
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}
```