Complete Evidently AI Tutorial: ML Model Monitoring and Data Quality
Evidently is an open-source Python library for evaluating, testing, and monitoring machine learning models in production. It helps detect data drift, model degradation, and data quality issues before they impact your business.
Why Evidently?
Evidently Advantages:
- Data drift detection: Monitor input data changes
- Model performance tracking: Track metrics over time
- Visual reports: Interactive HTML dashboards
- Test suites: Automated quality checks
- Easy integration: Works with any ML framework
Common use cases:
- Production model monitoring
- Data quality validation
- A/B testing analysis
- Pre-deployment validation
- Debugging model issues
Installation
# Basic installation
pip install evidently
# With visualization support
pip install evidently[notebooks]
# Verify installation
python -c "import evidently; print(evidently.__version__)"
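The same check can be done from Python without importing the package, which is handy in setup scripts. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Evidently.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("evidently")
    if found:
        print(f"evidently version: {found}")
    else:
        print("evidently is not installed")
```

Returning `None` instead of raising lets a calling script decide whether a missing package is an error.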
Quick Start
1. Basic Data Drift Report
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
# Load reference (training) and current (production) data
reference_data = pd.read_csv("training_data.csv")
current_data = pd.read_csv("production_data.csv")
# Create report
report = Report(metrics=[DataDriftPreset()])
# Run analysis
report.run(
    reference_data=reference_data,
    current_data=current_data,
)
# Save as HTML
report.save_html("drift_report.html")
# Get results as dict
results = report.as_dict()
print(f"Dataset drift detected: {results['metrics'][0]['result']['dataset_drift']}")
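Indexing `results['metrics'][0]` relies on the dataset-level drift metric being the first entry. A slightly more defensive lookup might look like the sketch below. `find_dataset_drift` is a hypothetical helper, and the sample dictionary is hand-written to mirror the layout used above; it is not real Evidently output.

```python
from typing import Any, Dict, Optional


def find_dataset_drift(results: Dict[str, Any]) -> Optional[bool]:
    """Return the first `dataset_drift` flag found in a report dict, else None."""
    for entry in results.get("metrics", []):
        result = entry.get("result", {})
        if isinstance(result, dict) and "dataset_drift" in result:
            return bool(result["dataset_drift"])
    return None


# Hand-written stand-in for report.as_dict(); the values are illustrative only.
sample_results = {
    "metrics": [
        {"metric": "DatasetDriftMetric", "result": {"dataset_drift": True}},
        {"metric": "DataDriftTable", "result": {"number_of_columns": 3}},
    ]
}

print(find_dataset_drift(sample_results))
```

A `None` return signals that no drift metric was present, which is easier to handle in a pipeline than a `KeyError`.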
2. Model Performance Report
import pandas as pd
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset
# Data with predictions and labels
data = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature2": [0.5, 1.5, 2.5, 3.5, 4.5],
    "prediction": [0, 1, 1, 0, 1],
    "target": [0, 1, 0, 0, 1],
})
# Classification report
report = Report(metrics=[ClassificationPreset()])
# column_mapping expects a ColumnMapping object rather than a plain dict
report.run(
    reference_data=None,  # No reference dataset for this report
    current_data=data,
    column_mapping=ColumnMapping(target="target", prediction="prediction"),
)
report.save_html("classification_report.html")
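To sanity-check the numbers a classification report shows, the core metrics can be recomputed by hand for the five rows above. This is a plain-Python sketch that does not use Evidently; `binary_classification_metrics` is a hypothetical helper name.

```python
from typing import Dict, Sequence


def binary_classification_metrics(
    target: Sequence[int], prediction: Sequence[int], pos_label: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels."""
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    pairs = list(zip(target, prediction))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Same labels and predictions as the example DataFrame.
metrics = binary_classification_metrics(
    target=[0, 1, 0, 0, 1], prediction=[0, 1, 1, 0, 1]
)
print(metrics)
```

With one false positive and no false negatives, this gives an accuracy of 0.8, a precision of 2/3, and a recall of 1.0.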
Metric Presets
1. Data Drift Preset
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
report = Report(metrics=[
    DataDriftPreset(
        columns=["feature1", "feature2", "feature3"],  # Specific columns
        drift_share=0.5,  # Share of drifted columns that flags dataset drift
    )
])
# ref_df and curr_df are pandas DataFrames with the same columns
report.run(reference_data=ref_df, current_data=curr_df)
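The `drift_share` threshold turns per-column drift flags into a single dataset-level decision. The sketch below illustrates that idea in plain Python. It is not Evidently's implementation: the helper name is hypothetical, and the inclusive comparison (`>=`) is an assumption.

```python
from typing import Dict


def dataset_drift_from_columns(
    column_drift: Dict[str, bool], drift_share: float = 0.5
) -> bool:
    """Flag dataset drift when the share of drifted columns reaches the threshold."""
    if not column_drift:
        return False
    drifted_share = sum(column_drift.values()) / len(column_drift)
    # Assumption: the comparison is inclusive (share >= threshold).
    return drifted_share >= drift_share


# One of three columns drifted: 1/3 is below the 0.5 threshold.
one_drifted = {"feature1": True, "feature2": False, "feature3": False}
# Two of three columns drifted: 2/3 is above the 0.5 threshold.
two_drifted = {"feature1": True, "feature2": True, "feature3": False}

print(dataset_drift_from_columns(one_drifted, drift_share=0.5))
print(dataset_drift_from_columns(two_drifted, drift_share=0.5))
```

Lowering `drift_share` makes the dataset-level flag more sensitive, since fewer drifted columns are needed to trigger it.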
2. Data Quality Preset
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset
report = Report(metrics=[DataQualityPreset()])
report.run(reference_data=None, current_data=data)
The report checks for:
- Missing values
- Duplicates
- Constant columns
- Empty columns
- New/missing categories
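To make the checks in this list concrete, here is a rough plain-Python sketch of each one over a list of row dictionaries. It is independent of Evidently's implementation; `quality_summary`, `category_changes`, and the sample rows are hypothetical.

```python
from typing import Any, Dict, List


def quality_summary(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarise basic data quality issues for a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    # Missing values: count None entries per column.
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    # Distinct non-missing values per column, used for empty/constant checks.
    values = {c: {r.get(c) for r in rows if r.get(c) is not None} for c in columns}
    # Duplicates: count rows that repeat an earlier row exactly.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "missing_values": missing,
        "duplicate_rows": duplicates,
        "empty_columns": [c for c in columns if not values[c]],
        "constant_columns": [c for c in columns if len(values[c]) == 1],
    }


def category_changes(reference: List[str], current: List[str]) -> Dict[str, List[str]]:
    """List categories that appeared in or vanished from the current data."""
    ref, cur = set(reference), set(current)
    return {"new": sorted(cur - ref), "missing": sorted(ref - cur)}


rows = [
    {"age": 31, "country": "DE", "notes": None, "source": "web"},
    {"age": None, "country": "FR", "notes": None, "source": "web"},
    {"age": 31, "country": "DE", "notes": None, "source": "web"},
    {"age": 45, "country": "US", "notes": None, "source": "web"},
]
summary = quality_summary(rows)
changes = category_changes(["DE", "FR"], ["DE", "US"])
print(summary)
print(changes)
```

Note that comparing categories needs a reference dataset, whereas the other checks work on the current data alone.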
3. Target Drift Preset
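from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TargetDriftPreset
report = Report(metrics=[TargetDriftPreset()])
# The target column is named "label" in both DataFrames
report.run(
    reference_data=ref_df,
    current_data=curr_df,
    column_mapping=ColumnMapping(target="label"),
)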
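Target drift comes down to comparing the label distribution in the reference data with the one in the current data. The sketch below illustrates this with total variation distance, chosen here only because it is easy to compute by hand. It is not the statistical test Evidently applies, and the function names and sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def label_shares(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the share of each label in the sequence."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def total_variation_distance(
    reference: Sequence[Hashable], current: Sequence[Hashable]
) -> float:
    """Half the sum of absolute differences between two label distributions."""
    ref, cur = label_shares(reference), label_shares(current)
    all_labels = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(l, 0.0) - cur.get(l, 0.0)) for l in all_labels)


reference_labels = [0, 0, 0, 1]  # 75% class 0, 25% class 1
current_labels = [0, 1, 1, 1]    # 25% class 0, 75% class 1

distance = total_variation_distance(reference_labels, current_labels)
print(distance)
```

A distance of 0 means identical label shares and 1 means no overlap at all. Any threshold for calling the shift "drift" would be a project-specific choice.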
4. Classification Preset
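from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset
report = Report(metrics=[ClassificationPreset()])
report.run(
    reference_data=None,  # No reference dataset for this report
    current_data=data,
    column_mapping=ColumnMapping(
        target="actual",
        prediction="predicted",
        pos_label=1,  # For binary classification
    ),
)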
5. Regression Preset
from evidently.metric_preset import RegressionPreset
report = Report(metrics=[RegressionPreset()])
report.run(
    current_data=data,