A Complete Evidently AI Tutorial: ML Model Monitoring and Data Quality
Evidently is an open-source Python library for evaluating, testing, and monitoring machine learning models in production. It helps you detect data drift, model degradation, and data quality problems before they affect your business.
Why Evidently?
Key strengths of Evidently:
- Data drift detection: monitor changes in the input data
- Model performance tracking: track metrics over time
- Visual reports: interactive HTML dashboards
- Test suites: automated quality checks
- Easy integration: works with any ML framework
Typical use cases:
- Production model monitoring
- Data quality validation
- A/B testing analysis
- Pre-deployment validation
- Debugging model problems
Installation
# Basic installation
pip install evidently
# With visualization support
pip install "evidently[notebooks]"
# Verify the installation
python -c "import evidently; print(evidently.__version__)"
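The examples in this tutorial import `Report` from `evidently.report` and the presets from `evidently.metric_preset`. As far as we know, newer releases (0.7 and later) reorganized the API and moved these classes under `evidently.legacy`, so it can be worth checking which version you have installed. The sketch below reads the installed version with the standard library; `uses_legacy_report_api` is a hypothetical helper of our own, and the 0.7 cut-off is an assumption to confirm against the Evidently changelog.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# ASSUMPTION: releases before 0.7 expose evidently.report / evidently.metric_preset
# directly; 0.7+ moved them under evidently.legacy.
LEGACY_CUTOFF: Tuple[int, int] = (0, 7)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '0.4.33' or '0.7rc1' into (major, minor)."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def uses_legacy_report_api(version: str) -> bool:
    """Hypothetical helper: True if this tutorial's imports should work as written."""
    return parse_major_minor(version) < LEGACY_CUTOFF


def installed_evidently_version() -> Optional[str]:
    """Return the installed Evidently version, or None if it is not installed."""
    try:
        return metadata.version("evidently")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_evidently_version()
    if found is None:
        print("evidently is not installed")
    else:
        print(found, "- legacy Report API expected:", uses_legacy_report_api(found))
```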
Quick Start
1. Basic Data Drift Report
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Load reference (training) and current (production) data
reference_data = pd.read_csv("training_data.csv")
current_data = pd.read_csv("production_data.csv")

# Create the report
report = Report(metrics=[DataDriftPreset()])

# Run the analysis
report.run(
    reference_data=reference_data,
    current_data=current_data,
)

# Save as HTML
report.save_html("drift_report.html")

# Get the results as a dict
results = report.as_dict()
print(f"Dataset drift detected: {results['metrics'][0]['result']['dataset_drift']}")
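Behind the single `dataset_drift` flag, the report runs a statistical test on each column. Evidently chooses the test from the column type and data size; for small numerical samples it is, to our understanding, the two-sample Kolmogorov-Smirnov (KS) test. The stdlib-only sketch below is not Evidently's code: it computes the KS statistic D (the largest gap between the two empirical CDFs) and compares it with the standard large-sample critical value at α = 0.05, c(α)·sqrt((n+m)/(n·m)) with c(0.05) ≈ 1.358. Evidently itself, as we understand it, computes a p-value via SciPy instead. The function names are our own.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of the two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum is reached at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def ks_drift_detected(
    reference: Sequence[float], current: Sequence[float], c_alpha: float = 1.358
) -> bool:
    """Flag drift when D exceeds the asymptotic critical value (alpha = 0.05 by default)."""
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


reference = [float(i) for i in range(100)]
shifted = [x + 50 for x in reference]  # same shape, shifted by 50
print(ks_statistic(reference, shifted), ks_drift_detected(reference, shifted))
```

Shifting every value by 50 leaves the two CDFs half a unit apart over the overlapping range, so D = 0.5, well above the critical value of about 0.19 for n = m = 100.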
2. Model Performance Report
import pandas as pd
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

# Data with predictions and labels
data = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature2": [0.5, 1.5, 2.5, 3.5, 4.5],
    "prediction": [0, 1, 1, 0, 1],
    "target": [0, 1, 0, 0, 1],
})

# Classification report (no reference dataset needed)
report = Report(metrics=[ClassificationPreset()])
report.run(
    reference_data=None,
    current_data=data,
    # column_mapping expects a ColumnMapping object, not a plain dict
    column_mapping=ColumnMapping(target="target", prediction="prediction"),
)
report.save_html("classification_report.html")
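For intuition about what the classification report summarizes, the sketch below computes accuracy, precision, recall, and F1 by hand for the five-row example above, with `1` as the positive class. It is a plain-Python illustration using our own function name (`binary_metrics`), not how Evidently computes these metrics internally.

```python
from typing import Dict, Sequence


def binary_metrics(
    target: Sequence[int], prediction: Sequence[int], pos_label: int = 1
) -> Dict[str, float]:
    """Accuracy plus precision, recall, and F1 for the given positive class."""
    pairs = list(zip(target, prediction))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Same values as the "target" and "prediction" columns in the DataFrame above.
print(binary_metrics(target=[0, 1, 0, 0, 1], prediction=[0, 1, 1, 0, 1]))
```

Four of the five predictions are correct (accuracy 0.8); one of the three predicted positives is wrong (precision 2/3); both actual positives are found (recall 1.0), giving F1 = 0.8.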
Metric Presets
1. Data Drift Preset
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# ref_df and curr_df are pandas DataFrames with the reference and current data
report = Report(metrics=[
    DataDriftPreset(
        columns=["feature1", "feature2", "feature3"],  # Only check these columns
        drift_share=0.5,  # Share of drifted columns that marks the whole dataset as drifted
    )
])
report.run(reference_data=ref_df, current_data=curr_df)
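The `drift_share` parameter turns per-column results into one dataset-level verdict. Below is a minimal sketch of that aggregation, assuming (as we understand Evidently's behavior) that the dataset is flagged once the share of drifted columns reaches `drift_share`. The function name and the dict-of-flags input are our own; confirm the exact comparison in the Evidently documentation.

```python
from typing import Dict


def dataset_drift(column_drift: Dict[str, bool], drift_share: float = 0.5) -> bool:
    """Flag dataset drift when the share of drifted columns reaches drift_share.

    ASSUMPTION: mirrors the idea behind DataDriftPreset(drift_share=...), not its code.
    """
    if not column_drift:
        return False
    share = sum(column_drift.values()) / len(column_drift)
    return share >= drift_share


flags = {"feature1": True, "feature2": False, "feature3": False}
print(dataset_drift(flags))                    # False: 1 of 3 columns is below 0.5
print(dataset_drift(flags, drift_share=0.3))   # True: 1/3 is at least 0.3
```

Lowering `drift_share` makes the dataset-level alarm more sensitive without changing the per-column tests.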
2. Data Quality Preset
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset

report = Report(metrics=[DataQualityPreset()])
report.run(reference_data=None, current_data=data)
Checks for:
- Missing values
- Duplicates
- Constant columns
- Empty columns
- New/missing categories
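To make the checklist above concrete, here is a plain-Python sketch of each check on a list of row dictionaries. It only illustrates what the checks mean and is not Evidently's implementation; the function name `quality_summary`, the row-dict input, and treating `None` as a missing value are our own assumptions. The new/missing categories check needs a reference dataset, so that argument is optional.

```python
from typing import Any, Dict, Optional, Sequence


def quality_summary(
    rows: Sequence[Dict[str, Any]],
    reference_rows: Optional[Sequence[Dict[str, Any]]] = None,
    categorical: Sequence[str] = (),
) -> Dict[str, Any]:
    """Missing values, duplicate rows, constant/empty columns, and category changes."""
    columns = sorted({key for row in rows for key in row})
    # ASSUMPTION: None (or an absent key) counts as a missing value.
    values = {col: [row.get(col) for row in rows] for col in columns}
    missing = {col: sum(v is None for v in vals) for col, vals in values.items()}
    empty = [col for col, vals in values.items() if all(v is None for v in vals)]
    constant = [
        col for col, vals in values.items()
        if col not in empty and len({v for v in vals if v is not None}) == 1
    ]
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    summary: Dict[str, Any] = {
        "missing_values": missing,
        "duplicate_rows": duplicates,
        "constant_columns": constant,
        "empty_columns": empty,
    }
    if reference_rows is not None:
        changes = {}
        for col in categorical:
            ref_cats = {r.get(col) for r in reference_rows} - {None}
            cur_cats = set(values.get(col, [])) - {None}
            changes[col] = {
                "new": sorted(cur_cats - ref_cats),
                "missing": sorted(ref_cats - cur_cats),
            }
        summary["category_changes"] = changes
    return summary


current = [
    {"city": "Jakarta", "status": "active", "note": None},
    {"city": "Bandung", "status": "active", "note": None},
    {"city": "Bandung", "status": "active", "note": None},
    {"city": "Medan", "status": "active", "note": None},
]
reference = [{"city": "Jakarta"}, {"city": "Bandung"}, {"city": "Surabaya"}]
print(quality_summary(current, reference, categorical=["city"]))
```

Here `note` is empty, `status` is constant, one row is a duplicate, and `city` gains `Medan` while losing `Surabaya` relative to the reference.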
3. Target Drift Preset
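from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TargetDriftPreset

report = Report(metrics=[TargetDriftPreset()])
report.run(
    reference_data=ref_df,
    current_data=curr_df,
    column_mapping=ColumnMapping(target="label"),  # the target column is named "label"
)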
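Target drift asks whether the distribution of the label itself has shifted. For a binary target in a smaller dataset, Evidently's default test is, to our understanding, a Z-test comparing the share of positives in the reference and current data. A stdlib sketch of that two-proportion Z-test (pooled variance, two-sided p-value via `math.erfc`) follows; the function name and the 0.05 threshold are our own illustrative choices, not Evidently's API.

```python
import math
from typing import Sequence, Tuple


def proportion_z_test(
    reference: Sequence[int], current: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Two-sided two-proportion Z-test on the share of `positive`; returns (z, p_value)."""
    n1, n2 = len(reference), len(current)
    x1 = sum(1 for v in reference if v == positive)
    x2 = sum(1 for v in current if v == positive)
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are entirely one class: no detectable difference.
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


reference_labels = [1] * 30 + [0] * 70  # 30% positives
current_labels = [1] * 60 + [0] * 40    # 60% positives
z, p_value = proportion_z_test(reference_labels, current_labels)
print(f"z={z:.2f}, p={p_value:.6f}, target drift={p_value < 0.05}")
```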
4. Classification Preset
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

report = Report(metrics=[ClassificationPreset()])
report.run(
    reference_data=None,
    current_data=data,
    column_mapping=ColumnMapping(
        target="actual",
        prediction="predicted",
        pos_label=1,  # Positive class for binary classification
    ),
)
5. Regression Preset
from evidently.report import Report
from evidently.metric_preset import RegressionPreset
report = Report(metrics=[RegressionPreset()])
report.run(