SHAP - A Practical Guide to Explainable AI and Model Interpretability
Machine learning models increasingly drive decisions that affect people: loan approvals, medical triage, fraud flags, and churn interventions. When a model says "no," stakeholders reasonably ask why. SHAP (SHapley Additive exPlanations) gives a principled, consistent answer by attributing each prediction to the features that produced it.
This tutorial is a dedicated deep dive into SHAP: the theory behind it, the modern API, the right explainer for each model type, every core plot and how to read it, and the pitfalls that trip up teams in production.
Why Interpretability Matters
A model that scores well on a held-out set is not automatically trustworthy. Interpretability addresses four concrete needs that show up in real projects.
- Trust and adoption. Domain experts adopt a model faster when they can see that it relies on sensible signals rather than spurious correlations. A credit analyst is more comfortable acting on a score when the drivers match their mental model.
- Debugging. Explanations expose leakage and shortcut learning. If a "future" timestamp column dominates a fraud model, the explanation surfaces it long before a postmortem does.
- Fairness. By inspecting how protected or proxy attributes contribute to predictions, teams can detect and quantify undesirable behavior, such as a proxy feature systematically lowering scores for one group.
- Regulation. Frameworks such as the EU's GDPR (whose provisions on automated decision-making are often read as a right to explanation), the EU AI Act, and financial supervisory guidance increasingly expect documented, reproducible reasoning behind automated decisions.
SHAP does not make a model fair or correct on its own. It is a measurement tool that tells you, faithfully, what the model is doing so you can decide whether that behavior is acceptable.
Global vs Local Explanations
Interpretability questions come in two flavors, and SHAP answers both with the same underlying quantity.
- Local explanation: "Why did the model produce this prediction for this customer?" The answer is the per-feature contribution for a single row.
- Global explanation: "Which features matter most across the whole dataset, and in which direction?" The answer is obtained by aggregating local explanations over many rows.
A useful property of SHAP is that the global view is literally a summary of local views. Mean absolute SHAP value per feature gives global importance; the sign and spread of per-row values describe direction and heterogeneity. There is no separate, inconsistent "global importance" metric to reconcile.
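As a minimal sketch of that aggregation, the snippet below starts from a small, made-up matrix of per-row SHAP values (the numbers and feature names are hypothetical, not output from a real model) and derives the global view using only NumPy. In practice the matrix would come from a SHAP explainer.

```python
import numpy as np

# Hypothetical per-row SHAP values: 4 rows (customers) x 3 features.
# These numbers are illustrative only, not produced by a real model.
feature_names = ["income", "age", "tenure"]
shap_values = np.array([
    [ 0.30, -0.10,  0.05],
    [-0.20, -0.15,  0.00],
    [ 0.40, -0.05, -0.05],
    [-0.10, -0.10,  0.10],
])

# Local explanation: the per-feature contributions for one row.
local_explanation = dict(zip(feature_names, shap_values[0]))

# Global importance: mean absolute SHAP value per feature.
global_importance = np.abs(shap_values).mean(axis=0)

# Direction and heterogeneity: signed mean and spread of the per-row values.
mean_signed = shap_values.mean(axis=0)
spread = shap_values.std(axis=0)

# Features ranked from most to least important.
ranking = [feature_names[i] for i in np.argsort(-global_importance)]
print(ranking)
```

In this toy matrix `income` has the largest mean absolute value even though its signed mean is smaller, because its contributions point in different directions for different rows. That is the heterogeneity the per-row values reveal and a single importance number hides.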
The Intuition Behind Shapley Values
Shapley values come from cooperative game theory (Lloyd Shapley, 1953). Imagine a game where several players cooperate to produce a payout, and you need to split that payout fairly. The Shapley value of a player is their average marginal contribution across every possible order in which players could join the coalition.
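The definition can be written compactly. The notation here is an assumption introduced for illustration: $N$ is the set of players, $v(S)$ is the payout a coalition $S$ earns, $\Pi(N)$ is the set of all orderings of the players, and $P_i^{\pi}$ is the set of players that precede player $i$ in ordering $\pi$.

```latex
\phi_i(v) \;=\; \frac{1}{|N|!} \sum_{\pi \in \Pi(N)}
  \Bigl[ v\bigl(P_i^{\pi} \cup \{i\}\bigr) - v\bigl(P_i^{\pi}\bigr) \Bigr]
```

Grouping orderings that share the same set of predecessors gives the equivalent subset form:

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \Bigl[ v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr]
```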
Map this onto a prediction: the "players" are the input features, the "payout" is the gap between the model's output for a given row and its average (baseline) output, and a feature's Shapley value is the average change in the prediction caused by adding that feature, taken over every possible order in which the features could be added.
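To make the mapping concrete, here is a brute-force sketch that enumerates every feature ordering for a toy model. The model, the feature names, and the convention that an "absent" feature takes a fixed baseline value are all assumptions chosen for illustration. This is not how the SHAP library computes values: enumeration is only feasible for a handful of features, and SHAP typically averages over background data rather than using a single reference point.

```python
from itertools import permutations
from math import factorial


def model(x):
    # Hypothetical toy model with an interaction between "a" and "c".
    return 2.0 * x["a"] + 1.0 * x["b"] + 3.0 * x["a"] * x["c"]


def shapley_values(model, row, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Assumption: a feature that has not yet "joined" keeps its baseline value.
    """
    features = list(row)
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        current = dict(baseline)      # start with no features present
        previous = model(current)
        for f in order:
            current[f] = row[f]       # feature f joins the coalition
            updated = model(current)
            totals[f] += updated - previous  # marginal contribution of f
            previous = updated
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in totals.items()}


row = {"a": 1.0, "b": 2.0, "c": 1.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}
phi = shapley_values(model, row, baseline)
print(phi)
```

The interaction term `3.0 * a * c` is split evenly between `a` and `c`, and the values sum to the difference between the prediction for `row` and the prediction for `baseline`.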
The reason this matters is that Shapley values are the unique attribution method satisfying a set of fairness axioms simultaneously: