PSI + Drift Visualization Explainer

1. Distribution Comparison

Reference (training) vs. Current (production) feature distributions. Move the slider to simulate drift.

[Interactive chart: Reference Distribution vs. Current Distribution, with a Drift Amount slider (initially 0%)]

2. Step-by-Step PSI Calculation

Watch each bin's contribution update live as drift changes.

PSI Formula (per bin):

PSI_i = (P_current,i − P_reference,i) × ln(P_current,i / P_reference,i)
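As a quick worked example of the per-bin formula, the sketch below evaluates the contribution of a single bin. The function name `psi_contribution` and the example proportions (10% reference, 15% current) are illustrative assumptions, not values taken from the visualization.

```python
import math

def psi_contribution(p_ref: float, p_cur: float) -> float:
    """Per-bin PSI term: (P_current - P_reference) * ln(P_current / P_reference).

    Both proportions must be strictly positive; zero bins need separate handling.
    """
    return (p_cur - p_ref) * math.log(p_cur / p_ref)

# Hypothetical bin: 10% of reference observations, 15% of current observations.
# (0.15 - 0.10) * ln(1.5) = 0.05 * 0.4055 = 0.0203
print(round(psi_contribution(0.10, 0.15), 4))  # 0.0203

# The term is never negative: the difference and the log always share a sign,
# so a bin that shrinks contributes just as much as one that grows.
print(psi_contribution(0.15, 0.10) >= 0)  # True
```

Because every bin contributes a non-negative amount, bins cannot cancel each other out when the contributions are summed.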

    1. Divide the feature range into 10 equal-width bins spanning the data.

    2. Compute the proportion of observations in each bin for both distributions.

    3. For each bin, calculate the PSI contribution using the formula above, then sum across all bins.

Bin	Range	Ref %	Curr %	Diff	ln(C/R)	PSI Contrib

Tip: Rows highlighted in amber are the top contributors to total PSI. These bins are where the distributions diverge the most.
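The three steps above can be sketched in plain Python. This is a minimal illustration, not the code behind the visualization: the function names, the choice to span the bins over both samples combined, and the example data are all assumptions. Each returned row mirrors the table columns (bin, Ref, Curr, Diff, ln(C/R), PSI Contrib), with proportions rather than percentages.

```python
import math

def equal_width_edges(values, n_bins=10):
    """Step 1: n_bins + 1 equally spaced edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Use hi itself as the last edge to avoid floating-point undershoot.
    return [lo + i * width for i in range(n_bins)] + [hi]

def bin_proportions(values, edges):
    """Step 2: fraction of values falling in each bin (last bin includes its upper edge)."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = 0
        # Advance while v lies at or beyond the next interior edge.
        while idx < n_bins - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]

def psi_table(reference, current, n_bins=10):
    """Step 3: return (rows, total_psi); raises if any bin proportion is zero."""
    edges = equal_width_edges(list(reference) + list(current), n_bins)
    ref = bin_proportions(reference, edges)
    cur = bin_proportions(current, edges)
    rows = []
    for i, (r, c) in enumerate(zip(ref, cur), start=1):
        diff, log_ratio = c - r, math.log(c / r)
        rows.append((i, r, c, diff, log_ratio, diff * log_ratio))
    return rows, sum(row[-1] for row in rows)

reference = list(range(100))           # hypothetical training values 0..99
current = [v + 5 for v in reference]   # same shape, shifted upward by 5
rows, total = psi_table(reference, current)
for row in rows:
    print(row)
print("total PSI:", round(total, 4))
```

In this example only the first and last bins differ between the two samples, so they carry the whole PSI, which is the situation the amber highlighting is meant to surface.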

3. PSI Interpretation

Commonly used rule-of-thumb thresholds for acting on detected drift.

[Interactive PSI gauge: scale from 0.00 to 0.30+, with zones for < 0.1, 0.1 – 0.2, and ≥ 0.2. At 0% drift the gauge reads 0.0000: the distributions are identical and no drift is detected.]

Threshold Reference

PSI < 0.1 — Stable

No significant population shift. The model's input distribution is consistent with training data. Continue standard monitoring.

0.1 ≤ PSI < 0.2 — Investigate

Moderate drift detected. Examine which features shifted and whether model performance metrics have degraded. May need retraining soon.

PSI ≥ 0.2 — Action Required

Significant population change. The model is likely seeing data it was not trained on. Retrain, recalibrate, or investigate root cause immediately.

Context Matters

These thresholds are guidelines, not absolute rules. High-stakes domains (e.g. healthcare, finance) may use stricter thresholds like 0.05 and 0.1.
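One way to keep the thresholds adjustable per domain is to pass them as parameters rather than hard-coding them. The sketch below is a hypothetical helper (the name `classify_psi` and the returned labels are assumptions) that maps a PSI value to the three bands described above.

```python
def classify_psi(psi_value, investigate_at=0.1, act_at=0.2):
    """Map a PSI value to a band; the defaults follow the thresholds listed above."""
    if psi_value < investigate_at:
        return "stable"
    if psi_value < act_at:
        return "investigate"
    return "action required"

print(classify_psi(0.07))                                     # stable
print(classify_psi(0.07, investigate_at=0.05, act_at=0.1))    # investigate
print(classify_psi(0.2))                                      # action required
```

The same PSI of 0.07 is "stable" under the default bands but "investigate" under the stricter 0.05 / 0.1 pair.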

4. Key Concepts

Essential background for understanding PSI and data drift in production ML.

What Is PSI and Why It Matters

The Population Stability Index (PSI) is a symmetric measure of how much a variable's distribution has shifted between two samples, typically a reference period and a current period; swapping the two gives the same value. Originally developed in credit scoring, it is now widely used for monitoring ML features and predictions in production.

Unlike accuracy, PSI is an input-side check: it can flag distribution shifts before their effect on model performance can be measured. This makes it well suited to early warning systems in MLOps pipelines, where ground truth labels may arrive with a significant delay.

Key insight: PSI measures distributional divergence, not prediction error. A high PSI tells you the data has changed, which is a leading indicator that model quality may soon degrade.
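The symmetry of PSI can be checked numerically. The sketch below also verifies a standard identity: PSI equals the sum of the two Kullback-Leibler divergences taken in each direction. The bin proportions are made-up values for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two lists of strictly positive proportions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def psi(ref, cur):
    """Total PSI: sum over bins of (cur - ref) * ln(cur / ref)."""
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

ref = [0.5, 0.3, 0.2]   # hypothetical reference bin proportions
cur = [0.4, 0.3, 0.3]   # hypothetical current bin proportions

# Swapping reference and current does not change the result.
print(math.isclose(psi(ref, cur), psi(cur, ref)))  # True

# PSI = KL(ref || cur) + KL(cur || ref)
print(math.isclose(psi(ref, cur), kl_divergence(ref, cur) + kl_divergence(cur, ref)))  # True
```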

How Binning Works

Equal-Width Bins

Divide the feature range into N bins of equal width. Simple and interpretable, but sparse tails may produce empty bins. This is the method used in the visualization above.

Quantile (Equal-Frequency) Bins

Define bin edges so each bin contains roughly the same number of reference observations. Better tail coverage, commonly used in practice with N = 10 (deciles).
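Quantile bin edges can be derived from the reference sample with Python's standard library. The sketch below is one possible approach, not the visualization's method; the function names and the skewed example data are assumptions. It uses `statistics.quantiles`, which returns the `n - 1` interior cut points for `n` groups.

```python
import bisect
import statistics

def quantile_cut_points(reference, n_bins=10):
    """Interior cut points (n_bins - 1 of them) taken from the reference sample."""
    return statistics.quantiles(reference, n=n_bins)

def bin_counts(values, cut_points):
    """Count values per bin; bin i holds values between cut point i-1 and cut point i."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return counts

reference = [x * x for x in range(1, 101)]   # hypothetical right-skewed feature
cuts = quantile_cut_points(reference)

# Each decile bin holds about the same share of the reference sample,
# even though the bins have very different widths.
print(bin_counts(reference, cuts))
```

Production values below the first cut point or above the last one simply land in the first or last bin, so the end bins act as open-ended tails.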

Common Pitfalls

Empty Bins / Zero Proportions

If a bin has zero observations in either distribution, the ln(C/R) term becomes undefined (division by zero or log of zero). Common fixes:

Replace zeros with a small epsilon (e.g. 0.0001)
Merge sparse bins with neighbors
Use quantile binning to avoid empty bins
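The first fix, replacing zeros with a small epsilon, might look like the sketch below. The function name and the example proportions are assumptions; the default `eps` matches the 0.0001 mentioned above. Note that this simple version floors the proportions without renormalizing them, and that the result depends on the epsilon chosen.

```python
import math

def psi_with_epsilon(ref_props, cur_props, eps=1e-4):
    """Total PSI with every proportion floored at eps to avoid log(0) and division by zero."""
    total = 0.0
    for r, c in zip(ref_props, cur_props):
        r, c = max(r, eps), max(c, eps)
        total += (c - r) * math.log(c / r)
    return total

ref = [0.5, 0.5, 0.0]   # hypothetical: third bin empty in the reference data
cur = [0.4, 0.4, 0.2]

print(psi_with_epsilon(ref, cur))              # finite, dominated by the third bin
print(psi_with_epsilon(ref, cur, eps=1e-6))    # a smaller epsilon gives a larger PSI
```

Since the choice of epsilon changes the reported value, it is worth fixing it once and using the same value for every feature and every run.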

Small Sample Sizes

PSI can be noisy with small samples. Random fluctuations may trigger false alarms. Best practices:

Use at least a few hundred observations per window (200–500 or more)
Average PSI over rolling windows
Complement PSI with statistical tests (e.g. KS test)
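Averaging PSI over rolling windows can be sketched as follows. The helper name, the window size of 3, and the daily PSI values are hypothetical; the point is only to show how a single noisy reading is damped.

```python
from collections import deque

def rolling_mean(values, window=3):
    """Mean of the most recent `window` values at each step (shorter at the start)."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

daily_psi = [0.02, 0.03, 0.15, 0.02, 0.03]   # hypothetical: one noisy spike on day 3
smoothed = rolling_mean(daily_psi, window=3)

print([round(v, 4) for v in smoothed])
print(max(daily_psi) >= 0.1, max(smoothed) >= 0.1)  # True False
```

The raw series crosses the 0.1 threshold once, while the smoothed series does not. The trade-off is that a genuine shift also takes longer to show up in the smoothed values.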

Number of Bins

Too few bins smooth out real drift; too many create noise. The usual default is 10 bins, but for high-cardinality features, 15–20 bins can capture finer shifts, provided each bin still holds enough observations. Always validate with visual inspection.

Categorical Features

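For categorical variables, each unique category acts as its own "bin." New categories in production (unseen during training) are strong drift signals. Because their reference proportion is zero, their PSI term is undefined unless zeros are handled (e.g. replaced with a small epsilon); once handled, they show up as a large spike.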
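A minimal sketch of PSI for a categorical feature is shown below, treating each category as a bin and flooring zero proportions at a small epsilon so that unseen categories do not break the logarithm. The function name, the epsilon of 0.0001, and the payment-method example are assumptions for illustration.

```python
import math
from collections import Counter

def categorical_psi(reference, current, eps=1e-4):
    """PSI over the union of categories; zero proportions are floored at eps."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    total = 0.0
    for cat in sorted(set(ref_counts) | set(cur_counts)):
        r = max(ref_counts[cat] / len(reference), eps)
        c = max(cur_counts[cat] / len(current), eps)
        total += (c - r) * math.log(c / r)
    return total

reference = ["card"] * 70 + ["bank"] * 30
current = ["card"] * 60 + ["bank"] * 25 + ["wallet"] * 15   # "wallet" is new in production

print(categorical_psi(reference, current) >= 0.2)  # True
```

Here the shifts in "card" and "bank" are small, and nearly all of the PSI comes from the new "wallet" category.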

PSI in Production Monitoring

In a mature MLOps setup, PSI is computed on a schedule (hourly, daily, or per batch) for every input feature and the model's output score distribution:

Typical Monitoring Pipeline

    1. Store reference distributions (training data profiles) as bin counts

    2. On each scoring batch, compute current distributions

    3. Calculate PSI for each feature + the prediction score

    4. Log PSI values to a monitoring dashboard (e.g. Grafana, MLflow)

    5. Alert if any feature crosses the 0.1 or 0.2 threshold

    6. Investigate root cause: upstream data pipeline change? Seasonal trend? Concept drift?
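Steps 1 to 5 of the pipeline above can be sketched as a small batch check. Everything here is a hypothetical outline: the profile layout (`cut_points` plus `ref_counts`), the function names, the status labels, and the "age" feature are assumptions, and logging to a dashboard is left out in favor of returning a report.

```python
import bisect
import math

def bin_counts(values, cut_points):
    """Count values per bin defined by sorted interior cut points."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return counts

def psi_from_counts(ref_counts, cur_counts, eps=1e-4):
    """PSI from two lists of bin counts; zero proportions are floored at eps."""
    n_ref, n_cur = sum(ref_counts), sum(cur_counts)
    total = 0.0
    for rc, cc in zip(ref_counts, cur_counts):
        r, c = max(rc / n_ref, eps), max(cc / n_cur, eps)
        total += (c - r) * math.log(c / r)
    return total

def check_batch(profiles, batch, warn_at=0.1, alert_at=0.2):
    """Steps 2-5: bin the batch with the stored edges, compute PSI, assign a status."""
    report = {}
    for feature, profile in profiles.items():
        cur_counts = bin_counts(batch[feature], profile["cut_points"])
        value = psi_from_counts(profile["ref_counts"], cur_counts)
        status = "alert" if value >= alert_at else "warn" if value >= warn_at else "ok"
        report[feature] = (value, status)
    return report

# Step 1: store the reference profile once, as cut points plus bin counts.
training_age = list(range(20, 70))   # hypothetical training values
cuts = [30, 40, 50, 60]
profiles = {"age": {"cut_points": cuts, "ref_counts": bin_counts(training_age, cuts)}}

# Steps 2-5 on two hypothetical scoring batches.
print(check_batch(profiles, {"age": training_age}))                    # status "ok"
print(check_batch(profiles, {"age": [v + 15 for v in training_age]}))  # status "alert"
```

Storing only cut points and counts keeps the reference profile small and means the raw training data does not have to be available at scoring time.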

Remember: PSI detects data drift (input changes), not concept drift (relationship changes between features and target). A stable PSI does not guarantee the model is still accurate — always pair with performance monitoring when labels are available.