5.3 — Bias Detection and Fairness Testing
Bias in AI systems can arise from training data (representation bias, measurement bias, historical bias), model architecture choices, labeling processes, and evaluation methodology. Auditors must understand each source.
| Metric | Definition | When to Use |
|---|---|---|
| Demographic Parity | Equal positive prediction rates across groups | When equal representation in outcomes is the primary goal |
| Equalized Odds | Equal true positive and false positive rates across groups | When error rates must be comparable across groups (e.g., medical diagnosis) |
| Predictive Parity | Equal precision (PPV) across groups | When confidence in positive predictions must be equal |
| Individual Fairness | Similar individuals receive similar outcomes | When individual-level treatment consistency matters |
| Calibration | Predicted probabilities match actual outcomes per group | When probability estimates are used for downstream decisions |
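The first three metrics in the table reduce to per-group rates taken from a confusion matrix. The following is a minimal sketch, assuming binary 0/1 labels and predictions; the function name `group_rates` and the toy data are illustrative assumptions, not part of any particular auditing library.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and PPV for binary 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class
        key = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        counts[group][key] += 1

    def ratio(num, den):
        # An empty denominator means the rate is undefined for that group
        return num / den if den else float("nan")

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        rates[group] = {
            "selection_rate": ratio(c["tp"] + c["fp"], total),  # Demographic Parity
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),           # Equalized Odds (part 1)
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),           # Equalized Odds (part 2)
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),           # Predictive Parity
        }
    return rates


# Toy data (hypothetical): four records in each of two groups
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
for group, values in sorted(rates.items()):
    print(group, values)
```

A metric is satisfied when the corresponding rate is (approximately) equal across groups. In this toy data the groups share a false positive rate but differ in selection rate, true positive rate, and precision.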
When base rates differ across groups, it is mathematically impossible to satisfy Demographic Parity, Equalized Odds, and Predictive Parity simultaneously (except in trivial cases). Choosing the right metric depends on the context, legal requirements, and stakeholder priorities. Document the rationale for your metric choice.
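One way to see the tension between Equalized Odds and Predictive Parity is to write precision in terms of the error rates. The derivation below is a sketch; the notation ($p_g$ for the base rate of positives in group $g$, and $\mathrm{TPR}_g$, $\mathrm{FPR}_g$, $\mathrm{PPV}_g$ for that group's rates) is introduced here for illustration.

```latex
\mathrm{PPV}_g
  = \frac{\mathrm{TPR}_g \, p_g}
         {\mathrm{TPR}_g \, p_g + \mathrm{FPR}_g \, (1 - p_g)}
\quad\Longrightarrow\quad
\mathrm{FPR}_g
  = \frac{p_g}{1 - p_g}
    \cdot \frac{1 - \mathrm{PPV}_g}{\mathrm{PPV}_g}
    \cdot \mathrm{TPR}_g
```

If two groups share the same $\mathrm{TPR}$ and $\mathrm{PPV}$ but have different base rates $p_g$, the right-hand side differs between them, so their false positive rates cannot be equal unless $\mathrm{PPV} = 1$ (no false positives at all).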
1. Identify relevant demographic groups (race, gender, age, disability, etc.) based on context and legal requirements.
2. Choose appropriate metrics for the context, considering legal, ethical, and stakeholder requirements.
3. Gather evaluation data with demographic labels for each group of interest.
4. Calculate selected fairness metrics separately for each demographic group.
5. Evaluate whether disparities exceed acceptable thresholds (e.g., 80% rule / four-fifths rule).
6. Trace disparities back to training data, features, model architecture, or labeling processes.
7. Record findings, rationale, and specific mitigation recommendations.
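The threshold check in the process above can be sketched as follows. Under the four-fifths rule, a group is flagged when its selection rate is below 80% of the rate of the group with the highest selection rate. The function name `disparate_impact`, the group labels, and the rates are hypothetical values chosen for illustration.

```python
def disparate_impact(selection_rates, threshold=0.8):
    """Compare each group's selection rate to the highest-rate group."""
    reference = max(selection_rates.values())
    if reference == 0:
        raise ValueError("no group received a positive outcome")
    report = {}
    for group, rate in selection_rates.items():
        ratio = rate / reference
        # Flag groups whose ratio falls below the chosen threshold
        report[group] = {"ratio": ratio, "flagged": ratio < threshold}
    return report


# Hypothetical per-group selection rates from an evaluation set
report = disparate_impact({"group_a": 0.60, "group_b": 0.45, "group_c": 0.50})
for group, result in report.items():
    print(f"{group}: ratio={result['ratio']:.2f} flagged={result['flagged']}")
```

Here `group_b` has a ratio of about 0.75 and is flagged, while `group_c` (about 0.83) is not. The 0.8 threshold is a screening heuristic; whether a flagged disparity is acceptable still depends on the legal and stakeholder context identified in the earlier steps.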
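Intersectional analysis examines bias across combinations of protected attributes (e.g., race × gender × age) rather than single attributes alone. This can reveal disparities that single-attribute analyses hide, because opposing effects in subgroups can cancel out when aggregated. Always test intersectionally, and report subgroup sample sizes, since intersectional groups are smaller and their estimates noisier.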
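A small constructed example shows how a disparity can vanish in single-attribute views and still appear at the intersection. This is a minimal sketch on synthetic records; the helper names `selection_rates_by` and `make_records`, the attribute values, and the counts are all hypothetical.

```python
from collections import defaultdict


def selection_rates_by(records, attributes, outcome="selected"):
    """Selection rate for every observed combination of the given attributes."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        key = tuple(record[a] for a in attributes)
        totals[key] += 1
        positives[key] += record[outcome]
    return {key: positives[key] / totals[key] for key in totals}


def make_records(gender, race, n_selected, n_total):
    """Build n_total synthetic records, the first n_selected with outcome 1."""
    return [
        {"gender": gender, "race": race, "selected": int(i < n_selected)}
        for i in range(n_total)
    ]


# Synthetic data: the marginal rates are balanced, the intersections are not
records = (
    make_records("F", "X", 3, 4)
    + make_records("F", "Y", 1, 4)
    + make_records("M", "X", 1, 4)
    + make_records("M", "Y", 3, 4)
)

by_gender = selection_rates_by(records, ["gender"])
by_race = selection_rates_by(records, ["race"])
by_both = selection_rates_by(records, ["gender", "race"])

print("gender:", by_gender)
print("race:", by_race)
print("gender x race:", by_both)
```

Each gender and each race has a selection rate of 0.5, yet the intersectional rates range from 0.25 to 0.75. With only four records per cell these numbers are illustrative only; real audits need enough data per subgroup for the rates to be meaningful.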