Model Cards & Red-Teaming
Learn to create comprehensive model documentation (model cards, datasheets, system cards) and conduct structured adversarial testing (red-teaming) of AI systems. Covers evaluation methodologies, bias detection, and safety testing.
Show Answer
Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Ethical Considerations, and Caveats/Recommendations.
Show Answer
Model cards document a trained ML model in isolation. System cards document the complete AI system pipeline: model + prompts + guardrails + post-processing + deployment context.
Show Answer
Demographic Parity (equal positive prediction rates — use when equal representation is the goal), Equalized Odds (equal TPR and FPR — use when accuracy across groups matters), Predictive Parity (equal precision — use when confidence in positive predictions must be equal).
Show Answer
Direct injection (adversarial content in user input), indirect injection (adversarial content in retrieved documents/web pages), and system prompt extraction (attempting to reveal system instructions).
Show Answer
It is mathematically impossible to satisfy Demographic Parity, Equalized Odds, and Predictive Parity simultaneously except in trivial cases (perfect prediction or equal base rates). This is known as the 'impossibility theorem' of fairness. Context determines which metric to prioritize.
Show Answer
(1) Define protected attributes and groups, (2) Select appropriate fairness metrics, (3) Collect disaggregated evaluation data, (4) Compute metrics per group, (5) Compare against thresholds, (6) Investigate root causes of disparities, (7) Document findings and recommend mitigations.
Show Answer
Intersectional analysis examines bias across combinations of protected attributes (e.g., race x gender x age) rather than single attributes alone. It's important because disparities can be hidden in single-attribute analyses — a system may appear fair for each attribute individually but show significant bias for specific intersectional subgroups.
Show Answer
Indirect injection embeds adversarial instructions in external data sources (documents, web pages) that the RAG system retrieves. Because the system trusts retrieved content as factual context, it may follow malicious instructions embedded within. This is harder to detect than direct injection and can compromise system integrity without the user's knowledge.