MODULE 05

Model Cards & Red-Teaming

Learn to create comprehensive model documentation (model cards, datasheets, system cards) and conduct structured adversarial testing (red-teaming) of AI systems. Covers evaluation methodologies, bias detection, and safety testing.

4
Units
~2.5 hrs
Duration
~375 min
Per unit
8
Practice Qs
Learning objectives
After completing this module, you will be able to:
Model Cards: standardized model documentation (Mitchell 2019)
Red-teaming: structured adversarial testing before deployment
Bias sources: data, architecture, labeling, evaluation
Standard benchmarks: TruthfulQA, BBQ, RealToxicityPrompts, HELM
0 of 4 units completed0%
Start learning
In this module
5.1 — Model Cards
5.2 — Red-Teaming Fundamentals
5.3 — Bias Detection and Fairness Testing
5.4 — Safety Evaluation and Benchmarks
Practice questions
Q1: What are the key sections of a model card?
Show Answer

Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Ethical Considerations, and Caveats/Recommendations.

Q2: What is the difference between model cards and system cards?
Show Answer

Model cards document a trained ML model in isolation. System cards document the complete AI system pipeline: model + prompts + guardrails + post-processing + deployment context.

Q3: Name three fairness metrics and when you might choose each.
Show Answer

Demographic Parity (equal positive prediction rates — use when equal representation is the goal), Equalized Odds (equal TPR and FPR — use when accuracy across groups matters), Predictive Parity (equal precision — use when confidence in positive predictions must be equal).

Q4: What are the three types of prompt injection attacks?
Show Answer

Direct injection (adversarial content in user input), indirect injection (adversarial content in retrieved documents/web pages), and system prompt extraction (attempting to reveal system instructions).

Q5: Why can't you satisfy all fairness metrics simultaneously?
Show Answer

It is mathematically impossible to satisfy Demographic Parity, Equalized Odds, and Predictive Parity simultaneously except in trivial cases (perfect prediction or equal base rates). This is known as the 'impossibility theorem' of fairness. Context determines which metric to prioritize.

Q6: Describe the 7-step bias testing process.
Show Answer

(1) Define protected attributes and groups, (2) Select appropriate fairness metrics, (3) Collect disaggregated evaluation data, (4) Compute metrics per group, (5) Compare against thresholds, (6) Investigate root causes of disparities, (7) Document findings and recommend mitigations.

Q7: What is intersectional bias analysis and why is it important?
Show Answer

Intersectional analysis examines bias across combinations of protected attributes (e.g., race x gender x age) rather than single attributes alone. It's important because disparities can be hidden in single-attribute analyses — a system may appear fair for each attribute individually but show significant bias for specific intersectional subgroups.

Q8: Why must indirect prompt injection be tested in RAG-based systems?
Show Answer

Indirect injection embeds adversarial instructions in external data sources (documents, web pages) that the RAG system retrieves. Because the system trusts retrieved content as factual context, it may follow malicious instructions embedded within. This is harder to detect than direct injection and can compromise system integrity without the user's knowledge.

04. India DPDP Act + RBI AI/ML Guidelines06. Audit Documentation & Governance