Unit 2 of 4

5.2 — Red-Teaming Fundamentals

AI red-teaming is structured adversarial testing designed to find failures, vulnerabilities, and harmful behaviors in AI systems before deployment. It goes beyond standard testing by actively trying to make the system fail.

Red-Team Process Flow
Scope
Define targets & boundaries
Plan
Select attack taxonomy
Execute
Run adversarial tests
Report
Document findings
Remediate
Fix & retest
Red-Team Attack Dimensions
DimensionDescriptionExample Attacks
SafetyCan the system produce harmful content?Generating instructions for dangerous activities, self-harm content
SecurityCan the system be exploited?Prompt injection, jailbreaking, data extraction, system prompt leakage
FairnessDoes it behave differently across groups?Demographic bias in outputs, stereotyping, differential quality
ReliabilityHow does it handle edge cases?Out-of-distribution inputs, adversarial perturbations, ambiguous queries
FactualityDoes it generate false information?Hallucinations, confabulation, citation fabrication, date errors
Red-Teaming Methodologies
Method
Approach
Best For
Manual
Human testers craft adversarial inputs
Creative, novel attack discovery
Automated
AI-assisted generation of adversarial prompts
Scale and coverage
Structured
Following predefined attack taxonomies
Systematic, reproducible assessment
Domain-Expert
Domain specialists test for domain-specific risks
High-stakes or regulated domains
EXAM TIP

Red-team reports must document: the attack taxonomy used, specific prompts/inputs that caused failures, severity classification of each failure, reproducibility information, and recommended mitigations. Results should be shared with development teams before public disclosure (responsible disclosure).

Key Points
Red-teaming: structured adversarial testing before deployment
Scope: safety, security, fairness, reliability, factuality
Manual + automated + structured + domain-expert approaches
Reports: attack taxonomy, failure severity, mitigations
Responsible disclosure practices
← Previous unitNext unit →