Unit 2 of 4

5.2 — Red-Teaming Fundamentals

AI red-teaming is structured adversarial testing designed to find failures, vulnerabilities, and harmful behaviors in AI systems before deployment. It goes beyond standard testing by actively trying to make the system fail.

Red-Team Process Flow

Scope

Define targets & boundaries

→

Plan

Select attack taxonomy

→

Execute

Run adversarial tests

→

Report

Document findings

→

Remediate

Fix & retest

Red-Team Attack Dimensions

Dimension	Description	Example Attacks
Safety	Can the system produce harmful content?	Generating instructions for dangerous activities, self-harm content
Security	Can the system be exploited?	Prompt injection, jailbreaking, data extraction, system prompt leakage
Fairness	Does it behave differently across groups?	Demographic bias in outputs, stereotyping, differential quality
Reliability	How does it handle edge cases?	Out-of-distribution inputs, adversarial perturbations, ambiguous queries
Factuality	Does it generate false information?	Hallucinations, confabulation, citation fabrication, date errors

Red-Teaming Methodologies

Method

Approach

Best For

Manual

Human testers craft adversarial inputs

Creative, novel attack discovery

Automated

AI-assisted generation of adversarial prompts

Scale and coverage

Structured

Following predefined attack taxonomies

Systematic, reproducible assessment

Domain-Expert

Domain specialists test for domain-specific risks

High-stakes or regulated domains

★EXAM TIP

Red-team reports must document: the attack taxonomy used, specific prompts/inputs that caused failures, severity classification of each failure, reproducibility information, and recommended mitigations. Results should be shared with development teams before public disclosure (responsible disclosure).

Key Points

Red-teaming: structured adversarial testing before deployment

Scope: safety, security, fairness, reliability, factuality

Manual + automated + structured + domain-expert approaches

Reports: attack taxonomy, failure severity, mitigations

Responsible disclosure practices

← Previous unit Next unit →