Red Teaming: Definition & Meaning — AI Wiki

The practice of deliberately trying to make an AI model fail, misbehave, or produce harmful outputs. Red teams probe for vulnerabilities: jailbreaks, bias, misinformation generation, privacy leaks. Named after military wargaming where a "red team" plays the adversary.

Why it matters

You can't fix what you don't know about. Red teaming is how providers discover that their model will explain how to pick locks if you ask it to "write a story about a locksmith." It's essential safety work that happens before every major model release.

Red Teaming

Why it matters

Related Concepts