An examination of Large Language Model (LLM) "red teaming" defines it as a non-malicious, team-based activity that probes the limits of LLMs, and identifies 35 distinct techniques used to test them
Article URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0314658
Article title: Summon a demon and bind it: A grounded theory of LLM red teaming
Author countries: US, Denmark
Funding: VILLUM Foundation, grant No. 37176: ATTiKA: Adaptive Tools for Technical Knowledge Acquisition. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Journal: PLOS ONE
Publication date: 15-Jan-2025
Competing interests: The authors have declared that no competing interests exist.