Debate (AI alignment)

GPTKB entity

Statements (27)
Predicate Object
gptkbp:instanceOf gptkb:AI_alignment_technique
gptkbp:benefit Improved AI transparency
Reduced risk of deceptive AI behavior
gptkbp:category gptkb:AI_safety_proposal
AI alignment research
gptkbp:challenge Reliance on human judgment
Scalability to complex questions
Vulnerability to manipulation
gptkbp:describedBy AI safety literature
gptkbp:goal Align AI behavior with human values
gptkbp:inspiredBy Human debate
gptkbp:method Two or more AIs debate a question
Human judges select the most convincing debater
gptkbp:notablePublication AI Safety via Debate (Irving et al., 2018)
gptkbp:proposedBy gptkb:Paul_Christiano
gptkb:Geoffrey_Irving
2018
gptkbp:relatedTo Machine learning
AI safety
Interpretability
Truthfulness in AI
gptkbp:studiedBy gptkb:DeepMind
gptkb:OpenAI
gptkb:Anthropic
gptkbp:bfsParent gptkb:Iterated_Amplification
gptkbp:bfsLayer 7
https://www.w3.org/2000/01/rdf-schema#label Debate (AI alignment)
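
Illustration of the gptkbp:method statements above (two or more AIs debate a question; a human judge selects the most convincing debater). This is a minimal sketch, not the procedure from Irving et al. (2018). Every name in it (run_debate, make_debater, toy_judge, the rounds parameter) is hypothetical, and the debaters and judge are toy functions standing in for AI systems and a human.

```python
from typing import Callable, List, Tuple

# (debater index, statement) pairs in the order they were spoken
Transcript = List[Tuple[int, str]]
# Assumed interfaces: a debater maps (question, transcript so far) to a statement,
# a judge maps (question, full transcript) to the index of the winning debater.
Debater = Callable[[str, Transcript], str]
Judge = Callable[[str, Transcript], int]


def run_debate(question: str, debaters: List[Debater], judge: Judge,
               rounds: int = 2) -> Tuple[int, Transcript]:
    """Let debaters take turns for a fixed number of rounds, then ask the judge."""
    if len(debaters) < 2:
        raise ValueError("a debate needs two or more debaters")
    transcript: Transcript = []
    for _ in range(rounds):
        for index, debater in enumerate(debaters):
            # Each debater sees a copy of everything said so far.
            statement = debater(question, list(transcript))
            transcript.append((index, statement))
    winner = judge(question, transcript)
    if not 0 <= winner < len(debaters):
        raise ValueError("judge returned an unknown debater index")
    return winner, transcript


def make_debater(claim: str) -> Debater:
    """Toy debater that always asserts the same claim (stand-in for an AI)."""
    def debater(question: str, transcript: Transcript) -> str:
        return f"I claim the answer is {claim}."
    return debater


def toy_judge(question: str, transcript: Transcript) -> int:
    """Stand-in for a human judge: picks the first debater asserting '4'."""
    for index, statement in transcript:
        if "is 4." in statement:
            return index
    return 0


winner, transcript = run_debate(
    "What is 2 + 2?", [make_debater("5"), make_debater("4")], toy_judge, rounds=2
)
print(winner, len(transcript))
```

In this sketch the judge only sees the transcript, which reflects the "Reliance on human judgment" and "Vulnerability to manipulation" challenges listed above: the outcome is only as good as the judge's ability to tell a convincing argument from a correct one.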