Debate (AI alignment)

GPTKB entity

Statements (27)
Predicate Object
gptkbp:instanceOf AI alignment technique
gptkbp:benefit Improved AI transparency
Reduced risk of deceptive AI behavior
gptkbp:category AI alignment research
AI safety proposal
gptkbp:challenge Reliance on human judgment
Scalability to complex questions
Vulnerability to manipulation
gptkbp:describedBy AI safety literature
gptkbp:goal Align AI behavior with human values
https://www.w3.org/2000/01/rdf-schema#label Debate (AI alignment)
gptkbp:inspiredBy Human debate
gptkbp:method Two or more AIs debate a question
Human judges select the most convincing debater
gptkbp:notablePublication AI safety via debate (Irving, Christiano, and Amodei, 2018)
gptkbp:proposedBy gptkb:Paul_Christiano
gptkb:Geoffrey_Irving
2018
gptkbp:relatedTo Machine learning
AI safety
Interpretability
Truthfulness in AI
gptkbp:studiedBy gptkb:DeepMind
gptkb:OpenAI
gptkb:Anthropic
gptkbp:bfsParent gptkb:Iterated_Amplification
gptkbp:bfsLayer 7
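
Illustrative sketch of the gptkbp:method statements above (two or more AIs debate a question, then a judge selects the most convincing debater). This is a minimal toy in Python, not the protocol from the cited publication: the names run_debate, alice, bob, and toy_judge are hypothetical, the debaters are hard-coded functions rather than AI systems, and the judge is a crude keyword counter standing in for a human.

```python
from typing import Callable, Dict, List, Tuple

# One transcript entry is (debater name, argument text).
Transcript = List[Tuple[str, str]]
# A debater sees the question and the transcript so far, and returns an argument.
Debater = Callable[[str, Transcript], str]
# A judge sees the question and the full transcript, and returns the winner's name.
Judge = Callable[[str, Transcript], str]


def run_debate(question: str, debaters: Dict[str, Debater],
               judge: Judge, rounds: int = 2) -> Tuple[str, Transcript]:
    """Let the debaters take turns for a fixed number of rounds, then ask the judge."""
    transcript: Transcript = []
    for _ in range(rounds):
        for name, debater in debaters.items():
            transcript.append((name, debater(question, transcript)))
    return judge(question, transcript), transcript


# Hypothetical stand-ins for the AI debaters (assumption: fixed canned arguments).
def alice(question: str, transcript: Transcript) -> str:
    return "Yes. My evidence: 17 % 2, 17 % 3 and 17 % 4 are all nonzero."


def bob(question: str, transcript: Transcript) -> str:
    return "No. 17 looks composite to me."


# Hypothetical stand-in for the human judge (assumption: counts cited evidence).
def toy_judge(question: str, transcript: Transcript) -> str:
    scores: Dict[str, int] = {}
    for name, argument in transcript:
        scores[name] = scores.get(name, 0) + argument.count("evidence")
    return max(scores, key=scores.get)


winner, transcript = run_debate(
    "Is 17 prime?", {"alice": alice, "bob": bob}, toy_judge
)
print(winner)
```

A judge this simple is easy to game (a debater could just repeat the word "evidence"), which is a small-scale picture of the "Vulnerability to manipulation" and "Reliance on human judgment" challenges listed above.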