AI safety via debate

GPTKB entity

Statements (28)
Predicate Object
gptkbp:instanceOf AI alignment technique
gptkbp:category Machine learning
AI alignment research
AI safety research
gptkbp:describedBy AI safety via debate (OpenAI blog post, 2018)
gptkbp:firstPublished 2018
gptkbp:goal Align advanced AI systems with human values
https://www.w3.org/2000/01/rdf-schema#label AI safety via debate
gptkbp:influencedBy Adversarial training
Human-in-the-loop AI
gptkbp:limitation Debate may not reveal all relevant information
Scalability to superhuman AI is uncertain
Human judges may be misled by persuasive but incorrect arguments
gptkbp:method Human judge decides which AI gave the most truthful or useful answer
Two or more AIs debate a question
gptkbp:proposedBy gptkb:OpenAI
gptkb:Paul_Christiano
gptkb:Dario_Amodei
gptkb:Geoffrey_Irving
gptkbp:relatedTo AI safety
AI alignment
AI interpretability
gptkbp:studiedBy gptkb:DeepMind
gptkb:OpenAI
gptkb:Alignment_Research_Center
gptkb:Anthropic
gptkbp:bfsParent gptkb:Agent_Foundations
gptkbp:bfsLayer 6