Statements (28)
Predicate | Object |
---|---|
gptkbp:instanceOf | AI alignment technique |
gptkbp:category | Machine learning |
gptkbp:category | AI alignment research |
gptkbp:category | AI safety research |
gptkbp:describedBy | AI safety via debate (OpenAI blog post, 2018) |
gptkbp:firstPublished | 2018 |
gptkbp:goal | Align advanced AI systems with human values |
https://www.w3.org/2000/01/rdf-schema#label | AI safety via debate |
gptkbp:influencedBy | Adversarial training |
gptkbp:influencedBy | Human-in-the-loop AI |
gptkbp:limitation | Debate may not reveal all relevant information |
gptkbp:limitation | Scalability to superhuman AI is uncertain |
gptkbp:limitation | Human judges may be misled by persuasive but incorrect arguments |
gptkbp:method | Two or more AIs debate a question (see the sketch below) |
gptkbp:method | Human judge decides which AI gave the most truthful or useful answer |
gptkbp:proposedBy | gptkb:OpenAI |
gptkbp:proposedBy | gptkb:Paul_Christiano |
gptkbp:proposedBy | gptkb:Dario_Amodei |
gptkbp:proposedBy | gptkb:Geoffrey_Irving |
gptkbp:relatedTo | AI safety |
gptkbp:relatedTo | AI alignment |
gptkbp:relatedTo | AI interpretability |
gptkbp:studiedBy | gptkb:DeepMind |
gptkbp:studiedBy | gptkb:OpenAI |
gptkbp:studiedBy | gptkb:Alignment_Research_Center |
gptkbp:studiedBy | gptkb:Anthropic |
gptkbp:bfsParent | gptkb:Agent_Foundations |
gptkbp:bfsLayer | 6 |
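The gptkbp:method statements above describe the debate protocol at a high level. Below is a minimal Python sketch of that loop, assuming stand-in callables for the debaters and the judge; the names (Debater, run_debate, judge) are illustrative, not from the 2018 OpenAI post, and in practice the callables would wrap model queries and a human judgment.

```python
# Minimal sketch of the debate protocol: two AIs argue a question over
# several rounds, then a (human or proxy) judge picks the answer given
# by the debater they found most truthful or useful.
# All names here are hypothetical, not taken from the OpenAI blog post.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Debater:
    name: str
    answer: str  # the position this debater defends
    # (question, transcript so far) -> next argument
    argue: Callable[[str, List[str]], str]


def run_debate(
    question: str,
    debaters: Tuple[Debater, Debater],
    judge: Callable[[str, List[str]], int],
    rounds: int = 3,
) -> str:
    """Alternate arguments for `rounds` rounds, then return the answer
    of whichever debater the judge finds more convincing."""
    transcript: List[str] = []
    for _ in range(rounds):
        for d in debaters:
            transcript.append(f"{d.name}: {d.argue(question, transcript)}")
    winner = judge(question, transcript)  # index of the winning debater
    return debaters[winner].answer
```

The alignment bet this structure encodes, per the goal and method statements, is that judging a debate is easier for a human than answering the question directly; the limitation statements (persuasive but incorrect arguments, uncertain scaling) are exactly the ways this bet can fail.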