Statements (28)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:AI_alignment_technique |
| gptkbp:category | Machine learning; AI alignment research; AI safety research |
| gptkbp:describedBy | AI safety via debate (OpenAI blog post, 2018) |
| gptkbp:firstPublished | 2018 |
| gptkbp:goal | Align advanced AI systems with human values |
| gptkbp:influencedBy | Adversarial training; Human-in-the-loop AI |
| gptkbp:limitation | Debate may not reveal all relevant information; Scalability to superhuman AI is uncertain; Human judges may be misled by persuasive but incorrect arguments |
| gptkbp:method | Two or more AIs debate a question; Human judge decides which AI gave the most truthful or useful answer |
| gptkbp:proposedBy | gptkb:OpenAI; gptkb:Paul_Christiano; gptkb:Dario_Amodei; gptkb:Geoffrey_Irving |
| gptkbp:relatedTo | AI safety; AI alignment; AI interpretability |
| gptkbp:studiedBy | gptkb:DeepMind; gptkb:OpenAI; gptkb:Alignment_Research_Center; gptkb:Anthropic |
| gptkbp:bfsParent | gptkb:Agent_Foundations |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | AI safety via debate |
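The gptkbp:method statements above give only a high-level description of the protocol. Below is a minimal sketch of that debate loop, assuming a generic `ask_model` text-generation call, two debaters, and a human judge reading the transcript; the function names, prompts, and turn count are illustrative assumptions, not part of the original OpenAI proposal.

```python
# Minimal sketch of the two-agent debate loop described by gptkbp:method.
# The model call, prompts, and number of turns are illustrative assumptions;
# substitute any text-generation backend for `ask_model`.

def ask_model(agent_name: str, question: str, transcript: list[str]) -> str:
    """Placeholder for a language model arguing on behalf of `agent_name`."""
    # Assumption: a real setup would query an LLM with the question and the
    # debate so far, returning the agent's next argument.
    return f"{agent_name}'s argument given {len(transcript)} prior statements."

def run_debate(question: str, agents=("A", "B"), turns: int = 3) -> list[str]:
    """Alternate arguments between two AI debaters for a fixed number of turns."""
    transcript: list[str] = []
    for _ in range(turns):
        for agent in agents:
            transcript.append(f"{agent}: {ask_model(agent, question, transcript)}")
    return transcript

def human_judge(question: str, transcript: list[str]) -> str:
    """The human judge reads the full debate and names the more truthful or useful agent."""
    print(f"Question: {question}")
    for line in transcript:
        print(line)
    return input("Which agent was more truthful or useful (A/B)? ").strip()

if __name__ == "__main__":
    q = "Is this plan safe to execute?"
    verdict = human_judge(q, run_debate(q))
    print(f"Judge's verdict: agent {verdict}")
```

In the 2018 proposal, the judge's verdict is intended as the training signal: debaters are optimized to win debates as judged by humans, on the assumption that truthful, useful arguments are easier to defend.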