Statements (27)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:AI_alignment_technique |
| gptkbp:benefit | Improved AI transparency<br>Reduced risk of deceptive AI behavior |
| gptkbp:category | gptkb:AI_safety_proposal<br>AI alignment research |
| gptkbp:challenge | Reliance on human judgment<br>Scalability to complex questions<br>Vulnerability to manipulation |
| gptkbp:describedBy | AI safety literature |
| gptkbp:goal | Align AI behavior with human values |
| gptkbp:inspiredBy | Human debate |
| gptkbp:method | Two or more AIs debate a question<br>Human judges select the most convincing debater (see the sketch after this table) |
| gptkbp:notablePublication | AI Safety via Debate (Irving et al., 2018) |
| gptkbp:proposedBy | gptkb:Paul_Christiano<br>gptkb:Geoffrey_Irving<br>2018 |
| gptkbp:relatedTo | Machine learning<br>AI safety<br>Interpretability<br>Truthfulness in AI |
| gptkbp:studiedBy | gptkb:DeepMind<br>gptkb:OpenAI<br>gptkb:Anthropic |
| gptkbp:bfsParent | gptkb:Iterated_Amplification |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Debate (AI alignment) |
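
The `gptkbp:method` row describes debate as a simple protocol: debaters exchange arguments on a question, then a human judge picks the more convincing side. Below is a minimal runnable sketch of that loop. All names here (`Debater`, `Judge`, `run_debate`, the stub agents) are hypothetical illustrations, not APIs from Irving et al. (2018) or any real library.

```python
"""Minimal sketch of the debate protocol from the 'method' row above.

Hypothetical names throughout; this is an illustration under assumptions,
not the implementation from the AI Safety via Debate paper.
"""
from typing import Callable, List, Tuple

# A debater maps (question, transcript so far) -> its next argument.
Debater = Callable[[str, List[str]], str]
# A judge maps (question, full transcript) -> index of the winning debater.
Judge = Callable[[str, List[str]], int]


def run_debate(question: str,
               debaters: Tuple[Debater, Debater],
               judge: Judge,
               rounds: int = 3) -> int:
    """Alternate arguments between two debaters, then let the judge decide."""
    transcript: List[str] = []
    for _ in range(rounds):
        for debater in debaters:
            transcript.append(debater(question, transcript))
    # The (human) judge selects the most convincing debater: 0 or 1.
    return judge(question, transcript)


if __name__ == "__main__":
    # Stub debaters and judge so the sketch runs end to end; a real setup
    # would use trained models as debaters and a human as the judge.
    pro: Debater = lambda q, t: f"Pro argument #{len(t) // 2 + 1}"
    con: Debater = lambda q, t: f"Con argument #{len(t) // 2 + 1}"
    pick_first: Judge = lambda q, t: 0  # stand-in for human judgment
    winner = run_debate("Is the claim true?", (pro, con), pick_first)
    print(f"Judge selected debater {winner}")
```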