Debate (AI alignment)

URI: https://gptkb.org/entity/Debate_(AI_alignment)

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:AI_alignment_technique
gptkbp:benefit	Improved AI transparency; Reduced risk of deceptive AI behavior
gptkbp:category	gptkb:AI_safety_proposal; AI alignment research
gptkbp:challenge	Reliance on human judgment; Scalability to complex questions; Vulnerability to manipulation
gptkbp:describedBy	AI safety literature
gptkbp:goal	Align AI behavior with human values
gptkbp:inspiredBy	Human debate
gptkbp:method	Two or more AIs debate a question; Human judges select the most convincing debater
gptkbp:notablePublication	AI Safety via Debate (Irving et al., 2018)
gptkbp:proposedBy	gptkb:Paul_Christiano; gptkb:Geoffrey_Irving; 2018
gptkbp:relatedTo	Machine learning; AI safety; Interpretability; Truthfulness in AI
gptkbp:studiedBy	gptkb:DeepMind; gptkb:OpenAI; gptkb:Anthropic
gptkbp:bfsParent	gptkb:Iterated_Amplification
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Debate (AI alignment)
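
Illustrative sketch of the gptkbp:method entry above (two or more AIs debate a question, then a judge selects the most convincing debater). This is a minimal toy under stated assumptions, not the procedure from the cited publication: the names `run_debate`, `make_debater`, and `toy_judge` are hypothetical, the debaters are fixed-answer stubs rather than AI models, and `toy_judge` is a scripted stand-in for the human judge.

```python
from typing import Callable, Dict, List, Tuple

# One transcript entry is (debater name, statement).
Transcript = List[Tuple[str, str]]
# A debater maps (question, transcript so far) to its next statement.
Debater = Callable[[str, Transcript], str]
# A judge maps (question, full transcript) to the name of the winner.
Judge = Callable[[str, Transcript], str]


def run_debate(question: str, debaters: Dict[str, Debater],
               judge: Judge, rounds: int = 2) -> Tuple[str, Transcript]:
    """Let debaters take turns for a fixed number of rounds, then ask the judge."""
    transcript: Transcript = []
    for _ in range(rounds):
        for name, debater in debaters.items():
            # Each debater sees everything said so far before speaking.
            transcript.append((name, debater(question, transcript)))
    return judge(question, transcript), transcript


def make_debater(claim: str) -> Debater:
    """Hypothetical stub: always argues for one fixed answer."""
    def debater(question: str, transcript: Transcript) -> str:
        return f"I claim the answer is {claim}."
    return debater


def toy_judge(question: str, transcript: Transcript) -> str:
    """Scripted stand-in for a human judge: favors statements claiming '4'.

    In the technique as described, a human would read the transcript instead.
    """
    scores: Dict[str, int] = {}
    for name, statement in transcript:
        scores[name] = scores.get(name, 0) + int(statement.endswith("is 4."))
    return max(scores, key=lambda name: scores[name])


winner, transcript = run_debate(
    "What is 2 + 2?",
    {"alice": make_debater("4"), "bob": make_debater("5")},
    toy_judge,
)
```

The judge is passed in as a plain callable so that the scripted stand-in can be swapped for a function that shows the transcript to a person and returns their choice. The listed challenge "Reliance on human judgment" corresponds to that single call.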