Statements (28)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:AI_alignment_technique |
| gptkbp:category | Machine learning; AI alignment research; AI safety research |
| gptkbp:describedBy | AI safety via debate (OpenAI blog post, 2018) |
| gptkbp:firstPublished | 2018 |
| gptkbp:goal | Align advanced AI systems with human values |
| gptkbp:influencedBy | Adversarial training; Human-in-the-loop AI |
| gptkbp:limitation | Debate may not reveal all relevant information; Scalability to superhuman AI is uncertain; Human judges may be misled by persuasive but incorrect arguments |
| gptkbp:method | Two or more AIs debate a question; Human judge decides which AI gave the most truthful or useful answer |
| gptkbp:proposedBy | gptkb:OpenAI; gptkb:Paul_Christiano; gptkb:Dario_Amodei; gptkb:Geoffrey_Irving |
| gptkbp:relatedTo | AI safety; AI alignment; AI interpretability |
| gptkbp:studiedBy | gptkb:DeepMind; gptkb:OpenAI; gptkb:Alignment_Research_Center; gptkb:Anthropic |
| gptkbp:bfsParent | gptkb:Agent_Foundations |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | AI safety via debate |
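The gptkbp:method statements above give only a high-level description of the protocol. Below is a minimal sketch of that debate loop, assuming a generic `ask_model` text-generation call, two debaters, and a human judge reading the transcript; the function names, prompts, and turn count are illustrative assumptions, not part of the original OpenAI proposal.

```python
# Minimal sketch of the two-agent debate loop described by gptkbp:method.
# The model call, prompts, and number of turns are illustrative assumptions;
# substitute any text-generation backend for `ask_model`.

def ask_model(agent_name: str, question: str, transcript: list[str]) -> str:
    """Placeholder for a language model arguing on behalf of `agent_name`."""
    # Assumption: a real setup would query an LLM with the question and the
    # debate so far, returning the agent's next argument.
    return f"{agent_name}'s argument given {len(transcript)} prior statements."

def run_debate(question: str, agents=("A", "B"), turns: int = 3) -> list[str]:
    """Alternate arguments between two AI debaters for a fixed number of turns."""
    transcript: list[str] = []
    for _ in range(turns):
        for agent in agents:
            transcript.append(f"{agent}: {ask_model(agent, question, transcript)}")
    return transcript

def human_judge(question: str, transcript: list[str]) -> str:
    """The human judge reads the full debate and names the more truthful or useful agent."""
    print(f"Question: {question}")
    for line in transcript:
        print(line)
    return input("Which agent was more truthful or useful (A/B)? ").strip()

if __name__ == "__main__":
    q = "Is this plan safe to execute?"
    verdict = human_judge(q, run_debate(q))
    print(f"Judge's verdict: agent {verdict}")
```

In the 2018 proposal, the judge's verdict is intended as the training signal: debaters are optimized to win debates as judged by humans, on the assumption that truthful, useful arguments are easier to defend.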