Constitutional AI: Harmlessness from AI Feedback

GPTKB entity

Predicate	Object
gptkbp:instanceOf	research paper (arXiv preprint)
gptkbp:author	gptkb:Scott_Johnston, gptkb:Andy_Jones, gptkb:Yuntao_Bai, gptkb:Dario_Amodei, gptkb:Jared_Kaplan, gptkb:Nelson_Elhage, gptkb:Tom_Henighan, gptkb:Liane_Lovitt, gptkb:Nicholas_Schiefer, gptkb:Saurav_Kadavath, gptkb:Sheer_El-Showk, gptkb:Tom_Conerly
gptkbp:citation	100+
gptkbp:describes	training language models to be harmless using AI feedback and a set of principles (a constitution)
gptkbp:publicationDate	2022
gptkbp:publishedBy	gptkb:Anthropic
gptkbp:topic	gptkb:Constitutional_AI, harmlessness, AI alignment, AI feedback
gptkbp:url	https://arxiv.org/abs/2212.08073
gptkbp:bfsParent	gptkb:Constitutional_AI
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Constitutional AI: Harmlessness from AI Feedback