Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)

GPTKB entity

Statements (29)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:affiliation gptkb:OpenAI
gptkb:DeepMind
gptkbp:author gptkb:Jan_Leike
gptkb:Tom_B._Brown
gptkb:Shane_Legg
gptkb:Dario_Amodei
gptkb:Paul_F._Christiano
gptkb:Miljan_Martic
gptkbp:citation over 2,000
widely cited in AI alignment research
gptkbp:demonstrates training agents with human feedback
outperforming hand-crafted reward functions in some tasks
gptkbp:experimentDomain gptkb:Atari_games
simulated robotics (MuJoCo environments)
gptkbp:hasMethod learning reward functions from human preferences
https://www.w3.org/2000/01/rdf-schema#label Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
gptkbp:influenced Reinforcement Learning from Human Feedback (RLHF)
gptkbp:method policy optimization
pairwise preference comparisons
reward model training
gptkbp:publicationYear 2017
gptkbp:publishedIn gptkb:NeurIPS
gptkbp:topic deep reinforcement learning
reward modeling
human feedback
gptkbp:url https://arxiv.org/abs/1706.03741
gptkbp:bfsParent gptkb:RLHF
gptkbp:bfsLayer 7
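
The gptkbp:hasMethod and gptkbp:method statements above compress the paper's core recipe: fit a reward estimate r̂ to pairwise human preference comparisons, then optimize a policy against r̂. For reference, the paper's preference predictor is a Bradley-Terry-style model over summed per-step rewards of two trajectory segments σ¹ and σ², trained with a cross-entropy loss against the human judgment distribution μ:

```latex
% Probability that segment \sigma^1 is preferred over \sigma^2,
% modeled from the latent per-step reward estimate \hat{r}:
\hat{P}\left[\sigma^1 \succ \sigma^2\right] =
  \frac{\exp \sum_t \hat{r}(o^1_t, a^1_t)}
       {\exp \sum_t \hat{r}(o^1_t, a^1_t) + \exp \sum_t \hat{r}(o^2_t, a^2_t)}

% Cross-entropy loss over a dataset D of comparisons (\sigma^1, \sigma^2, \mu),
% where \mu distributes preference mass over the two segments:
\mathcal{L}(\hat{r}) = -\sum_{(\sigma^1, \sigma^2, \mu) \in D}
  \Bigl[ \mu(1) \log \hat{P}\left[\sigma^1 \succ \sigma^2\right]
       + \mu(2) \log \hat{P}\left[\sigma^2 \succ \sigma^1\right] \Bigr]
```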
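A minimal sketch of the corresponding reward-model training step, assuming a PyTorch setting; the names (RewardModel, preference_loss), network architecture, and tensor shapes are illustrative choices, not the authors' code:

```python
# Minimal sketch of reward-model training from pairwise preferences.
# Assumptions: PyTorch; segments are batched [batch, T, dim] tensors;
# mu is the preference mass assigned to the first segment (0, 0.5, or 1).
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps an (observation, action) pair to a scalar reward estimate r_hat."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward estimates, summed over the segment (time axis = 1).
        r = self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)
        return r.sum(dim=1)


def preference_loss(model: RewardModel, seg1, seg2, mu: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss on human preference labels.

    seg1 / seg2: (obs, act) tensor pairs of shape [batch, T, dim].
    mu: probability mass the human assigned to seg1 being preferred.
    """
    ret1 = model(*seg1)  # summed r_hat over segment 1
    ret2 = model(*seg2)  # summed r_hat over segment 2
    # Bradley-Terry: P[seg1 > seg2] is a softmax over the two summed returns.
    log_p = torch.log_softmax(torch.stack([ret1, ret2], dim=-1), dim=-1)
    return -(mu * log_p[..., 0] + (1 - mu) * log_p[..., 1]).mean()
```

In the paper, the learned r̂ then serves as the reward signal for a standard policy-optimization algorithm (A2C on the Atari games, TRPO on the MuJoCo tasks), with the reward model updated asynchronously as new human comparisons arrive.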