Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)

GPTKB entity

Statements (29)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:affiliation gptkb:OpenAI
gptkb:DeepMind
gptkbp:author gptkb:Jan_Leike
gptkb:Tom_B._Brown
gptkb:Shane_Legg
gptkb:Dario_Amodei
gptkb:Paul_F._Christiano
gptkb:Miljan_Martic
gptkbp:citation over 2,000
widely cited in AI alignment research
gptkbp:demonstrates training agents with human feedback
outperforming hand-crafted reward functions in some tasks
gptkbp:experimentDomain gptkb:Atari_games
simulated robotics (MuJoCo environments)
gptkbp:hasMethod learning reward functions from human preferences
https://www.w3.org/2000/01/rdf-schema#label Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
gptkbp:influenced Reinforcement Learning from Human Feedback (RLHF)
gptkbp:method policy optimization
pairwise preference comparisons
reward model training
gptkbp:publicationYear 2017
gptkbp:publishedIn gptkb:NeurIPS
gptkbp:topic deep reinforcement learning
reward modeling
human feedback
gptkbp:url https://arxiv.org/abs/1706.03741
gptkbp:bfsParent gptkb:RLHF
gptkbp:bfsLayer 7
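
The gptkbp:hasMethod and gptkbp:method statements above compress the paper's core recipe: fit a reward estimate r̂ to pairwise human preference comparisons, then optimize a policy against r̂. For reference, the paper's preference predictor is a Bradley-Terry-style model over summed per-step rewards of two trajectory segments σ¹ and σ², trained with a cross-entropy loss against the human judgment distribution μ:

```latex
% Probability that segment \sigma^1 is preferred over \sigma^2,
% modeled from the latent per-step reward estimate \hat{r}:
\hat{P}\left[\sigma^1 \succ \sigma^2\right] =
  \frac{\exp \sum_t \hat{r}(o^1_t, a^1_t)}
       {\exp \sum_t \hat{r}(o^1_t, a^1_t) + \exp \sum_t \hat{r}(o^2_t, a^2_t)}

% Cross-entropy loss over a dataset D of comparisons (\sigma^1, \sigma^2, \mu),
% where \mu distributes preference mass over the two segments:
\mathcal{L}(\hat{r}) = -\sum_{(\sigma^1, \sigma^2, \mu) \in D}
  \Bigl[ \mu(1) \log \hat{P}\left[\sigma^1 \succ \sigma^2\right]
       + \mu(2) \log \hat{P}\left[\sigma^2 \succ \sigma^1\right] \Bigr]
```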
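A minimal sketch of the corresponding reward-model training step, assuming a PyTorch setting; the names (RewardModel, preference_loss), network architecture, and tensor shapes are illustrative choices, not the authors' code:

```python
# Minimal sketch of reward-model training from pairwise preferences.
# Assumptions: PyTorch; segments are batched [batch, T, dim] tensors;
# mu is the preference mass assigned to the first segment (0, 0.5, or 1).
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps an (observation, action) pair to a scalar reward estimate r_hat."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward estimates, summed over the segment (time axis = 1).
        r = self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)
        return r.sum(dim=1)


def preference_loss(model: RewardModel, seg1, seg2, mu: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss on human preference labels.

    seg1 / seg2: (obs, act) tensor pairs of shape [batch, T, dim].
    mu: probability mass the human assigned to seg1 being preferred.
    """
    ret1 = model(*seg1)  # summed r_hat over segment 1
    ret2 = model(*seg2)  # summed r_hat over segment 2
    # Bradley-Terry: P[seg1 > seg2] is a softmax over the two summed returns.
    log_p = torch.log_softmax(torch.stack([ret1, ret2], dim=-1), dim=-1)
    return -(mu * log_p[..., 0] + (1 - mu) * log_p[..., 1]).mean()
```

In the paper, the learned r̂ then serves as the reward signal for a standard policy-optimization algorithm (A2C on the Atari games, TRPO on the MuJoCo tasks), with the reward model updated asynchronously as new human comparisons arrive.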