Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
GPTKB entity
Statements (29)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:research_paper |
| gptkbp:affiliation | gptkb:OpenAI |
| gptkbp:author | gptkb:Paul_F._Christiano, gptkb:Jan_Leike, gptkb:Tom_B._Brown, Miljan Martic, gptkb:Shane_Legg, gptkb:Dario_Amodei |
| gptkbp:citation | over 2,000, widely cited in AI alignment research |
| gptkbp:demonstrates | training agents with human feedback, outperforming hand-crafted reward functions in some tasks |
| gptkbp:experimentDomain | gptkb:Atari_games, robotics, MuJoCo environments |
| gptkbp:hasMethod | learning reward functions from human preferences (see the code sketch after the table) |
| https://www.w3.org/2000/01/rdf-schema#label | Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017) |
| gptkbp:influenced | Reinforcement Learning from Human Feedback (RLHF) |
| gptkbp:method | policy optimization, pairwise preference comparisons, reward model training |
| gptkbp:publicationYear | 2017 |
| gptkbp:publishedIn | gptkb:NeurIPS |
| gptkbp:topic | deep reinforcement learning, reward modeling, human feedback |
| gptkbp:url | https://arxiv.org/abs/1706.03741 |
| gptkbp:bfsParent | gptkb:RLHF |
| gptkbp:bfsLayer | 7 |
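
The gptkbp:hasMethod and gptkbp:method statements above summarize the paper's core technique: a reward model is fit to pairwise human preference comparisons over short trajectory segments (a Bradley-Terry model), and a policy is then optimized against that learned reward. Below is a minimal sketch of the reward-model loss in PyTorch; it is not the authors' code, and the names (`RewardModel`, `preference_loss`), network architecture, and dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed names/shapes, not the authors' code) of fitting a
# reward model to pairwise human preferences, per Christiano et al. (2017).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps a (state, action) pair to a scalar reward estimate r_hat(s, a)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> (batch, T)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(reward_model, seg1, seg2, pref):
    """Bradley-Terry cross-entropy on pairwise segment comparisons.

    seg1/seg2 are (obs, act) tensors for the two trajectory segments shown
    to the human; pref is 1.0 if the human preferred seg1, 0.0 for seg2.
    """
    sum1 = reward_model(*seg1).sum(dim=-1)  # total predicted reward, segment 1
    sum2 = reward_model(*seg2).sum(dim=-1)  # total predicted reward, segment 2
    # P[seg1 preferred] = exp(sum1) / (exp(sum1) + exp(sum2))
    #                   = sigmoid(sum1 - sum2)
    return F.binary_cross_entropy_with_logits(sum1 - sum2, pref)


# Toy usage: random segments stand in for clips a human actually labeled.
obs_dim, act_dim, T, batch = 8, 2, 25, 16
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seg1 = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
seg2 = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
pref = torch.randint(0, 2, (batch,)).float()  # hypothetical human labels

loss = preference_loss(model, seg1, seg2, pref)
opt.zero_grad()
loss.backward()
opt.step()
```

In the paper, the policy is then trained with standard deep RL (TRPO for the MuJoCo tasks, advantage actor-critic for Atari) with the learned reward substituted for the environment reward, while fresh human comparisons are collected and the reward model is refit asynchronously.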