Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
GPTKB entity
Statements (29)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | research paper |
| gptkbp:affiliation | gptkb:OpenAI |
| gptkbp:author | gptkb:Paul_F._Christiano, gptkb:Jan_Leike, gptkb:Tom_B._Brown, gptkb:Miljan_Martic, gptkb:Shane_Legg, gptkb:Dario_Amodei |
| gptkbp:citation | over 2000 (widely cited in AI alignment research) |
| gptkbp:demonstrates | training agents with human feedback; outperforming hand-crafted reward functions in some tasks |
| gptkbp:experimentDomain | gptkb:Atari_games, simulated robotics (MuJoCo environments) |
| gptkbp:hasMethod | learning reward functions from human preferences (sketched below the table) |
| gptkbp:influenced | Reinforcement Learning from Human Feedback (RLHF) |
| gptkbp:method | policy optimization, pairwise preference comparisons, reward model training |
| gptkbp:publicationYear | 2017 |
| gptkbp:publishedIn | gptkb:NeurIPS |
| gptkbp:topic | deep reinforcement learning, reward modeling, human feedback |
| gptkbp:url | https://arxiv.org/abs/1706.03741 |
| gptkbp:bfsParent | gptkb:RLHF |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017) |
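The gptkbp:hasMethod and gptkbp:method statements above describe the paper's core loop: a reward model is fit to pairwise human preference comparisons over short trajectory segments, and the policy is then optimized against that learned reward. Below is a minimal sketch of the preference-fitting step, assuming PyTorch and illustrative network sizes and names; it is not the authors' code, only an instance of the Bradley-Terry-style preference loss the paper describes.

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Scores each (observation, action) step of a trajectory segment."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim)
        x = torch.cat([obs, act], dim=-1)
        return self.net(x).squeeze(-1)  # per-step reward estimates, shape (batch, T)


def preference_loss(model, seg1, seg2, prefs):
    """Cross-entropy between predicted and human preference labels.

    seg1, seg2: (obs, act) tensor pairs for two trajectory segments per batch item.
    prefs[i] = 1.0 if the human preferred segment 1, 0.0 for segment 2,
    and 0.5 when the rater expressed no preference (the paper allows ties).
    """
    r1 = model(*seg1).sum(dim=-1)   # total predicted reward of segment 1
    r2 = model(*seg2).sum(dim=-1)   # total predicted reward of segment 2
    p1 = torch.sigmoid(r1 - r2)     # Bradley-Terry probability that segment 1 is preferred
    return -(prefs * torch.log(p1 + 1e-8)
             + (1.0 - prefs) * torch.log(1.0 - p1 + 1e-8)).mean()
```

In the paper, the learned reward replaces the environment reward when training the policy with standard deep RL (advantage actor-critic for the Atari games, TRPO for the MuJoCo tasks), and new segment pairs are sent to human raters as training proceeds.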