gptkbp:instanceOf
|
machine learning technique
|
gptkbp:alternativeTo
|
supervised learning
pure reinforcement learning
|
gptkbp:application
|
gptkb:dialogue_systems
content moderation
instruction following
safety alignment
|
gptkbp:appliesTo
|
gptkb:GPT-3
gptkb:GPT-4
gptkb:Bard
gptkb:Claude
|
gptkbp:category
|
gptkb:artificial_intelligence
gptkb:machine_learning
gptkb:reinforcement_learning
|
gptkbp:challenge
|
reward hacking
feedback quality
scalability of human feedback
|
gptkbp:component
|
gptkb:reinforcement_learning
policy optimization
human feedback
reward model
|
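
The component list above names a reward model trained from human feedback. As a loose illustration only (not part of this record), a reward model is typically a scalar-output network fit to pairwise preferences with a Bradley-Terry style loss; the class and parameter names below (RewardModel, hidden_dim) are hypothetical stand-ins, and the encoder is a toy placeholder for a pretrained language model.

```python
# Hedged sketch of the "reward model" component: a scalar head trained on
# pairwise human preferences. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Stand-in encoder: in practice this would be a pretrained language model.
        self.encoder = nn.Linear(hidden_dim, hidden_dim)
        self.scalar_head = nn.Linear(hidden_dim, 1)  # features -> scalar reward

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scalar_head(torch.tanh(self.encoder(features))).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random feature vectors standing in for "chosen" vs. "rejected" responses.
rm = RewardModel()
chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(rm(chosen), rm(rejected))
loss.backward()
```
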
gptkbp:firstPublished
|
2017
|
gptkbp:fullName
|
gptkb:Reinforcement_Learning_from_Human_Feedback
|
https://www.w3.org/2000/01/rdf-schema#label
|
RLHF
|
gptkbp:notablePublication
|
gptkb:Deep_Reinforcement_Learning_from_Human_Preferences_(Christiano_et_al.,_2017)
|
gptkbp:purpose
|
align AI behavior with human preferences
|
gptkbp:relatedTo
|
AI alignment
preference modeling
supervised fine-tuning
|
gptkbp:step
|
collect human feedback
train reward model
optimize policy with RL
|
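
The three steps above form the usual RLHF loop: collect human feedback, train a reward model on it, then optimize the policy with RL against that reward. The sketch below is a minimal illustration of the RL-step objective; the KL-style penalty against a frozen reference policy and the coefficient `beta` are standard practice but an assumption here, not stated in this record.

```python
# Hedged sketch of the RLHF reward used in the policy-optimization step.
# `beta` and the reference-policy term are assumed, not taken from this record.

def shaped_reward(reward_model_score: float,
                  logprob_policy: float,
                  logprob_reference: float,
                  beta: float = 0.1) -> float:
    """Reward maximized during the RL step: the reward-model score minus a
    KL-style penalty keeping the tuned policy close to the reference model."""
    return reward_model_score - beta * (logprob_policy - logprob_reference)

# Step 1: collect human feedback  -> pairs of (preferred, rejected) responses.
# Step 2: train reward model      -> fit a scalar reward to those preferences.
# Step 3: optimize policy with RL -> maximize shaped_reward with PPO or similar.
print(shaped_reward(reward_model_score=1.2, logprob_policy=-3.0, logprob_reference=-3.5))
```
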
gptkbp:usedBy
|
gptkb:OpenAI
gptkb:Anthropic
gptkb:Google_DeepMind
|
gptkbp:usedIn
|
large language models
|
gptkbp:bfsParent
|
gptkb:Constitutional_AI
gptkb:Reinforcement_Learning_from_Human_Feedback
|
gptkbp:bfsLayer
|
6
|