gptkbp:instanceOf
|
machine learning technique
|
gptkbp:alternativeTo
|
supervised learning
pure reinforcement learning
|
gptkbp:application
|
gptkb:dialogue_systems
content moderation
instruction following
safety alignment
|
gptkbp:appliesTo
|
gptkb:GPT-3
gptkb:GPT-4
gptkb:Bard
gptkb:Claude
|
gptkbp:category
|
gptkb:artificial_intelligence
gptkb:machine_learning
gptkb:reinforcement_learning
|
gptkbp:challenge
|
reward hacking
feedback quality
scalability of human feedback
|
gptkbp:component
|
gptkb:reinforcement_learning
policy optimization
human feedback
reward model
|
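
The component list above names a reward model trained from human feedback. As a loose illustration only (not part of this record), a reward model is typically a scalar-output network fit to pairwise preferences with a Bradley-Terry style loss; the class and parameter names below (RewardModel, hidden_dim) are hypothetical stand-ins, and the encoder is a toy placeholder for a pretrained language model.

```python
# Hedged sketch of the "reward model" component: a scalar head trained on
# pairwise human preferences. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Stand-in encoder: in practice this would be a pretrained language model.
        self.encoder = nn.Linear(hidden_dim, hidden_dim)
        self.scalar_head = nn.Linear(hidden_dim, 1)  # features -> scalar reward

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scalar_head(torch.tanh(self.encoder(features))).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random feature vectors standing in for "chosen" vs. "rejected" responses.
rm = RewardModel()
chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(rm(chosen), rm(rejected))
loss.backward()
```
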
gptkbp:firstPublished
|
2017
|
gptkbp:fullName
|
gptkb:Reinforcement_Learning_from_Human_Feedback
|
https://www.w3.org/2000/01/rdf-schema#label
|
RLHF
|
gptkbp:notablePublication
|
gptkb:Deep_Reinforcement_Learning_from_Human_Preferences_(Christiano_et_al.,_2017)
|
gptkbp:purpose
|
align AI behavior with human preferences
|
gptkbp:relatedTo
|
AI alignment
preference modeling
supervised fine-tuning
|
gptkbp:step
|
collect human feedback
train reward model
optimize policy with RL
|
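
The three steps above form the usual RLHF loop: collect human feedback, train a reward model on it, then optimize the policy with RL against that reward. The sketch below is a minimal illustration of the RL-step objective; the KL-style penalty against a frozen reference policy and the coefficient `beta` are standard practice but an assumption here, not stated in this record.

```python
# Hedged sketch of the RLHF reward used in the policy-optimization step.
# `beta` and the reference-policy term are assumed, not taken from this record.

def shaped_reward(reward_model_score: float,
                  logprob_policy: float,
                  logprob_reference: float,
                  beta: float = 0.1) -> float:
    """Reward maximized during the RL step: the reward-model score minus a
    KL-style penalty keeping the tuned policy close to the reference model."""
    return reward_model_score - beta * (logprob_policy - logprob_reference)

# Step 1: collect human feedback  -> pairs of (preferred, rejected) responses.
# Step 2: train reward model      -> fit a scalar reward to those preferences.
# Step 3: optimize policy with RL -> maximize shaped_reward with PPO or similar.
print(shaped_reward(reward_model_score=1.2, logprob_policy=-3.0, logprob_reference=-3.5))
```
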
gptkbp:usedBy
|
gptkb:OpenAI
gptkb:Anthropic
gptkb:Google_DeepMind
|
gptkbp:usedIn
|
large language models
|
gptkbp:bfsParent
|
gptkb:Constitutional_AI
gptkb:Reinforcement_Learning_from_Human_Feedback
|
gptkbp:bfsLayer
|
6
|