Statements (19)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:advantage | improves sample efficiency, reduces reward model overfitting |
| gptkbp:application | large language models |
| gptkbp:contrastsWith | gptkb:DPO, PPO |
| gptkbp:field | gptkb:machine_learning, gptkb:reinforcement_learning |
| gptkbp:fullName | Direct Preference Optimization with Policy Optimization |
| gptkbp:introducedIn | 2023 |
| gptkbp:method | directly optimizes policy using preference data |
| gptkbp:notablePublication | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
| gptkbp:proposedBy | gptkb:Microsoft_Research |
| gptkbp:purpose | aligning language models with human preferences |
| gptkbp:relatedTo | gptkb:RLHF, preference optimization |
| gptkbp:bfsParent | gptkb:Defense_Dissemination_Program_Office |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | DDPO |
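
The gptkbp:method statement ("directly optimizes policy using preference data") refers to the DPO-style objective from the cited publication, where the policy is trained on preference pairs without an explicit reward model. The sketch below is a minimal, illustrative implementation of that preference loss, not of DDPO's exact formulation; the function name `dpo_loss`, the toy log-probabilities, and the `beta=0.1` value are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style preference loss: push the policy to widen its margin on the
    preferred response relative to a frozen reference model."""
    # Implicit rewards are beta-scaled log-ratios of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Binary preference objective: negative log-sigmoid of the reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with per-example summed log-probabilities (batch of 4).
policy_chosen = torch.tensor([-12.3, -9.8, -15.1, -11.0])
policy_rejected = torch.tensor([-13.0, -10.5, -14.9, -12.2])
ref_chosen = torch.tensor([-12.5, -10.0, -15.0, -11.3])
ref_rejected = torch.tensor([-12.9, -10.4, -15.2, -12.0])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```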