DDPO

GPTKB entity

Statements (19)
Predicate Object
gptkbp:instanceOf gptkb:algorithm
gptkbp:advantage improves sample efficiency
reduces reward model overfitting
gptkbp:application large language models
gptkbp:contrastsWith gptkb:DPO
PPO
gptkbp:field gptkb:machine_learning
gptkb:reinforcement_learning
gptkbp:fullName Direct Preference Optimization with Policy Optimization
https://www.w3.org/2000/01/rdf-schema#label DDPO
gptkbp:introducedIn 2023
gptkbp:method directly optimizes policy using preference data
gptkbp:notablePublication Direct Preference Optimization: Your Language Model is Secretly a Reward Model
gptkbp:proposedBy gptkb:Microsoft_Research
gptkbp:purpose aligning language models with human preferences
gptkbp:relatedTo gptkb:RLHF
preference optimization
gptkbp:bfsParent gptkb:Defense_Dissemination_Program_Office
gptkbp:bfsLayer 8