Contextual Bandits with Linear Payoff Functions
GPTKB entity
Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:model |
| gptkbp:application | Advertising; personalized recommendation |
| gptkbp:challenge | partial feedback |
| gptkbp:contextualInformation | used |
| gptkbp:example | gptkb:LinTS; gptkb:LinUCB |
| gptkbp:explorationExploitationTradeoff | present |
| gptkbp:field | gptkb:reinforcement_learning; online learning |
| gptkbp:goal | maximize cumulative reward |
| https://www.w3.org/2000/01/rdf-schema#label | Contextual Bandits with Linear Payoff Functions |
| gptkbp:input | context vector |
| gptkbp:notablePublication | gptkb:A_Contextual-Bandit_Approach_to_Personalized_News_Article_Recommendation_(Li_et_al.,_2010); gptkb:Thompson_Sampling_for_Contextual_Bandits_with_Linear_Payoff_Functions_(Agrawal_&_Goyal,_2013) |
| gptkbp:output | action selection |
| gptkbp:parameter | online regression |
| gptkbp:payoffFunction | linear |
| gptkbp:proposedBy | 2000s |
| gptkbp:regretBound | sublinear |
| gptkbp:relatedTo | gptkb:multi-armed_bandit_problem |
| gptkbp:rewardModel | linear function of context |
| gptkbp:bfsParent | gptkb:Lihong_Li |
| gptkbp:bfsLayer | 6 |
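
The statements above (context vector as input, action selection as output, a reward that is a linear function of the context, partial feedback, and sublinear regret) can be illustrated with a minimal sketch of a disjoint-model LinUCB learner, one of the listed examples. This is not a reference implementation: the class name `LinUCBArm`, the helper `run_linucb`, the synthetic environment, the shared context across arms, and the values of `alpha` and `noise` are all assumptions made for illustration.

```python
import numpy as np


class LinUCBArm:
    """Per-arm ridge-regression state for a disjoint LinUCB sketch (hypothetical helper)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha        # exploration weight (assumed value, not tuned)
        self.A = np.eye(dim)      # regularized Gram matrix: I + sum of x x^T
        self.b = np.zeros(dim)    # reward-weighted context sum: sum of r * x

    def ucb(self, x):
        """Upper confidence bound on the expected reward for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b    # ridge estimate of this arm's weight vector
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        """Online regression update from one observed (context, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x


def run_linucb(true_theta, n_rounds=2000, noise=0.1, alpha=1.0, seed=0):
    """Simulate LinUCB on synthetic linear rewards; return cumulative regret per round.

    true_theta has shape (n_arms, dim); row a is the hidden weight vector of arm a.
    The environment is synthetic and exists only to exercise the learner.
    """
    rng = np.random.default_rng(seed)
    n_arms, dim = true_theta.shape
    arms = [LinUCBArm(dim, alpha) for _ in range(n_arms)]
    regret = np.zeros(n_rounds)
    total = 0.0
    for t in range(n_rounds):
        x = rng.normal(size=dim)
        x /= np.linalg.norm(x)                 # unit-norm context vector
        chosen = int(np.argmax([arm.ucb(x) for arm in arms]))
        expected = true_theta @ x              # linear payoff of every arm
        reward = expected[chosen] + noise * rng.normal()
        arms[chosen].update(x, reward)         # partial feedback: only the chosen arm learns
        total += expected.max() - expected[chosen]
        regret[t] = total
    return regret
```

Calling `run_linucb(np.eye(3))` returns the cumulative regret curve for three arms in three dimensions. Under these assumptions the curve would be expected to flatten over time, which is what a sublinear regret bound describes; swapping the `ucb` score for a sample from a Gaussian posterior over `theta` would give a Thompson-sampling variant in the spirit of LinTS.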