Statements (23)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:algorithm, contextual bandit algorithm |
gptkbp:assumes | linear reward model |
gptkbp:basedOn | upper confidence bound principle |
gptkbp:category | gptkb:machine_learning, gptkb:reinforcement_learning |
https://www.w3.org/2000/01/rdf-schema#label | LinUCB |
gptkbp:input | context vector |
gptkbp:introduced | gptkb:John_Langford, gptkb:Robert_E._Schapire, gptkb:Lihong_Li, Wei Chu |
gptkbp:introducedIn | 2010 |
gptkbp:output | action selection |
gptkbp:publishedIn | Proceedings of the 19th International Conference on World Wide Web |
gptkbp:relatedTo | Epsilon-Greedy, Thompson Sampling, UCB1 |
gptkbp:usedFor | gptkb:multi-armed_bandit_problem, online learning, recommendation systems |
gptkbp:bfsParent | gptkb:Contextual_Bandits_with_Linear_Payoff_Functions |
gptkbp:bfsLayer | 7 |
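
The statements above (a linear reward model, the upper confidence bound principle, a context vector as input, an action selection as output) can be illustrated with a minimal sketch. It assumes the per-arm ("disjoint") formulation with an identity ridge prior; the class name `LinUCB`, its method names, and the default `alpha=1.0` are hypothetical choices made for illustration, and maintaining the inverse matrix with a Sherman-Morrison update is an implementation convenience rather than something stated in this entry.

```python
import math


class LinUCB:
    """Illustrative disjoint LinUCB: one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed default, tune per task)
        # Per arm: inverse of A = I + sum(x x^T), and b = sum(reward * x).
        self.A_inv = [
            [[float(i == j) for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def score(self, arm, x):
        """Upper confidence bound: estimated reward plus exploration bonus."""
        A_inv_x = self._matvec(self.A_inv[arm], x)
        theta = self._matvec(self.A_inv[arm], self.b[arm])  # ridge estimate
        mean = self._dot(theta, x)                          # linear reward model
        width = math.sqrt(max(self._dot(x, A_inv_x), 0.0))  # confidence width
        return mean + self.alpha * width

    def select(self, x):
        """Return the arm with the highest score (ties go to the lowest index)."""
        scores = [self.score(arm, x) for arm in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Fold one (context, reward) observation into the chosen arm's model."""
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)
        denom = 1.0 + self._dot(x, u)
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        for i in range(len(u)):
            for j in range(len(u)):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
context = [1.0, 0.0]
arm = bandit.select(context)      # choose an action for this context
bandit.update(arm, context, 0.0)  # observe a reward and update that arm
```

Each round calls `select` with the current context vector and then `update` with the observed reward. Arms that have been tried less often in a given direction of context space keep a wider confidence term, so they continue to be explored.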