Statements (25)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:mathematical_concept, gptkb:reinforcement_learning_problem |
| gptkbp:application | gptkb:Advertising, clinical trials, recommendation systems |
| gptkbp:describes | trade-off between exploration and exploitation |
| gptkbp:field | gptkb:machine_learning, decision theory, statistics |
| gptkbp:firstDescribed | 1933 |
| gptkbp:hasVariant | adversarial bandit, contextual bandit, stochastic bandit |
| gptkbp:namedAfter | gptkb:casino |
| gptkbp:notableContributor | gptkb:Richard_Bellman, gptkb:Herbert_Robbins |
| gptkbp:relatedConcept | gptkb:Markov_chain, regret minimization, exploration-exploitation dilemma |
| gptkbp:supportsAlgorithm | gptkb:UCB, Thompson sampling, epsilon-greedy |
| gptkbp:bfsParent | gptkb:Contextual_Bandits_with_Linear_Payoff_Functions |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | multi-armed bandit problem |