Statements (25)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:mathematical_concept, reinforcement learning problem |
gptkbp:application | Advertising, clinical trials, recommendation systems |
gptkbp:describes | trade-off between exploration and exploitation |
gptkbp:field | gptkb:machine_learning, decision theory, statistics |
gptkbp:firstDescribed | 1933 |
gptkbp:hasVariant | adversarial bandit, contextual bandit, stochastic bandit |
https://www.w3.org/2000/01/rdf-schema#label | multi-armed bandit problem |
gptkbp:namedAfter | gptkb:casino |
gptkbp:notableContributor | gptkb:Richard_Bellman, gptkb:Herbert_Robbins |
gptkbp:relatedConcept | Markov chain, regret minimization, exploration-exploitation dilemma |
gptkbp:supportsAlgorithm | gptkb:UCB, Thompson sampling, epsilon-greedy |
gptkbp:bfsParent | gptkb:Contextual_Bandits_with_Linear_Payoff_Functions |
gptkbp:bfsLayer | 7 |
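
As a rough illustration of the exploration-exploitation trade-off listed under gptkbp:describes, the following is a minimal sketch of epsilon-greedy, one of the algorithms named under gptkbp:supportsAlgorithm, run on a simulated Bernoulli bandit. The function name `run_epsilon_greedy`, the arm means, the step count, and the value of epsilon are all illustrative assumptions and do not come from the statements above.

```python
import random

def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (hypothetical helper).

    Returns (counts, estimates): pulls per arm and estimated mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bernoulli reward drawn from the (assumed) true mean of the arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates

# Three arms with assumed success probabilities 0.2, 0.5 and 0.8.
counts, estimates = run_epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 3) for e in estimates])
```

With a small epsilon, most pulls go to whichever arm currently looks best, while the occasional random pull keeps the estimates of the other arms from going stale. UCB and Thompson sampling address the same trade-off with different selection rules.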