multi-armed bandit problem

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:mathematical_concept, gptkb:reinforcement_learning_problem
gptkbp:application	gptkb:Advertising, clinical trials, recommendation systems
gptkbp:describes	trade-off between exploration and exploitation
gptkbp:field	gptkb:machine_learning, decision theory, statistics
gptkbp:firstDescribed	1933
gptkbp:hasVariant	adversarial bandit, contextual bandit, stochastic bandit
gptkbp:namedAfter	one-armed bandit (slot machine)
gptkbp:notableContributor	gptkb:Richard_Bellman, gptkb:Herbert_Robbins
gptkbp:relatedConcept	gptkb:Markov_chain, regret minimization, exploration-exploitation dilemma
gptkbp:supportsAlgorithm	gptkb:UCB, Thompson sampling, epsilon-greedy
gptkbp:bfsParent	gptkb:Contextual_Bandits_with_Linear_Payoff_Functions
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	multi-armed bandit problem
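The algorithms listed under supportsAlgorithm can be compared on the stochastic variant with a short simulation. The following is a minimal Python sketch, not a reference implementation. It assumes Bernoulli-reward arms with hypothetical means, uses UCB1 (empirical mean plus sqrt(2 ln n / n_a), where n is the total number of pulls so far) as the concrete UCB rule, and uses a uniform Beta(1, 1) prior for Thompson sampling. It measures pseudo-regret: the sum over rounds of the gap between the best arm's mean and the chosen arm's mean. The function names (epsilon_greedy, ucb1, thompson, run), the exploration rate eps=0.1 and the arm means are illustrative choices, not part of the entry.

```python
import math
import random
from typing import Callable, List, Tuple

# Policy signature (hypothetical, for this sketch only):
# select(t, counts, sums, rng) -> index of the arm to pull at 1-based step t.
Policy = Callable[[int, List[int], List[int], random.Random], int]


def epsilon_greedy(eps: float = 0.1) -> Policy:
    def select(t: int, counts: List[int], sums: List[int], rng: random.Random) -> int:
        # Explore: with probability eps pick an arm uniformly at random.
        if rng.random() < eps:
            return rng.randrange(len(counts))
        # Exploit: pick the highest empirical mean; unpulled arms count as +inf.
        return max(range(len(counts)),
                   key=lambda a: sums[a] / counts[a] if counts[a] else math.inf)
    return select


def ucb1(t: int, counts: List[int], sums: List[int], rng: random.Random) -> int:
    # Pull every arm once before using confidence bounds.
    for a, n_a in enumerate(counts):
        if n_a == 0:
            return a
    n = sum(counts)  # total pulls so far
    # UCB1 index: empirical mean plus an exploration bonus that shrinks as counts[a] grows.
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(n) / counts[a]))


def thompson(t: int, counts: List[int], sums: List[int], rng: random.Random) -> int:
    # Assumed Beta(1, 1) prior, so each posterior is Beta(1 + successes, 1 + failures).
    samples = [rng.betavariate(1 + sums[a], 1 + counts[a] - sums[a])
               for a in range(len(counts))]
    # Play the arm whose posterior sample is largest.
    return max(range(len(counts)), key=lambda a: samples[a])


def run(select: Policy, means: List[float], horizon: int,
        seed: int = 0) -> Tuple[List[int], float]:
    """Simulate a Bernoulli bandit; return pull counts and cumulative pseudo-regret."""
    rng = random.Random(seed)
    counts, sums = [0] * len(means), [0] * len(means)
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        a = select(t, counts, sums, rng)
        reward = 1 if rng.random() < means[a] else 0  # Bernoulli(means[a]) draw
        counts[a] += 1
        sums[a] += reward
        regret += best - means[a]  # expected loss versus always playing the best arm
    return counts, regret


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    policies = {"epsilon-greedy": epsilon_greedy(0.1), "UCB1": ucb1, "Thompson": thompson}
    for name, policy in policies.items():
        counts, regret = run(policy, means, horizon=5000)
        print(f"{name:15s} pulls={counts} pseudo-regret={regret:.1f}")
```

Each policy resolves the exploration-exploitation trade-off differently: epsilon-greedy explores at a fixed rate forever, UCB1 explores arms whose confidence bounds are still wide, and Thompson sampling explores in proportion to posterior uncertainty. Pseudo-regret is used instead of realized reward so that the comparison is not dominated by reward noise.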