Statements (15)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:concept |
| gptkbp:canBe | positive, zero, negative |
| gptkbp:measures | desirability of an action |
| gptkbp:relatedTo | reward function, value function, policy optimization |
| gptkbp:represents | feedback signal |
| gptkbp:symbol | R |
| gptkbp:usedFor | guide agent behavior |
| gptkbp:usedIn | gptkb:reinforcement_learning |
| gptkbp:bfsParent | gptkb:iterated_Prisoner's_Dilemma |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Reward (R) |