Statements (53)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:philosophy |
| gptkbp:addressedTo | AI alignment research, policy and governance, technical solutions, AI safety organizations |
| gptkbp:alsoKnownAs | AI alignment problem |
| gptkbp:category | gptkb:existential_risk, gptkb:artificial_intelligence, ethics of artificial intelligence |
| gptkbp:challenge | corrigibility, value alignment, instrumental convergence, reward hacking, preventing unintended consequences, robustness to distributional shift, specifying correct goals for AI |
| gptkbp:concerns | controlling advanced artificial intelligence |
| gptkbp:debatedBy | policy makers, philosophers, AI researchers, technology ethicists |
| gptkbp:describedBy | gptkb:Superintelligence:_Paths,_Dangers,_Strategies |
| gptkbp:field | AI safety, machine ethics |
| gptkbp:firstDiscussed | 20th century |
| gptkbp:goal | ensure AI acts in accordance with human values |
| gptkbp:hasSubproblem | scalable oversight, value alignment problem, avoiding negative side effects, avoiding reward tampering, capability control problem, corrigibility problem, motivation selection problem, reward specification problem, robustness to adversarial inputs, safe interruptibility |
| gptkbp:importantFor | high for future of humanity |
| gptkbp:notablePublication | gptkb:Superintelligence:_Paths,_Dangers,_Strategies, Concrete Problems in AI Safety, The Off-Switch Game |
| gptkbp:organization | gptkb:DeepMind, gptkb:OpenAI, gptkb:Future_of_Humanity_Institute, gptkb:Machine_Intelligence_Research_Institute, gptkb:Center_for_Human-Compatible_AI |
| gptkbp:relatedTo | artificial general intelligence, existential risk from artificial intelligence |
| gptkbp:studiedBy | gptkb:Nick_Bostrom, gptkb:Stuart_Russell, gptkb:Eliezer_Yudkowsky |
| gptkbp:bfsParent | gptkb:Agent_Foundations |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | AI control problem |
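Each row above corresponds to one or more subject–predicate–object statements about the "AI control problem" entity. As a minimal sketch of how a few of these statements could be loaded as RDF triples, the Python example below uses rdflib; the base IRIs chosen for the `gptkb:` and `gptkbp:` prefixes are assumptions for illustration, not taken from the source.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS

# Assumed base IRIs for the gptkb: and gptkbp: prefixes (hypothetical).
GPTKB = Namespace("https://example.org/gptkb/entity/")
GPTKBP = Namespace("https://example.org/gptkb/property/")

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

subject = GPTKB["AI_control_problem"]

# A few of the statements from the table, expressed as triples.
g.add((subject, GPTKBP["instanceOf"], GPTKB["philosophy"]))
g.add((subject, GPTKBP["alsoKnownAs"], Literal("AI alignment problem")))
g.add((subject, GPTKBP["studiedBy"], GPTKB["Stuart_Russell"]))
g.add((subject, RDFS.label, Literal("AI control problem")))

# Serialize the small graph to Turtle for inspection.
print(g.serialize(format="turtle"))
```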