Statements (53)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:philosophy |
gptkbp:addressedTo | AI alignment research, policy and governance, technical solutions, AI safety organizations |
gptkbp:alsoKnownAs | AI alignment problem |
gptkbp:category | gptkb:artificial_intelligence, existential risk, ethics of artificial intelligence |
gptkbp:challenge | corrigibility, value alignment, instrumental convergence, reward hacking, preventing unintended consequences, robustness to distributional shift, specifying correct goals for AI |
gptkbp:concerns | controlling advanced artificial intelligence |
gptkbp:debatedBy | policy makers, philosophers, AI researchers, technology ethicists |
gptkbp:describedBy | gptkb:Superintelligence:_Paths,_Dangers,_Strategies |
gptkbp:field | AI safety, machine ethics |
gptkbp:firstDiscussed | 20th century |
gptkbp:goal | ensure AI acts in accordance with human values |
gptkbp:hasSubproblem | scalable oversight, value alignment problem, avoiding negative side effects, avoiding reward tampering, capability control problem, corrigibility problem, motivation selection problem, reward specification problem, robustness to adversarial inputs, safe interruptibility |
https://www.w3.org/2000/01/rdf-schema#label | AI control problem |
gptkbp:importantFor | high for future of humanity |
gptkbp:notablePublication | gptkb:Superintelligence:_Paths,_Dangers,_Strategies, Concrete Problems in AI Safety, The Off-Switch Game |
gptkbp:organization | gptkb:DeepMind, gptkb:OpenAI, gptkb:Future_of_Humanity_Institute, gptkb:Machine_Intelligence_Research_Institute, gptkb:Center_for_Human-Compatible_AI |
gptkbp:relatedTo | artificial general intelligence, existential risk from artificial intelligence |
gptkbp:studiedBy | gptkb:Nick_Bostrom, gptkb:Stuart_Russell, gptkb:Eliezer_Yudkowsky |
gptkbp:bfsParent | gptkb:Agent_Foundations |
gptkbp:bfsLayer | 6 |
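The statements above are predicate–object pairs on a single subject, so they map directly onto RDF triples. The sketch below is a minimal, hypothetical example using Python's rdflib: the namespace IRIs behind the `gptkb:`/`gptkbp:` prefixes and the subject identifier are assumptions (the page only shows the prefixes), and only a handful of the 53 statements are reproduced.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

# Assumed namespace IRIs for the gptkb:/gptkbp: prefixes (not given on this page).
GPTKB = Namespace("https://gptkb.org/entity/")
GPTKBP = Namespace("https://gptkb.org/property/")

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

# Hypothetical subject identifier for the entity labeled "AI control problem".
subject = GPTKB["AI_control_problem"]

# A few of the statements from the table above.
g.add((subject, RDFS.label, Literal("AI control problem")))
g.add((subject, GPTKBP.instanceOf, GPTKB.philosophy))
g.add((subject, GPTKBP.alsoKnownAs, Literal("AI alignment problem")))
for challenge in ["corrigibility", "value alignment", "reward hacking"]:
    g.add((subject, GPTKBP.challenge, Literal(challenge)))

# List every challenge recorded for the subject, then dump the graph as Turtle.
for obj in g.objects(subject, GPTKBP.challenge):
    print(obj)
print(g.serialize(format="turtle"))
```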