gptkbp:instanceOf
    gptkb:academic

gptkbp:address
    interpretability
    corrigibility
    instrumental convergence
    value alignment problem
    reward hacking
    goal specification problem

gptkbp:concerns
    ethical behavior of AI
    robustness of AI systems
    safety of AI systems

gptkbp:conference
    gptkb:AI_Alignment_Forum
    gptkb:AI_Safety_Camp

gptkbp:focusesOn
    aligning AI systems with human values

gptkbp:hasConcept
    gptkb:AI_control_problem
    gptkb:AI_safety_via_debate
    gptkb:Goodhart's_law
    gptkb:ELK_(Eliciting_Latent_Knowledge)
    gptkb:cooperative_inverse_reinforcement_learning
    gptkb:iterated_distillation_and_amplification
    gptkb:off-switch_problem
    gptkb:tripwire_mechanism
    AI ethics
    AI alignment research
    AI safety research
    AI governance
    amplification
    inverse reinforcement learning
    interpretability
    AI risk
    corrigibility
    AI safety community
    AI transparency
    instrumental convergence
    scalable oversight
    AI boxing
    AI interpretability
    AI robustness
    friendly AI
    AI corrigibility
    impact regularization
    inner alignment
    mesa-optimization
    outer alignment
    reward hacking
    reward modeling
    value learning
    AI existential risk
    AI alignment problem
    AI alignment community
    AI alignment forum
    AI reliability
    AI safety forum
    AI safety problem
    AI value alignment
    deontic constraints
    oracle AI
    recursive reward modeling

https://www.w3.org/2000/01/rdf-schema#label
    AI Alignment

gptkbp:importantFor
    artificial general intelligence
    advanced AI systems

gptkbp:notableContributor
    gptkb:Ben_Garfinkel
    gptkb:Nick_Bostrom
    gptkb:Paul_Christiano
    gptkb:Rohin_Shah
    gptkb:Jessica_Taylor
    gptkb:Jan_Leike
    gptkb:Stuart_Russell
    gptkb:Chris_Olah
    gptkb:Dario_Amodei
    gptkb:Andrew_Critch
    gptkb:Eliezer_Yudkowsky
    gptkb:Nate_Soares
    gptkb:Owain_Evans
    gptkb:Beth_Barnes
    gptkb:Evan_Hubinger
    gptkb:Richard_Ngo
    gptkb:Victoria_Krakovna

gptkbp:organization
    gptkb:DeepMind
    gptkb:OpenAI
    gptkb:Future_of_Humanity_Institute
    gptkb:Machine_Intelligence_Research_Institute
    gptkb:Alignment_Research_Center
    gptkb:Anthropic
    gptkb:Centre_for_the_Study_of_Existential_Risk

gptkbp:publishedIn
    gptkb:LessWrong
    gptkb:AI_Alignment_Newsletter
    gptkb:Alignment_Forum

gptkbp:relatedTo
    gptkb:artificial_intelligence
    AI safety
    machine ethics

gptkbp:studiedBy
    researchers in computer science
    researchers in cognitive science
    researchers in philosophy

gptkbp:bfsParent
    gptkb:LessWrong

gptkbp:bfsLayer
    6
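Since the entry above is RDF-style data (prefixed names plus an rdf-schema#label), a minimal sketch of how it could be serialized as Turtle follows. The namespace IRIs behind the gptkb: and gptkbp: prefixes and the subject name gptkb:AI_Alignment are assumptions for illustration, and each value list is abridged to a few representatives.

```turtle
# Assumed namespace IRIs; only rdfs: is the canonical W3C namespace.
@prefix gptkb:  <https://gptkb.org/entity/> .
@prefix gptkbp: <https://gptkb.org/property/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical subject IRI, inferred from the rdfs:label value.
gptkb:AI_Alignment
    rdfs:label "AI Alignment" ;
    gptkbp:instanceOf gptkb:academic ;
    gptkbp:focusesOn "aligning AI systems with human values" ;
    # Literal values, abridged from the lists above.
    gptkbp:address "interpretability", "corrigibility", "reward hacking" ;
    # Entity values keep their gptkb: prefixed names; abridged.
    gptkbp:hasConcept gptkb:AI_control_problem, gptkb:off-switch_problem ;
    gptkbp:notableContributor gptkb:Nick_Bostrom, gptkb:Stuart_Russell ;
    gptkbp:organization gptkb:DeepMind, gptkb:Anthropic ;
    gptkbp:publishedIn gptkb:LessWrong, gptkb:Alignment_Forum ;
    # Crawl metadata recorded in the entry itself.
    gptkbp:bfsParent gptkb:LessWrong ;
    gptkbp:bfsLayer 6 .
```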