Mechanistic Interpretability

GPTKB entity

Statements (45)
Predicate Object
gptkbp:instanceOf Field of AI research
gptkbp:aimsTo Enable model auditing
    Improve AI safety
    Increase transparency of AI systems
gptkbp:appliesTo Deep learning models
gptkbp:emergedIn 2020s
gptkbp:focusesOn Understanding neural network internals
gptkbp:goal Explain model behavior in terms of mechanisms
gptkbp:hasMethod Circuit analysis
    Activation patching
    Feature visualization
    Neuron analysis
https://www.w3.org/2000/01/rdf-schema#label Mechanistic Interpretability
gptkbp:notableContributor gptkb:Tom_Lieberum
    gptkb:Chris_Olah
    gptkb:Neel_Nanda
gptkbp:notableOrganization gptkb:DeepMind
    gptkb:OpenAI
    gptkb:Anthropic
    gptkb:Redwood_Research
gptkbp:notablePublication gptkb:A_Mathematical_Framework_for_Transformer_Circuits
    gptkb:Transformer_Circuits
    Zoom In: An Introduction to Circuits
gptkbp:relatedConcept gptkb:Transformer_models
    Transparency
    Activation patching
    Attention heads
    Feature attribution
    Induction heads
    Language models
    Model auditing
    Model internals
    Model interpretability
    Model transparency
    Monosemanticity
    Neural network interpretability
    Neuron splitting
    Polysemantic neurons
    Reverse engineering neural networks
    Superposition in neural networks
    Vision models
gptkbp:relatedTo gptkb:Explainable_AI
    Interpretability
gptkbp:bfsParent gptkb:Transformer_Circuits
gptkbp:bfsLayer 8
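
Activation patching, listed above under gptkbp:hasMethod, is a causal intervention: cache an activation from a run on a clean input, then overwrite the same site during a run on a corrupted input and check how much of the clean behaviour is restored. The sketch below is a minimal illustration with a toy model and hypothetical names; it is not taken from the GPTKB entry, and a real analysis would patch individual attention heads or MLP outputs inside a transformer.

```python
# Minimal activation-patching sketch (illustrative; the toy model and
# variable names are assumptions, not part of the GPTKB entry).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a real network; in practice this would be a transformer
# and the patched site would be, e.g., one attention head's output.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),   # early layers
    nn.Linear(16, 16), nn.ReLU(),  # the site we patch
    nn.Linear(16, 2),              # readout
)
site = model[2]  # module whose output we cache and later overwrite

clean_input = torch.randn(1, 8)
corrupted_input = torch.randn(1, 8)

# 1) Cache the activation at the chosen site on the clean run.
cache = {}
def save_hook(module, inputs, output):
    cache["clean"] = output.detach()

handle = site.register_forward_hook(save_hook)
clean_logits = model(clean_input)
handle.remove()

# 2) Re-run on the corrupted input, replacing the site's output
#    with the cached clean activation (returning a tensor from a
#    forward hook overrides the module's output).
def patch_hook(module, inputs, output):
    return cache["clean"]

handle = site.register_forward_hook(patch_hook)
patched_logits = model(corrupted_input)
handle.remove()

corrupted_logits = model(corrupted_input)

# 3) If patching this site moves the output back toward the clean run,
#    the site is causally implicated in the behavioural difference.
print("clean:    ", clean_logits)
print("corrupted:", corrupted_logits)
print("patched:  ", patched_logits)
```

In practice the same loop is repeated over many candidate sites (layers, heads, token positions), and the restoration effect per site is what gets reported.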
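
Several related concepts above (superposition in neural networks, polysemantic neurons, monosemanticity) concern how a layer can represent more features than it has neurons. The numpy sketch below is an illustrative toy construction with made-up dimensions, not material from the GPTKB entry: sparse features stored along nearly orthogonal random directions can be read back with small interference, while each individual neuron ends up carrying weight for many features.

```python
# Toy sketch of superposition and polysemantic neurons (all numbers and
# names here are assumptions chosen for illustration).
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 40, 16          # more features than neurons

# Assign each feature a random unit direction in neuron space.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse input: only two features are active at once.
active = [3, 11]
x = np.zeros(n_features)
x[active] = 1.0

neuron_acts = x @ W                      # 16 activations encode 40 possible features
recovered = neuron_acts @ W.T            # project back onto feature directions

# Active features score near 1; inactive ones only pick up interference terms.
print("scores of active features:", recovered[active])
print("largest score among inactive features:",
      np.delete(recovered, active).max())

# Polysemanticity: every neuron carries non-trivial weight for many features,
# so no single neuron corresponds to a single feature.
print("features per neuron with |weight| > 0.3:", (np.abs(W) > 0.3).sum(axis=0))
```

This is the intuition behind why interpretability work often looks for feature directions (e.g. with dictionary-learning methods) rather than treating individual neurons as the unit of analysis.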