Mechanistic Interpretability
GPTKB entity
Statements (45)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | Field of AI research |
| gptkbp:aimsTo | Enable model auditing; Improve AI safety; Increase transparency of AI systems |
| gptkbp:appliesTo | Deep learning models |
| gptkbp:emergedIn | 2020s |
| gptkbp:focusesOn | Understanding neural network internals |
| gptkbp:goal | Explain model behavior in terms of mechanisms |
| gptkbp:hasMethod | Circuit analysis; Activation patching; Feature visualization; Neuron analysis |
| https://www.w3.org/2000/01/rdf-schema#label | Mechanistic Interpretability |
| gptkbp:notableContributor | gptkb:Tom_Lieberum; gptkb:Chris_Olah; gptkb:Neel_Nanda |
| gptkbp:notableOrganization | gptkb:DeepMind; gptkb:OpenAI; gptkb:Anthropic; gptkb:Redwood_Research |
| gptkbp:notablePublication | gptkb:A_Mathematical_Framework_for_Transformer_Circuits; gptkb:Transformer_Circuits; Zoom In: An Introduction to Circuits |
| gptkbp:relatedConcept | gptkb:Transformer_models; Transparency; Activation patching; Attention heads; Feature attribution; Induction heads; Language models; Model auditing; Model internals; Model interpretability; Model transparency; Monosemanticity; Neural network interpretability; Neuron splitting; Polysemantic neurons; Reverse engineering neural networks; Superposition in neural networks; Vision models |
| gptkbp:relatedTo | gptkb:Explainable_AI; Interpretability |
| gptkbp:bfsParent | gptkb:Transformer_Circuits |
| gptkbp:bfsLayer | 8 |