Statements (18)
Predicate | Object |
---|---|
gptkbp:instanceOf | machine learning technique |
gptkbp:alternativeName | activation patching experiment; activation replacement |
gptkbp:appliesTo | language models; transformer models |
gptkbp:enables | identification of model components responsible for specific outputs |
gptkbp:firstPublished | 2022 |
https://www.w3.org/2000/01/rdf-schema#label | Activation Patching |
gptkbp:introduced | researchers at Anthropic |
gptkbp:method | replacing activations in a model with activations from a different input |
gptkbp:notablePublication | gptkb:A_Mathematical_Framework_for_Transformer_Circuits |
gptkbp:purpose | to identify which parts of a neural network are responsible for specific behaviors |
gptkbp:relatedTo | circuit analysis; feature attribution |
gptkbp:usedFor | causal tracing in neural networks |
gptkbp:usedIn | mechanistic interpretability |
gptkbp:bfsParent | gptkb:Transformer_Circuits |
gptkbp:bfsLayer | 8 |
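
The gptkbp:method statement (replacing activations in a model with activations from a different input) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-layer toy network, its weights `W1` and `W2`, the inputs `clean_x` and `corrupt_x`, and the helper names `hidden`, `output`, and `run` are hypothetical and do not come from any library or from the cited publication. The sketch caches hidden activations from a "clean" input, patches them one unit at a time into a run on a "corrupted" input, and records how much of the clean output each patch restores.

```python
# Toy two-layer network with fixed, made-up weights (illustrative only).
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 hidden units, 2 inputs
W2 = [2.0, -1.0, 0.0]                      # 1 output, 3 hidden units


def hidden(x):
    """Return ReLU hidden activations for input vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Return the scalar output for hidden activations h."""
    return sum(w * hi for w, hi in zip(W2, h))


def run(x, patch=None):
    """Forward pass; `patch` maps a hidden-unit index to a replacement value."""
    h = hidden(x)
    for i, value in (patch or {}).items():
        h[i] = value  # overwrite this unit with the supplied activation
    return output(h)


clean_x = [1.0, 0.0]    # input on which the behavior of interest occurs
corrupt_x = [0.0, 1.0]  # contrasting input on which it does not

clean_h = hidden(clean_x)     # cache activations from the clean run
clean_out = run(clean_x)
corrupt_out = run(corrupt_x)

# Patch one hidden unit at a time and measure the fraction of the
# clean-vs-corrupted output gap that the patch restores.
effects = {}
for i in range(len(W1)):
    patched_out = run(corrupt_x, patch={i: clean_h[i]})
    effects[i] = (patched_out - corrupt_out) / (clean_out - corrupt_out)

for i, effect in effects.items():
    print(f"hidden unit {i}: restored fraction = {effect:.3f}")
```

In this toy setup, units whose patched-in activation moves the output toward the clean value receive a larger restored fraction, which is the sense in which the technique points to components responsible for a specific output. With real transformer models the same idea is usually applied to attention heads, MLP layers, or residual-stream positions rather than single hidden units.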