Statements (18)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:machine_learning_technique |
| gptkbp:alternativeName | activation patching experiment, activation replacement |
| gptkbp:appliesTo | language models, transformer models |
| gptkbp:enables | identification of model components responsible for specific outputs |
| gptkbp:firstPublished | 2022 |
| gptkbp:introduced | researchers at Anthropic |
| gptkbp:method | replacing activations in a model with activations from a different input |
| gptkbp:notablePublication | gptkb:A_Mathematical_Framework_for_Transformer_Circuits |
| gptkbp:purpose | to identify which parts of a neural network are responsible for specific behaviors |
| gptkbp:relatedTo | circuit analysis, feature attribution |
| gptkbp:usedFor | causal tracing in neural networks |
| gptkbp:usedIn | mechanistic interpretability |
| gptkbp:bfsParent | gptkb:Transformer_Circuits |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | Activation Patching |
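
The method statement above (replacing activations in a model with activations from a different input) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy two-layer network, its weights `W1` and `W2`, the inputs, and the helper names `hidden`, `run`, and `patching_effects` are hypothetical and do not come from any specific paper or library. For each hidden unit, the sketch runs the network on a "corrupted" input, swaps in that unit's activation from a "clean" run, and reports how much of the clean output is restored.

```python
# Hypothetical toy network: 2 inputs -> 3 hidden units (ReLU) -> 1 output.
# The weights are made up for illustration only.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per hidden unit
W2 = [2.0, -1.0, 0.5]                      # linear readout weights


def hidden(x):
    """Hidden-layer activations (ReLU of a linear map) for input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def run(x, patch=None):
    """Forward pass. `patch` maps hidden-unit index -> replacement activation."""
    h = hidden(x)
    if patch:
        for index, value in patch.items():
            h[index] = value  # the patching step: overwrite the activation
    return sum(w * hi for w, hi in zip(W2, h))


clean_input = [1.0, 0.0]      # input that produces the behavior of interest
corrupted_input = [0.0, 1.0]  # contrasting input

clean_h = hidden(clean_input)         # activations cached from the clean run
clean_out = run(clean_input)
corrupted_out = run(corrupted_input)


def patching_effects():
    """Fraction of the clean-vs-corrupted output gap restored per hidden unit."""
    effects = {}
    for i in range(len(clean_h)):
        patched_out = run(corrupted_input, patch={i: clean_h[i]})
        effects[i] = (patched_out - corrupted_out) / (clean_out - corrupted_out)
    return effects


if __name__ == "__main__":
    for unit, effect in patching_effects().items():
        print(f"hidden unit {unit}: restored fraction = {effect:.3f}")
```

In this toy setup, units whose activation differs between the two inputs restore part of the output gap, while a unit with the same activation on both inputs restores none. In practice the same idea is usually applied to attention heads, MLP outputs, or residual-stream positions of a transformer, typically through a framework's hook mechanism rather than hand-written forward passes.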