MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
GPTKB entity
Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | research paper |
| gptkbp:application | natural language processing; language model compression |
| gptkbp:author | Wenhui Wang; Furu Wei; Li Dong; Hangbo Bao; Nan Yang; Ming Zhou |
| gptkbp:citation | high (hundreds to thousands) |
| gptkbp:focusesOn | model compression; transformer models; self-attention distillation |
| gptkbp:method | deep distillation; self-attention distillation (see the sketch after the table) |
| gptkbp:proposedBy | MiniLM model |
| gptkbp:publicationYear | 2020 |
| gptkbp:publishedIn | gptkb:NeurIPS_2020 |
| gptkbp:title | gptkb:MiniLM:_Deep_Self-Attention_Distillation_for_Task-Agnostic_Compression_of_Pre-Trained_Transformers |
| gptkbp:url | https://arxiv.org/abs/2002.10957 |
| gptkbp:bfsParent | gptkb:MiniLM |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers |
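
The gptkbp:method row above refers to the self-attention distillation idea behind MiniLM: the student is trained to reproduce the teacher's last-layer self-attention behaviour rather than its output logits or hidden states. Below is a minimal PyTorch sketch of that idea, not the paper's released training code; the function names (`attention_distribution`, `relation_matrix`, `distillation_loss`), the toy tensor shapes, and the single-layer setup are illustrative assumptions.

```python
# Minimal sketch of self-attention distillation in the spirit of MiniLM
# (https://arxiv.org/abs/2002.10957). Illustrative only: names, shapes and the
# single-layer setup are assumptions, not the paper's released training code.
import torch
import torch.nn.functional as F


def attention_distribution(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention probabilities, shape (batch, heads, seq, seq)."""
    scale = q.size(-1) ** 0.5
    return F.softmax(torch.matmul(q, k.transpose(-1, -2)) / scale, dim=-1)


def relation_matrix(v: torch.Tensor) -> torch.Tensor:
    """Softmax-normalised value-value relation, shape (batch, heads, seq, seq)."""
    scale = v.size(-1) ** 0.5
    return F.softmax(torch.matmul(v, v.transpose(-1, -2)) / scale, dim=-1)


def kl_divergence(teacher: torch.Tensor, student: torch.Tensor,
                  eps: float = 1e-12) -> torch.Tensor:
    """Row-wise KL(teacher || student), averaged over batch, heads and positions."""
    return (teacher * (torch.log(teacher + eps) - torch.log(student + eps))).sum(-1).mean()


def distillation_loss(teacher_q, teacher_k, teacher_v,
                      student_q, student_k, student_v) -> torch.Tensor:
    """Attention-distribution loss plus value-relation loss for one
    (teacher layer, student layer) pair, e.g. the last layer of each model."""
    att_loss = kl_divergence(attention_distribution(teacher_q, teacher_k),
                             attention_distribution(student_q, student_k))
    val_loss = kl_divergence(relation_matrix(teacher_v),
                             relation_matrix(student_v))
    return att_loss + val_loss


if __name__ == "__main__":
    # Toy example: same number of heads (12) and sequence length (8),
    # but different head dimensions (64 for the teacher, 32 for the student).
    t_q, t_k, t_v = (torch.randn(2, 12, 8, 64) for _ in range(3))
    s_q, s_k, s_v = (torch.randn(2, 12, 8, 32, requires_grad=True) for _ in range(3))
    loss = distillation_loss(t_q, t_k, t_v, s_q, s_k, s_v)
    loss.backward()
    print(loss.item())
```

Because the compared attention and value-relation matrices are all seq x seq regardless of head dimension, the student's hidden size does not have to match the teacher's, which is what makes this style of distillation convenient for task-agnostic compression.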