MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
GPTKB entity
Statements (25)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:academic_journal |
gptkbp:application | natural language processing, language model compression |
gptkbp:author | gptkb:Jingdong_Wang, gptkb:Xiaodong_Liu, gptkb:Wei_Chen, gptkb:Jianfeng_Gao, gptkb:Yankai_Lin, Lijuan Wang, Yiwei Song, Yuyu Hu |
gptkbp:citation | high (hundreds to thousands) |
gptkbp:focusesOn | model compression, transformer models, self-attention distillation |
https://www.w3.org/2000/01/rdf-schema#label | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers |
gptkbp:method | deep distillation, self-attention distillation |
gptkbp:proposedBy | MiniLM model |
gptkbp:publicationYear | 2020 |
gptkbp:publishedIn | gptkb:NeurIPS_2020 |
gptkbp:title | gptkb:MiniLM:_Deep_Self-Attention_Distillation_for_Task-Agnostic_Compression_of_Pre-Trained_Transformers |
gptkbp:url | https://arxiv.org/abs/2002.10957 |
gptkbp:bfsParent | gptkb:MiniLM |
gptkbp:bfsLayer | 6 |
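
The `gptkbp:method` statements above refer to deep self-attention distillation, the technique described in the MiniLM paper (https://arxiv.org/abs/2002.10957): only the teacher's last Transformer layer is transferred, by matching its per-head self-attention distributions and its value-relation matrices against the student's via KL divergence. The sketch below illustrates that loss in PyTorch; the function names, tensor shapes, and the small numerical constants are illustrative assumptions, not code released with the paper.

```python
# Minimal sketch of MiniLM-style deep self-attention distillation,
# assuming we already have per-head queries, keys, and values from the
# *last* Transformer layer of a teacher and a student.
# Shapes: [batch, heads, seq_len, head_dim]. Names are illustrative.

import torch
import torch.nn.functional as F


def attention_distributions(queries, keys):
    """Per-head attention distributions: softmax(Q K^T / sqrt(d_k))."""
    d_k = queries.size(-1)
    scores = torch.matmul(queries, keys.transpose(-1, -2)) / d_k ** 0.5
    return F.softmax(scores, dim=-1)  # [batch, heads, seq_len, seq_len]


def value_relation(values):
    """Value-relation matrix: softmax(V V^T / sqrt(d_k))."""
    d_k = values.size(-1)
    scores = torch.matmul(values, values.transpose(-1, -2)) / d_k ** 0.5
    return F.softmax(scores, dim=-1)  # [batch, heads, seq_len, seq_len]


def kl_teacher_student(p_teacher, p_student):
    """KL(teacher || student), summed over heads and positions,
    averaged over the batch."""
    return F.kl_div(p_student.clamp_min(1e-9).log(), p_teacher,
                    reduction="batchmean")


def minilm_distillation_loss(teacher_q, teacher_k, teacher_v,
                             student_q, student_k, student_v):
    """Last-layer-only distillation: attention-distribution transfer
    plus value-relation transfer."""
    l_at = kl_teacher_student(attention_distributions(teacher_q, teacher_k),
                              attention_distributions(student_q, student_k))
    l_vr = kl_teacher_student(value_relation(teacher_v),
                              value_relation(student_v))
    return l_at + l_vr
```

Because both transferred quantities are seq_len x seq_len relation matrices, the student's head dimension (and hence hidden size) does not have to match the teacher's, and the distillation is applied to the pre-trained teacher rather than to task-specific fine-tuned models, which is what makes the compression task-agnostic.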