MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

GPTKB entity

Statements (22)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:application natural language processing
language model compression
gptkbp:author gptkb:Wenhui_Wang
gptkb:Furu_Wei
gptkb:Li_Dong
gptkb:Hangbo_Bao
gptkb:Nan_Yang
gptkb:Ming_Zhou
gptkbp:citation high (hundreds to thousands)
gptkbp:focusesOn model compression
transformer models
self-attention distillation
https://www.w3.org/2000/01/rdf-schema#label MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
gptkbp:method deep self-attention distillation (see sketch below)
gptkbp:proposes gptkb:MiniLM
gptkbp:publicationYear 2020
gptkbp:publishedIn gptkb:NeurIPS_2020
gptkbp:title gptkb:MiniLM:_Deep_Self-Attention_Distillation_for_Task-Agnostic_Compression_of_Pre-Trained_Transformers
gptkbp:url https://arxiv.org/abs/2002.10957
gptkbp:bfsParent gptkb:MiniLM
gptkbp:bfsLayer 6
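
The paper's core technique, deep self-attention distillation, trains a small student to mimic two quantities from the last Transformer layer of the teacher: the self-attention distributions and the value relations (softmax-normalized scaled dot-products of the value vectors). Below is a minimal PyTorch sketch of that objective, assuming teacher and student use the same number of attention heads; the tensor names (teacher_attn, student_attn, teacher_values, student_values) and shapes are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(teacher_attn, student_attn, eps=1e-12):
    """KL divergence between teacher and student self-attention distributions
    of the last Transformer layer.
    Shapes: (batch, heads, seq_len, seq_len); each row is already a softmax."""
    kl = teacher_attn * (torch.log(teacher_attn + eps) - torch.log(student_attn + eps))
    return kl.sum(dim=-1).mean()

def value_relation_loss(teacher_values, student_values, eps=1e-12):
    """KL divergence between value relations, i.e. softmax-normalized scaled
    dot-products of the value vectors against themselves.
    Shapes: (batch, heads, seq_len, head_dim); head_dim may differ between
    teacher and student, since the relation matrix is seq_len x seq_len."""
    def relation(v):
        return F.softmax(v @ v.transpose(-1, -2) / v.size(-1) ** 0.5, dim=-1)
    r_t, r_s = relation(teacher_values), relation(student_values)
    kl = r_t * (torch.log(r_t + eps) - torch.log(r_s + eps))
    return kl.sum(dim=-1).mean()

# Training objective: sum of the two transfer losses on the last layer only.
# loss = attention_transfer_loss(t_attn, s_attn) + value_relation_loss(t_val, s_val)
```

Because both transferred quantities are seq_len x seq_len matrices, the student's hidden size and layer count do not have to match the teacher's and no extra projection layers are needed, which is why the paper describes the compression as task-agnostic.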