MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

GPTKB entity

Statements (22)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:application natural language processing
language model compression
gptkbp:author gptkb:Wenhui_Wang
gptkb:Furu_Wei
gptkb:Li_Dong
gptkb:Hangbo_Bao
gptkb:Nan_Yang
gptkb:Ming_Zhou
gptkbp:citation high (hundreds to thousands)
gptkbp:focusesOn model compression
transformer models
self-attention distillation
https://www.w3.org/2000/01/rdf-schema#label MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
gptkbp:method deep self-attention distillation (see sketch below)
gptkbp:proposes gptkb:MiniLM
gptkbp:publicationYear 2020
gptkbp:publishedIn gptkb:NeurIPS_2020
gptkbp:title gptkb:MiniLM:_Deep_Self-Attention_Distillation_for_Task-Agnostic_Compression_of_Pre-Trained_Transformers
gptkbp:url https://arxiv.org/abs/2002.10957
gptkbp:bfsParent gptkb:MiniLM
gptkbp:bfsLayer 6
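
The paper's core technique, deep self-attention distillation, trains a small student to mimic two quantities from the last Transformer layer of the teacher: the self-attention distributions and the value relations (softmax-normalized scaled dot-products of the value vectors). Below is a minimal PyTorch sketch of that objective, assuming teacher and student use the same number of attention heads; the tensor names (teacher_attn, student_attn, teacher_values, student_values) and shapes are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(teacher_attn, student_attn, eps=1e-12):
    """KL divergence between teacher and student self-attention distributions
    of the last Transformer layer.
    Shapes: (batch, heads, seq_len, seq_len); each row is already a softmax."""
    kl = teacher_attn * (torch.log(teacher_attn + eps) - torch.log(student_attn + eps))
    return kl.sum(dim=-1).mean()

def value_relation_loss(teacher_values, student_values, eps=1e-12):
    """KL divergence between value relations, i.e. softmax-normalized scaled
    dot-products of the value vectors against themselves.
    Shapes: (batch, heads, seq_len, head_dim); head_dim may differ between
    teacher and student, since the relation matrix is seq_len x seq_len."""
    def relation(v):
        return F.softmax(v @ v.transpose(-1, -2) / v.size(-1) ** 0.5, dim=-1)
    r_t, r_s = relation(teacher_values), relation(student_values)
    kl = r_t * (torch.log(r_t + eps) - torch.log(r_s + eps))
    return kl.sum(dim=-1).mean()

# Training objective: sum of the two transfer losses on the last layer only.
# loss = attention_transfer_loss(t_attn, s_attn) + value_relation_loss(t_val, s_val)
```

Because both transferred quantities are seq_len x seq_len matrices, the student's hidden size and layer count do not have to match the teacher's and no extra projection layers are needed, which is why the paper describes the compression as task-agnostic.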