gptkbp:instanceOf: large language model
gptkbp:author: gptkb:Pengcheng_He, gptkb:Weizhu_Chen, gptkb:Xiaodong_Liu, gptkb:Jianfeng_Gao
gptkbp:availableOn: gptkb:Hugging_Face_Model_Hub
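Since the checkpoints are published on the Hugging Face Model Hub, they load directly through the transformers library. A minimal sketch, assuming transformers and sentencepiece are installed and using microsoft/deberta-v3-base (one of the published sizes):

```python
from transformers import AutoTokenizer, AutoModel

# Load the published base-size checkpoint; the DeBERTa-v3 tokenizer
# requires the sentencepiece package.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

inputs = tokenizer("DeBERTa-v3 improves on DeBERTa.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```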
gptkbp:basedOn: Transformer architecture
gptkbp:bench: gptkb:GLUE, gptkb:SQuAD, gptkb:SuperGLUE
gptkbp:citation: 2021
gptkbp:developedBy: gptkb:Microsoft_Research
gptkbp:feature: disentangled attention mechanism, enhanced mask decoder, gradient-disentangled embedding sharing, improved parameter efficiency
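The disentangled attention mechanism represents each token with separate content and relative-position embeddings and scores a token pair as the sum of content-to-content, content-to-position, and position-to-content terms. A toy single-head sketch of the score computation, not the authors' implementation; the tensor names and the precomputed relative-distance buckets (rel_idx) are illustrative assumptions:

```python
import torch

def disentangled_scores(Hq, Hk, Pq, Pk, rel_idx):
    # Hq, Hk:  (L, d) content query/key projections for one head
    # Pq, Pk:  (2K, d) relative-position query/key projections
    # rel_idx: (L, L) long tensor of bucketed relative distances in [0, 2K)
    d = Hq.shape[-1]
    c2c = Hq @ Hk.T                              # content-to-content
    c2p = torch.gather(Hq @ Pk.T, 1, rel_idx)    # content-to-position
    # position-to-content uses the distance seen from the key position
    p2c = torch.gather(Hk @ Pq.T, 1, rel_idx).T
    return (c2c + c2p + p2c) / (3 * d) ** 0.5    # 1/sqrt(3d): three summed terms
```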
https://www.w3.org/2000/01/rdf-schema#label: DeBERTa-v3
gptkbp:improves: gptkb:BERT, gptkb:DeBERTa, gptkb:RoBERTa
gptkbp:language: English
gptkbp:license: gptkb:MIT_License
gptkbp:notablePublication: DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
gptkbp:openSource: true
gptkbp:parameter: varies by model size (22M xsmall to 304M large backbone parameters)
gptkbp:predecessor: gptkb:DeBERTa
gptkbp:pretrainingMethod: ELECTRA-style pre-training (replaced token detection)
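In ELECTRA-style pre-training, a small generator corrupts the input and a discriminator is trained to flag, per token, whether the generator replaced it (replaced token detection). A minimal sketch of that per-token loss under illustrative names and shapes; DeBERTa-v3's gradient-disentangled embedding sharing between the two networks is omitted here:

```python
import torch
import torch.nn.functional as F

def rtd_loss(disc_logits, original_ids, corrupted_ids):
    # disc_logits:   (batch, seq) discriminator score per token
    # original_ids:  (batch, seq) token ids before corruption
    # corrupted_ids: (batch, seq) token ids after the generator's replacements
    labels = (corrupted_ids != original_ids).float()  # 1 = replaced token
    return F.binary_cross_entropy_with_logits(disc_logits, labels)
```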
gptkbp:releaseYear: 2021
gptkbp:usedFor: natural language processing, question answering, text classification, named entity recognition
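For downstream tasks such as text classification, a task head is attached on top of the encoder and fine-tuned. A minimal sketch using the transformers sequence-classification head; the checkpoint, label count, and toy inputs are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2)  # binary classification head

batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
outputs = model(**batch, labels=torch.tensor([1, 0]))
print(outputs.loss, outputs.logits.shape)  # training loss, (2, num_labels)
```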
gptkbp:bfsParent: gptkb:DeBERTa
gptkbp:bfsLayer: 6