WordPiece

GPTKB entity

Statements (22)
Predicate Object
gptkbp:instanceOf tokenization algorithm
gptkbp:advantage handles unknown words
gptkbp:advantage reduces vocabulary size
gptkbp:developedBy gptkb:Google
https://www.w3.org/2000/01/rdf-schema#label WordPiece
gptkbp:input gptkb:text
gptkbp:introducedIn 2012
gptkbp:output tokens
gptkbp:purpose subword tokenization
gptkbp:relatedTo gptkb:Byte_Pair_Encoding
gptkbp:relatedTo gptkb:SentencePiece
gptkbp:splitsWordsInto subword units
gptkbp:usedFor natural language processing
gptkbp:usedIn gptkb:BERT
gptkbp:usedIn gptkb:ALBERT
gptkbp:usedIn gptkb:DistilBERT
gptkbp:usedIn machine translation
gptkbp:usedIn speech recognition
gptkbp:usedIn question answering
gptkbp:usedIn text classification
gptkbp:bfsParent gptkb:large_language_model
gptkbp:bfsLayer 5
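
Example (illustrative, not part of the GPTKB statements)

A minimal sketch of how a word might be split into subword units, assuming the greedy longest-match-first lookup used by BERT-style WordPiece tokenizers. The function name wordpiece_tokenize, the toy vocabulary, and the default "##" continuation prefix and "[UNK]" token are assumptions chosen for illustration; how the vocabulary itself is learned is not shown.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into subword units by greedy longest-match-first lookup.

    Hypothetical helper for illustration only; `vocab` is any set of strings.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocabulary entry covers this position: the whole word is unknown.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


# Toy vocabulary (assumed for this example, not a real model vocabulary).
toy_vocab = {"un", "##break", "##able", "token", "##ization"}

print(wordpiece_tokenize("unbreakable", toy_vocab))
print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

With this toy vocabulary, "unbreakable" is never stored as a whole word but is still covered by three smaller entries, which is how a subword vocabulary can stay small while handling words it has not seen. A word with no matching pieces, such as "xyz" here, falls back to the unknown token.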