Statements (23)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:algorithm |
gptkbp:category | data compression algorithm; tokenization algorithm |
gptkbp:fullName | gptkb:Byte_Pair_Encoding |
https://www.w3.org/2000/01/rdf-schema#label | BPE |
gptkbp:introduced | gptkb:Philip_Gage |
gptkbp:introducedIn | 1994 |
gptkbp:output | subword vocabulary |
gptkbp:relatedTo | gptkb:WordPiece; gptkb:Unigram_Language_Model; gptkb:SentencePiece |
gptkbp:step | merge the most frequent pair of adjacent bytes; repeat until the target vocabulary size is reached |
gptkbp:usedBy | gptkb:OpenAI; gptkb:GPT-2; gptkb:GPT-3; gptkb:Hugging_Face_tokenizers |
gptkbp:usedFor | text segmentation; subword tokenization |
gptkbp:usedIn | data compression; natural language processing |
gptkbp:bfsParent | gptkb:Qinhuangdao_Beidaihe_Airport |
gptkbp:bfsLayer | 6 |
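The `gptkbp:step` row (merge the most frequent adjacent pair, repeat until the vocabulary size is reached) can be illustrated with a minimal character-level sketch of BPE training for subword tokenization. It assumes the corpus is already split into words, works on characters rather than raw bytes, and breaks frequency ties by first occurrence. The names `learn_bpe`, `merge_pair`, and `encode` are hypothetical and are not the API of Hugging Face tokenizers or of any OpenAI tokenizer. Production tokenizers, such as the byte-level BPE used by GPT-2, add pre-tokenization rules and byte handling that are not shown here.

```python
from collections import Counter


def get_pair_counts(vocab: Counter) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: tuple[str, str], symbols: tuple[str, ...]) -> tuple[str, ...]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words: list[str], vocab_size: int) -> list[tuple[str, str]]:
    """Learn merges until the symbol vocabulary reaches vocab_size or no pairs remain."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a tuple of characters
    symbols = {ch for w in words for ch in w}  # initial vocabulary: single characters
    merges: list[tuple[str, str]] = []
    while len(symbols) < vocab_size:
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to first seen
        merges.append(best)
        symbols.add(best[0] + best[1])
        new_vocab: Counter = Counter()
        for word, freq in vocab.items():
            new_vocab[merge_pair(best, word)] += freq
        vocab = new_vocab
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(pair, symbols)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["hug", "hug", "pug", "hugs"], vocab_size=7)
    print(merges)                   # [('u', 'g'), ('h', 'ug')]
    print(encode("hugs", merges))   # ['hug', 's']
    print(encode("bug", merges))    # ['b', 'ug']
```

Replaying merges in learned order is the simplest way to apply them. It also shows why unseen words such as "bug" still segment into known subwords instead of becoming a single unknown token.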
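For the data-compression use listed under `gptkbp:category` and `gptkbp:usedIn`, the same merge idea can be sketched at the byte level. Each round replaces the most frequent adjacent byte pair with a byte value that does not occur in the current data, and records the substitution so it can be undone in reverse order. This is a simplified in-memory sketch, not a transcription of Gage's original code. The `min_count` threshold and the function names `bpe_compress` and `bpe_decompress` are assumptions. A practical compressor would also need to serialize the pair table into its output and bound memory use, for example by processing fixed-size blocks.

```python
from collections import Counter


def bpe_compress(data: bytes, min_count: int = 2) -> tuple[bytes, list[tuple[int, int, int]]]:
    """Replace frequent adjacent byte pairs with unused byte values.

    Returns the packed bytes and a table of (new_byte, left, right)
    substitutions in the order they were applied.
    """
    buf = list(data)
    table: list[tuple[int, int, int]] = []
    while len(buf) > 1:
        (left, right), count = Counter(zip(buf, buf[1:])).most_common(1)[0]
        if count < min_count:
            break  # assumed threshold: no pair is frequent enough to be worth replacing
        used = set(buf)
        new = next((b for b in range(256) if b not in used), None)
        if new is None:
            break  # every byte value is in use, so there is nothing to substitute with
        out = []
        i = 0
        while i < len(buf):
            if i + 1 < len(buf) and buf[i] == left and buf[i + 1] == right:
                out.append(new)
                i += 2
            else:
                out.append(buf[i])
                i += 1
        buf = out
        table.append((new, left, right))
    return bytes(buf), table


def bpe_decompress(data: bytes, table: list[tuple[int, int, int]]) -> bytes:
    """Undo the substitutions, most recent first."""
    buf = list(data)
    for new, left, right in reversed(table):
        out = []
        for b in buf:
            if b == new:
                out.extend((left, right))  # expand the substitute back into its pair
            else:
                out.append(b)
        buf = out
    return bytes(buf)


if __name__ == "__main__":
    original = b"ABABABABCDCDCD"
    packed, table = bpe_compress(original)
    assert bpe_decompress(packed, table) == original
    print(len(original), len(packed), table[0])
```

Undoing substitutions in reverse order is what makes decompression exact: each chosen byte was absent from the data at the moment it was introduced, so every occurrence of it unambiguously stands for its recorded pair.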