Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:category | gptkb:data_compression_algorithm, gptkb:tokenization_algorithm |
| gptkbp:fullName | gptkb:Byte_Pair_Encoding |
| gptkbp:introduced | gptkb:Philip_Gage |
| gptkbp:introducedIn | 1994 |
| gptkbp:output | subword vocabulary |
| gptkbp:relatedTo | gptkb:WordPiece, gptkb:Unigram_Language_Model, gptkb:SentencePiece |
| gptkbp:step | merge most frequent pair of bytes, repeat until vocabulary size reached |
| gptkbp:usedBy | gptkb:OpenAI, gptkb:GPT-2, gptkb:GPT-3, gptkb:Hugging_Face_tokenizers |
| gptkbp:usedFor | text segmentation, subword tokenization |
| gptkbp:usedIn | data compression, natural language processing |
| gptkbp:bfsParent | gptkb:Qinhuangdao_Beidaihe_Airport |
| gptkbp:bfsLayer | 6 |
| https://www.w3.org/2000/01/rdf-schema#label | BPE |
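
The two `gptkbp:step` values (merge the most frequent pair, repeat until the vocabulary size is reached) can be illustrated with a minimal Python sketch. The function name `learn_bpe`, its parameters, and the choice to start from only the bytes present in the input are assumptions made for illustration; this is not the tokenizer code used by any of the systems listed under `gptkbp:usedBy`.

```python
from collections import Counter


def learn_bpe(data: bytes, vocab_size: int):
    """Illustrative BPE training loop (hypothetical helper, not a library API).

    Returns the learned merge rules, the resulting vocabulary, and the
    input re-segmented with those merges.
    """
    # Start with one token per byte; the initial vocabulary is the set of
    # distinct bytes that occur in the input (an assumption of this sketch).
    tokens = [bytes([b]) for b in data]
    vocab = set(tokens)
    merges = []

    while len(vocab) < vocab_size:
        # Count adjacent token pairs. Overlapping occurrences are counted
        # naively (e.g. "aaa" yields the pair (a, a) twice).
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges gain nothing

        merged = left + right
        merges.append((left, right))
        vocab.add(merged)

        # Replace every non-overlapping occurrence of the pair, left to right.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out

    return merges, vocab, tokens


merges, vocab, tokens = learn_bpe(b"aaabdaaabac", vocab_size=5)
```

Concatenating `tokens` always reproduces the original input, so the merges only change how the bytes are grouped. Production tokenizers typically add a pre-tokenization step and a fixed 256-byte base vocabulary, which this sketch omits.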