Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:data_compression_algorithm |
| gptkbp:abbreviation | gptkb:BPE |
| gptkbp:appliesTo | gptkb:text, binary data |
| gptkbp:category | compression algorithms, tokenization algorithms |
| gptkbp:introduced | gptkb:Philip_Gage |
| gptkbp:introducedIn | 1994 |
| gptkbp:relatedTo | gptkb:WordPiece, gptkb:Unigram_Language_Model, subword tokenization |
| gptkbp:step | find most frequent pair of bytes, replace pair with unused byte, repeat until no more pairs |
| gptkbp:supportsAlgorithm | lossless compression |
| gptkbp:usedBy | gptkb:OpenAI, gptkb:GPT-2, gptkb:GPT-3 |
| gptkbp:usedIn | data compression, natural language processing, tokenization |
| gptkbp:bfsParent | gptkb:BPE |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Byte Pair Encoding |
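
The `gptkbp:step` values describe the compression form of the algorithm: find the most frequent pair of bytes, replace it with an unused byte, and repeat. A minimal Python sketch of that loop follows. The function names `bpe_compress` and `bpe_decompress`, the in-memory substitution table, and the rule of stopping once no pair occurs at least twice are assumptions made for illustration; they are not taken from the original 1994 implementation. Tokenizers such as the one used by GPT-2 apply the same merge idea but grow a vocabulary of merged symbols instead of reusing spare byte values.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Return (compressed bytes, table mapping replacement byte -> pair)."""
    seq = list(data)
    table = {}  # replacement byte -> (left byte, right byte)
    while True:
        # Step 1: find the most frequent pair of adjacent bytes.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a replacement would not save space
        # Step 2: pick a byte value that does not occur in the data.
        present = set(seq)
        unused = next(
            (b for b in range(256) if b not in present and b not in table),
            None,
        )
        if unused is None:
            break  # every byte value is already in use
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(unused)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        table[unused] = pair
        # Step 3: repeat until no more pairs qualify.
    return bytes(seq), table


def bpe_decompress(data: bytes, table):
    """Undo the substitutions in reverse order of creation."""
    seq = list(data)
    for byte, pair in reversed(list(table.items())):
        out = []
        for b in seq:
            out.extend(pair if b == byte else (b,))
        seq = out
    return bytes(seq)
```

Calling `bpe_compress(b"aaabdaaabac")` returns a shorter byte string plus the table needed to reverse it, and passing both to `bpe_decompress` restores the input exactly, which is what makes the scheme lossless.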