Byte Pair Encoding

GPTKB entity

Statements (24)
Predicate Object
gptkbp:instanceOf gptkb:data_compression_algorithm
gptkbp:abbreviation gptkb:BPE
gptkbp:appliesTo gptkb:text
binary data
gptkbp:category compression algorithms
tokenization algorithms
gptkbp:introduced gptkb:Philip_Gage
gptkbp:introducedIn 1994
gptkbp:relatedTo gptkb:WordPiece
gptkb:Unigram_Language_Model
subword tokenization
gptkbp:step find most frequent pair of adjacent bytes
replace every occurrence of that pair with an unused byte
repeat until no pair occurs more than once or no unused byte remains
gptkbp:supportsAlgorithm lossless compression
gptkbp:usedBy gptkb:OpenAI
gptkb:GPT-2
gptkb:GPT-3
gptkbp:usedIn data compression
natural language processing
tokenization
gptkbp:bfsParent gptkb:BPE
gptkbp:bfsLayer 7
https://www.w3.org/2000/01/rdf-schema#label Byte Pair Encoding
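
Illustration of the gptkbp:step statements: a minimal Python sketch of byte pair compression, written here as an example and not taken from Philip Gage's original code. The function names `compress` and `decompress` and the substitution-table layout are assumptions made for this sketch. It assumes that only byte values absent from the input may serve as replacement bytes, and that the loop stops once no adjacent pair occurs at least twice or no free byte value is left.

```python
from collections import Counter


def compress(data: bytes):
    """Return (compressed bytes, substitution table). Illustrative sketch only."""
    present = set(data)
    # Assumption: replacement bytes are drawn from values absent in the input.
    free = [b for b in range(256) if b not in present]
    buf = list(data)
    table = []  # (new_byte, (left, right)) in the order substitutions were made

    while free:
        # Count every adjacent pair in the current buffer.
        pairs = Counter(zip(buf, buf[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a substitution would not help

        new_byte = free.pop(0)
        out = []
        i = 0
        # Replace non-overlapping occurrences of the pair, left to right.
        while i < len(buf):
            if i + 1 < len(buf) and (buf[i], buf[i + 1]) == pair:
                out.append(new_byte)
                i += 2
            else:
                out.append(buf[i])
                i += 1
        buf = out
        table.append((new_byte, pair))

    return bytes(buf), table


def decompress(data: bytes, table):
    """Undo the substitutions in reverse order to recover the input."""
    buf = list(data)
    for new_byte, (left, right) in reversed(table):
        out = []
        for b in buf:
            if b == new_byte:
                out.extend((left, right))
            else:
                out.append(b)
        buf = out
    return bytes(buf)


if __name__ == "__main__":
    original = b"aaabdaaabac"
    packed, table = compress(original)
    print(len(original), len(packed), len(table))
    print(decompress(packed, table) == original)
```

The round trip through `decompress` reflects the lossless-compression statement: the substitution table has to be stored alongside the compressed bytes so that the original data can be rebuilt. Subword tokenizers such as the one used by GPT-2 apply the same merge idea but learn a vocabulary of merged symbols instead of reusing unused byte values.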