Byte Pair Encoding

GPTKB entity

Statements (24)
Predicate                                    Object
gptkbp:instanceOf                            data compression algorithm
gptkbp:abbreviation                          gptkb:BPE
gptkbp:appliesTo                             gptkb:text
                                             binary data
gptkbp:category                              compression algorithms
                                             tokenization algorithms
https://www.w3.org/2000/01/rdf-schema#label  Byte Pair Encoding
gptkbp:introduced                            gptkb:Philip_Gage
gptkbp:introducedIn                          1994
gptkbp:relatedTo                             gptkb:WordPiece
                                             gptkb:Unigram_Language_Model
                                             subword tokenization
gptkbp:step                                  find the most frequent pair of adjacent bytes
                                             replace every occurrence of that pair with a byte value unused in the data
                                             repeat until no pair is frequent enough to replace or no unused byte value remains
gptkbp:supportsAlgorithm                     lossless compression
gptkbp:usedBy                                gptkb:OpenAI
                                             gptkb:GPT-2
                                             gptkb:GPT-3
gptkbp:usedIn                                data compression
                                             natural language processing
                                             tokenization
gptkbp:bfsParent                             gptkb:WordPiece
gptkbp:bfsLayer                              6
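The gptkbp:step statements describe the compression loop. The Python sketch below is a simplification under stated assumptions, not Gage's original program or file format: it works on one in-memory buffer, stops when the most frequent pair occurs fewer than twice, and returns the substitution list separately instead of packing it into a header. The names bpe_compress and bpe_decompress are hypothetical. Because every replacement byte is drawn from values absent in the input, undoing the substitutions in reverse order restores the data exactly, which is what makes the scheme lossless.

```python
from collections import Counter


def bpe_compress(data: bytes) -> tuple[bytes, list[tuple[int, int, int]]]:
    """Simplified byte pair encoding (hypothetical helper, not Gage's format).

    Returns the compressed bytes and the substitutions (new, left, right)
    in the order they were applied.
    """
    buf = list(data)
    present = set(buf)
    # Candidate replacement bytes: values that never occur in the input.
    unused = [b for b in range(256) if b not in present]
    rules: list[tuple[int, int, int]] = []
    while unused:
        # Count adjacent byte pairs (overlapping runs like "aaa" count twice).
        pairs = Counter(zip(buf, buf[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        # Assumed stop rule: no pair occurs at least twice.
        if count < 2:
            break
        new = unused.pop()
        # Replace non-overlapping occurrences, scanning left to right.
        out = []
        i = 0
        while i < len(buf):
            if i + 1 < len(buf) and buf[i] == left and buf[i + 1] == right:
                out.append(new)
                i += 2
            else:
                out.append(buf[i])
                i += 1
        buf = out
        rules.append((new, left, right))
    return bytes(buf), rules


def bpe_decompress(data: bytes, rules: list[tuple[int, int, int]]) -> bytes:
    buf = list(data)
    # Expand the newest substitution first so older ones see their own output.
    for new, left, right in reversed(rules):
        out = []
        for b in buf:
            if b == new:
                out.extend((left, right))
            else:
                out.append(b)
        buf = out
    return bytes(buf)


text = b"ABABABAB-ABABABAB"
packed, rules = bpe_compress(text)
assert bpe_decompress(packed, rules) == text
print(len(text), len(packed), len(rules))  # 17 3 3
```

Here "AB" becomes one byte, then pairs of that byte are merged twice more, leaving 3 bytes plus 3 rules. A real compressor must also store the rule table, so short inputs like this one do not actually shrink.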
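For the tokenization uses listed above (gptkbp:usedIn tokenization, gptkbp:relatedTo subword tokenization), the same merge step is usually applied to symbols inside words, and the learned merges are kept as a vocabulary instead of producing a compressed file. A minimal sketch, assuming a precomputed word-frequency table, single characters as initial symbols, no end-of-word marker, and ties broken by first occurrence. It is not the byte-level tokenizer used by GPT-2, and learn_merges, merge_word and tokenize are hypothetical names; the word counts are a toy input.

```python
from collections import Counter


def merge_word(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each non-overlapping occurrence of `pair` with its concatenation."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    # Each word starts as a tuple of single-character symbols.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        # max() keeps the first maximum, so ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_word(s, best): c for s, c in vocab.items()}
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    # Apply the learned merges in the order they were learned.
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_word(symbols, pair)
    return list(symbols)


counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_merges(counts, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
print(tokenize("lowest", merges))  # ['lo', 'w', 'est']
```

The unseen word "lowest" is split into subwords learned from other words, which is the property that makes BPE useful for open-vocabulary tokenization.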