SentencePiece

GPTKB entity

Statements (28)
Predicate Object
gptkbp:instanceOf gptkb:software
gptkbp:developedBy gptkb:Google
gptkbp:documentation https://github.com/google/sentencepiece
gptkbp:feature Unicode support
gptkbp:feature language independent
gptkbp:feature subword tokenization
gptkbp:feature trainable on raw text
gptkbp:firstReleased 2018
https://www.w3.org/2000/01/rdf-schema#label SentencePiece
gptkbp:license gptkb:Apache_License_2.0
gptkbp:npmPackage sentencepiece
gptkbp:platform cross-platform
gptkbp:programmingLanguage gptkb:Python
gptkbp:programmingLanguage gptkb:C++
gptkbp:purpose text segmentation
gptkbp:purpose unsupervised text tokenizer
gptkbp:repository https://github.com/google/sentencepiece
gptkbp:supportsAlgorithm gptkb:Byte-Pair_Encoding
gptkbp:supportsAlgorithm gptkb:Unigram_Language_Model
gptkbp:usedIn machine translation
gptkbp:usedIn natural language processing
gptkbp:usedIn language modeling
gptkbp:bfsParent gptkb:WordPiece
gptkbp:bfsParent gptkb:T5
gptkbp:bfsParent gptkb:XLM-R
gptkbp:bfsParent gptkb:Mixtral_8x7B
gptkbp:bfsParent gptkb:Mixtral
gptkbp:bfsLayer 6
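
Illustration (not part of the GPTKB statements)

The statements above describe SentencePiece as a subword tokenizer that is trainable on raw text and supports Byte-Pair Encoding. The sketch below is a toy, standard-library-only illustration of that idea, not SentencePiece's actual implementation or API. It assumes the commonly described convention of replacing spaces with the "▁" (U+2581) marker so that segmentation stays reversible without language-specific pre-tokenization. The function names (to_symbols, learn_bpe, merge_pair, encode, decode) and the tiny corpus are hypothetical, chosen only for this example.

```python
from collections import Counter

# Assumption: "▁" (U+2581) stands in for whitespace, as commonly described
# for SentencePiece. Everything below is a toy sketch, not the library's code.
WS = "\u2581"


def to_symbols(text):
    """Turn raw text into a list of characters, marking spaces with WS."""
    return list(WS + text.replace(" ", WS))


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merge rules from raw sentences."""
    seqs = [to_symbols(sentence) for sentence in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def encode(text, merges):
    """Segment raw text by replaying the learned merges in order."""
    seq = to_symbols(text)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


def decode(pieces):
    """Invert `encode`: join pieces, restore spaces, drop the leading marker."""
    return "".join(pieces).replace(WS, " ")[1:]


if __name__ == "__main__":
    corpus = ["low lower lowest", "new newer newest"]  # hypothetical toy data
    merges = learn_bpe(corpus, num_merges=10)
    pieces = encode("lower newest", merges)
    print(pieces)
    print(decode(pieces))
```

Because whitespace is kept as an ordinary symbol, decoding is a plain join followed by a marker substitution. The real library additionally handles normalization, the Unigram language model, and vocabulary-size targets, none of which are modeled here.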