Tokenizers library

GPTKB entity

Statements (30)
Predicate Object
gptkbp:instanceOf gptkb:software
gptkbp:availableOn gptkb:GitHub
gptkbp:developedBy gptkb:Hugging_Face
gptkbp:feature decoding
gptkbp:feature Unicode support
gptkbp:feature normalization
gptkbp:feature customizable pipelines
gptkbp:feature serialization
gptkbp:feature export to JSON
gptkbp:feature fast tokenization
gptkbp:feature integration with Python via bindings
gptkbp:feature multi-threaded processing
gptkbp:feature pre-tokenization
gptkbp:feature training new tokenizers
https://www.w3.org/2000/01/rdf-schema#label Tokenizers library
gptkbp:integratesWith gptkb:Transformers_library
gptkbp:license gptkb:Apache_License_2.0
gptkbp:npmPackage tokenizers
gptkbp:openSource true
gptkbp:programmingLanguage gptkb:Python
gptkbp:programmingLanguage gptkb:Rust
gptkbp:purpose text tokenization
gptkbp:supports gptkb:WordPiece
gptkbp:supports gptkb:Byte-Pair_Encoding
gptkbp:supports gptkb:SentencePiece
gptkbp:supports Unigram
gptkbp:usedFor natural language processing
gptkbp:bfsParent gptkb:Hugging_Face
gptkbp:bfsParent gptkb:Hugs
gptkbp:bfsLayer 6
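
The gptkbp:feature statements (training new tokenizers, normalization, pre-tokenization, serialization, export to JSON, fast tokenization, decoding) map onto the library's Python bindings. A minimal sketch, assuming the tokenizers package is installed and that corpus.txt is a hypothetical plain-text training file used only for illustration:

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.normalizers import NFKC

# Customizable pipeline: normalization -> pre-tokenization -> BPE model.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.normalizer = NFKC()
tokenizer.pre_tokenizer = Whitespace()

# Train a new tokenizer from raw text (corpus.txt is a placeholder path).
trainer = BpeTrainer(vocab_size=30000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(["corpus.txt"], trainer)

# Serialization: export the whole pipeline to a single JSON file.
tokenizer.save("tokenizer.json")

# Fast tokenization and decoding via the Rust core.
encoding = tokenizer.encode("Tokenizers is a fast tokenization library.")
print(encoding.tokens)
print(tokenizer.decode(encoding.ids))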
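
The gptkbp:supports statements list four subword schemes. A sketch of how they surface in the Python API, assuming BPE, WordPiece, and Unigram are instantiated as bare (untrained) models and SentencePiece-style tokenization is taken from the higher-level implementations module:

from tokenizers import Tokenizer
from tokenizers.models import BPE, WordPiece, Unigram
from tokenizers.implementations import SentencePieceBPETokenizer

bpe_tok = Tokenizer(BPE(unk_token="[UNK]"))              # Byte-Pair Encoding
wordpiece_tok = Tokenizer(WordPiece(unk_token="[UNK]"))  # WordPiece
unigram_tok = Tokenizer(Unigram())                       # Unigram
sentencepiece_tok = SentencePieceBPETokenizer()          # SentencePiece-compatible BPE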
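
The gptkbp:integratesWith statement refers to loading a tokenizer built with this library into the Transformers library as a fast tokenizer. A minimal sketch, assuming the transformers package is installed and tokenizer.json is a file previously produced by Tokenizer.save:

from transformers import PreTrainedTokenizerFast

fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
)
print(fast_tokenizer("Tokenizers integrates with Transformers.")["input_ids"])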