PyRank

Sentencepiece Python Packages

Python packages with the GitHub topic sentencepiece. Sorted by relevance, with stars and monthly downloads.
OpenNMT
pyonmttok

Fast and customizable text tokenization library with BPE and SentencePiece support

44K 333 83
Systemcluster
kitoken

Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

3K 49 2
himkt
tiny-tokenizer

No description available

520 261 25
ZJaume
escape-unk

Escape unknown symbols in SentencePiece vocabularies

270 0 0
Okramjimmy
meitei-senter

Neural sentence boundary detection for Meitei Mayek (Manipuri) using SentencePiece tokenization and a CNN-based spaCy pipeline.

226 0 0
stef41
toksight

Tokenizer analysis & comparison toolkit — compression, coverage, audit, cost estimation. Zero deps.

186 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery