PyRank

Vocab Python Packages

Python packages with the GitHub topic vocab. Sorted by relevance, with stars and monthly downloads.
Hk669
bpetokenizer

A Byte Pair Encoding (BPE) tokenizer that algorithmically follows the GPT tokenizer (tiktoken) and lets you train your own tokenizer. It handles special tokens and uses a customizable regex pattern for tokenization (the GPT-4 regex pattern is included). Tokenizers can be saved and loaded with `save` and `load` in the `json` and `file` formats. The `bpetokenizer` also supports [pretrained](bpetokenizer/pretrained/) tokenizers.

680 3 1
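
To illustrate what "training your own BPE tokenizer" means, here is a minimal byte-level BPE sketch in plain Python. It is not the `bpetokenizer` API: the helper names (`get_pair_counts`, `merge`, `train_bpe`, `encode`, `decode`) are hypothetical, and it deliberately omits the regex pre-splitting and special-token handling the package describes. Training repeatedly merges the most frequent adjacent pair of token ids into a new id; encoding replays the learned merges in the order they were learned.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count adjacent (left, right) token-id pairs in a sequence."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, vocab_size):
    """Learn merges on raw UTF-8 bytes (ids 0-255) until vocab_size is reached.

    Hypothetical sketch: no regex pre-tokenization, no special tokens.
    """
    if vocab_size < 256:
        raise ValueError("vocab_size must be at least 256 (the byte alphabet)")
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, in training order
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent pair; ties go to first seen
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply learned merges, always choosing the earliest-learned pair first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = get_pair_counts(ids)
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no remaining pair was ever learned
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, merges):
    """Rebuild bytes for every id, then decode the byte string as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts keep training order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

For example, `train_bpe("aaabdaaabac", 259)` learns three merges (`aa`, then `aa`+`a`, then `aaa`+`b`), after which `encode` compresses the same string to five ids. A full GPT-style tokenizer would first split text with a regex pattern and train on the resulting chunks so that merges never cross chunk boundaries.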
mbforbes
textmetrics

Automatic text metrics (BLEU, ROUGE, METEOR, and more)

231 5 0
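
As a rough illustration of the first metric listed, here is a minimal sentence-level BLEU sketch in plain Python. It is not the `textmetrics` API: the function names `ngrams` and `sentence_bleu` are hypothetical, inputs are assumed to be pre-tokenized lists of strings, and it uses uniform weights up to 4-grams with no smoothing. Packages may differ in tokenization, smoothing, and whether they score at sentence or corpus level.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Hypothetical sketch: geometric mean of clipped n-gram precisions
    (n = 1..max_n, uniform weights) times a brevity penalty. No smoothing,
    so any zero precision yields a score of 0.0."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its max count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Effective reference length: closest to c, ties broken toward the shorter one.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, while `"the cat sat on"` against `"the cat sat on the mat"` keeps perfect n-gram precision but is scaled by the brevity penalty exp(1 - 6/4).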
    • Data from PyPI, GitHub, ClickHouse, and BigQuery