PyRank
  • Insights
  • PyPI
  • GitHub
  • Search
  • Compare
  • Advisories
  • Ecosystem
  • About

Word Segmentation Python Packages

Python packages with the GitHub topic word-segmentation. Sorted by relevance, with stars and monthly downloads.
google
sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

32M 12K 1K
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.1M 1K 297
mammothb
symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

411K 870 126
bab2min
kiwipiepy

Python API for Kiwi

245K 381 33
taishi-i
nagisa

A Japanese tokenizer based on recurrent neural networks

226K 417 23
bab2min
kiwipiepy-model

Python API for Kiwi

141K 381 33
jidasheng
bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

32K 261 46
modelscope
adaseq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

30K 454 45
jacksonllee
pycantonese

Cantonese Linguistics and NLP

23K 405 42
jacksonllee
rustling

A high-performance library for computational linguistics

15K 2 0
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 978 108
JayYip
bert-multitask-learning

BERT for Multitask Learning

8K 544 123
jacksonllee
wordseg

Word segmentation models

8K 5 1
google
tf-sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

3K 12K 1K
Systemcluster
kitoken

Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

3K 49 2
cbaziotis
ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (english Wikipedia, twitter - 330mil english tweets).

2K 675 94
baidu
lac

百度NLP:分词,词性标注,命名实体识别,词重要性

2K 4K 590
dnanhkhoa
vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

1K 58 18
hellonlp
hellonlp

NLP tools, word segmentation, sentence segmentation, New-Word-Discovery,新词发现

1K 27 9
messense
cjieba

Python cffi binding for cjieba

902 15 0
viig99
symspellcpppy

Fast SymSpell written in c++ and exposes to python via pybind11

773 44 9
fann1993814
hanlperceptron

Native Python HanLP Perceptron Model: HanLPerceptron 中文斷詞 詞性標註 命名實體識別

737 7 2
ruanchaves
hashformers

Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.

449 77 6
taishi-i
toiro

A tool for comparing tokenizers

405 123 9
    • Data from PyPI, GitHub, ClickHouse, and BigQuery