PyRank

Text Processing Python Packages

Python packages tagged with the GitHub topic text-processing, sorted by relevance and listed with their monthly downloads and GitHub stars.
pyparsing
pyparsing

Python library for creating PEG parsers

374.1M 2K 310
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

79.5M 10K 726
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.8M 10K 726
derek73
nameparser

A simple Python module for parsing human names into their individual components

3.2M 707 105
ikegami-yukino
jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

3.1M 347 33
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.1M 1K 297
kreuzberg-dev
html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.

494K 710 57
Ailln
proces

🐨 Text preprocessing.

253K 5 0
thombashi
humanreadable

humanreadable is a Python library to convert human-readable values to other units.

195K 21 1
wenet-e2e
wetextprocessing

Text Normalization & Inverse Text Normalization

101K 765 105
Lips7
matcher-py

A high-performance matcher, implemented in Rust, designed to solve logical and text-variation problems in word matching.

30K 18 1
swen128
twitter-text-parser

Twitter Text Libraries for Python

19K 29 3
jacksonllee
rustling

A high-performance library for computational linguistics

15K 2 0
daac-tools
daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

14K 21 1
PyThaiNLP
nlpo3

Thai natural language processing library in Rust, with Python and Node bindings.

12K 46 13
voidful
tfkit

🤖📇 Handling multiple NLP tasks in one pipeline

12K 57 6
roshan-research
hazm

Persian NLP Toolkit

11K 1K 206
shner-elmo
flashtext2

Flashtext implementation in Rust

9K 11 1
ttarvis
hexlock

Format-preserving redaction for PII and sensitive data that works with LLMs/text-based pipelines

6K 6 0
proycon
pynlpl

PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

6K 476 66
vmenger
deduce

Deduce: de-identification method for Dutch medical text

5K 65 27
casics
nostril-detector

Nostril: Nonsense String Evaluator

5K 199 34
ChenghaoMou
text-dedup

All-in-one text de-duplication

3K 759 77
MycroftAI
lingua-franca

Mycroft's multilingual text parsing and formatting library

3K 78 79
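The jaconv entry above describes converting between Hiragana, Katakana, Hankaku (half-width), and Zenkaku (full-width) characters. The core idea can be sketched with the standard library alone, because Unicode lays out Hiragana and Katakana in parallel ranges 0x60 code points apart, and full-width ASCII variants sit 0xFEE0 above plain ASCII. The function names below are hypothetical and are not jaconv's API. Half-width Katakana (U+FF61 to U+FF9F) is deliberately left out because it does not map by a simple offset.

```python
# Hypothetical helper names for illustration; this is not jaconv's API.
# Hiragana U+3041..U+3096 and Katakana U+30A1..U+30F6 share the same order,
# exactly 0x60 code points apart.
KANA_OFFSET = 0x60
# Full-width ASCII variants U+FF01..U+FF5E sit 0xFEE0 above ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0


def hira_to_kata(text: str) -> str:
    """Convert Hiragana to Katakana; every other character passes through."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def kata_to_hira(text: str) -> str:
    """Convert full-width Katakana to Hiragana; every other character passes through."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def zenkaku_to_hankaku_ascii(text: str) -> str:
    """Convert full-width (Zenkaku) ASCII letters, digits, symbols and the
    ideographic space to their half-width (Hankaku) forms."""
    out = []
    for ch in text:
        if "\uff01" <= ch <= "\uff5e":
            out.append(chr(ord(ch) - FULLWIDTH_OFFSET))
        elif ch == "\u3000":  # ideographic (full-width) space
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


print(hira_to_kata("ひらがな"))                   # ヒラガナ
print(kata_to_hira("カタカナ"))                   # かたかな
print(zenkaku_to_hankaku_ascii("ＡＢＣ　１２３"))  # ABC 123
```

Characters outside the mapped ranges, such as the prolonged sound mark ー, are returned unchanged, which is why a dedicated library with explicit mapping tables is preferable for real text.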
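The daachorse entry above names the Aho-Corasick algorithm, which finds every occurrence of many patterns in a single left-to-right pass over the text. The following is a minimal textbook sketch in pure Python, using a dict-based trie with failure links, offered only to show how the algorithm works. It does not use daachorse's compact double-array layout, and `build_automaton` and `find_all` are hypothetical names, not daachorse's Python API. Patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build a dict-based Aho-Corasick automaton (trie + failure links + outputs).

    Hypothetical helper for illustration; not daachorse's API or data layout.
    """
    goto = [{}]    # goto[state][char] -> child state (trie edges); state 0 is the root
    fail = [0]     # fail[state] -> state for the longest proper suffix that is also a trie path
    out = [set()]  # out[state] -> patterns recognised on reaching this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)

    # Breadth-first traversal: a failure link always points to a shallower
    # state, so that state's links and outputs are already complete.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]  # inherit shorter matches ending here
    return goto, fail, out


def find_all(text, automaton):
    """Return sorted (start_index, pattern) pairs for every match, overlaps included."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state can consume ch (or we reach the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because the text is scanned once and failure links only move to shallower states, the search runs in time linear in the text length plus the number of matches. Libraries like daachorse keep this behaviour but store the transitions far more compactly than Python dicts.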
Data from PyPI, GitHub, ClickHouse, and BigQuery