PyRank

Text Mining Python Packages

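Python packages tagged with the GitHub topic text-mining, sorted by relevance. Each entry lists the GitHub owner, the package name, a description, and popularity figures that include monthly downloads and stars.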
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
deanmalmgren
textract

extract text from any document. no muss. no fuss.

408K 5K 678
csurfer
rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

287K 1K 150
bookieio
breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a living alternative)

95K 205 26
Lilykos
pyphonetics

A Python 3 phonetics library.

92K 139 21
KyleKing
textract-py3

Maintained fork of deanmalmgren/textract to replace '*' dependencies and other updates

57K 14 2
Lips7
matcher-py

A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.

30K 18 1
aphp
edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

17K 165 41
JasonKessler
scattertext

Beautiful visualizations of how language differs among document types.

16K 2K 285
biolab
orange3-text

🍊 📄 Text Mining add-on for Orange3

13K 134 86
averbis
averbis-python-api

Conveniently access the REST API of Averbis products using Python

7K 12 5
vmenger
deduce

Deduce: de-identification method for Dutch medical text

5K 65 27
huspacy
huspacy-nightly

HuSpaCy: industrial-strength Hungarian natural language processing

3K 185 18
huspacy
huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

3K 185 18
mesejo
trrex

Efficient string matching with regular expressions

3K 146 7
PetrKorab
arabica

Python package for text mining of time-series data

3K 75 16
stephenhky
shorttext

Various Algorithms for Short Text Mining

2K 471 74
lasigeBioTM
bent

Biomedical Term Annotator

2K 9 1
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

2K 38 37
jbesomi
texthero

Text preprocessing, representation and visualization from zero to hero.

2K 3K 237
vgrabovets
multi-rake

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

2K 272 36
mcs07
chemdataextractor

Automatically extract chemical information from scientific documents

1K 355 123
AbhishekSalian
random-word-generator

This library helps you create random words, i.e. noise in text data. Helpful in many tasks, such as generating random authorization tokens of constant or variable length, text data augmentation, etc.

1K 4 1
sergioburdisso
pyss3

A Python library for Interpretable Machine Learning in Text Classification using the SS3 model, with easy-to-use visualization tools for Explainable AI

1K 348 44
    • Data from PyPI, GitHub, ClickHouse, and BigQuery