PyRank

NLP Python Packages

Python packages with the GitHub topic nlp. Sorted by relevance, with stars and monthly downloads.
huggingface
tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

175.3M 11K 1K
huggingface
transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

149.1M 161K 33K
huggingface
datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

127.8M 22K 3K
nltk
nltk

NLTK Source

63.2M 15K 3K
explosion
thinc

🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

24.4M 3K 292
explosion
spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

21.6M 34K 5K
explosion
spacy-loggers

📟 Logging utilities for spaCy

18.2M 12 17
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

11.2M 149 30
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

8.9M 6K 371
datamade
usaddress

🇺🇸 A Python library for parsing unstructured United States address strings into address components

5.7M 2K 308
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

5.4M 15K 1K
RaRe-Technologies
gensim

Topic Modelling for Humans

5M 16K 4K
Microsoft
presidio-analyzer

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

4.9M 8K 1K
masci
banks

LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. It allows attaching metadata to prompts to ease their management, and versioning is a first-class citizen. Banks provides ways to store prompts on disk along with their metadata.

4.7M 127 20
modelscope
modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

4.5M 9K 943
akoumjian
datefinder

Find dates inside text using Python and get back datetime objects

4M 663 170
isaacus-dev
semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

4M 628 40
Microsoft
presidio-anonymizer

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

3.5M 8K 1K
hplt-project
sacremoses

Python port of Moses tokenizer, truecaser and normalizer

2.7M 495 59
sloria
textblob

Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

2.6M 10K 1K
vi3k6i5
flashtext

Extract keywords from sentences or replace keywords in sentences.

2.3M 6K 597
pemistahl
lingua-language-detector

The most accurate natural language detection library for Python, suitable for short text and mixed-language text

1.7M 2K 60
akshaynagpal
word2number

Convert number words (e.g. twenty one) to numeric digits (21)

1.5M 179 77
openvinotoolkit
openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery