PyRank

Information Extraction Python Packages

Python packages tagged with the GitHub topic information-extraction, sorted by relevance and listed with their stars and monthly downloads.
run-llama
llama-cloud

Python SDK for OCR and document parsing in the cloud with LlamaParse

21.3M 29 7
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

11.2M 149 30
urchade
gliner

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts)

573K 3K 273
EventRegistry
eventregistry

Python package for API access to news articles and events in the Event Registry

136K 256 57
elyase
geotext

Geotext extracts country and city mentions from text

86K 139 50
natasha
yargy

Rule-based facts extraction for Russian language

53K 332 44
philgooch
abbreviations

Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs

49K 89 21
jaidevd
numerizer

A Python module to convert natural language numerics into ints and floats.

44K 233 24
PaddlePaddle
paddlenlp

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

32K 13K 3K
modelscope
adaseq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

30K 454 45
PaddlePaddle
tool-helpers

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

10K 13K 3K
jackboyla
glirel

Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text)

10K 273 23
PaddlePaddle
fast-dataindex

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

10K 13K 3K
marijnkoolen
fuzzy-search

Fuzzy search modules for searching lists of words in low quality OCR and HTR text.

9K 23 1
brevia-ai
brevia

Extensible API and framework to build your Retrieval Augmented Generation (RAG) and Information Extraction (IE) applications with LLMs

6K 32 3
krlabsorg
lettucedetect

Lightweight hallucination detection framework for RAG applications

5K 573 40
vmenger
deduce

Deduce: de-identification method for Dutch medical text

5K 65 27
dpasse
extr

Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions

5K 10 0
arnebinder
pytorch-ie

PyTorch-IE: State-of-the-art Information Extraction in PyTorch

4K 77 6
huspacy
huspacy-nightly

HuSpaCy: industrial-strength Hungarian natural language processing

3K 185 18
huspacy
huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

3K 185 18
zjunlp
deepke

DeepKE is a knowledge extraction toolkit for knowledge graph construction supporting low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction.

2K 4K 744
lasigeBioTM
bent

Biomedical Term Annotator

2K 9 1
snipsco
snips-nlu

Snips Python library to extract meaning from text

2K 4K 505
Data from PyPI, GitHub, ClickHouse, and BigQuery