PyRank

Document Analysis Python Packages

Python packages with the GitHub topic document-analysis. Sorted by relevance, with stars and monthly downloads.
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

305K 64K 5K
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

76K 64K 5K
retab-dev
retab

The developer starter pack for document processing

11K 44 2
Alex8791-cyber
cognithor

Cognithor · Agent OS: Local-first autonomous agent operating system. 19 LLM providers, 18 channels, 145 MCP tools, 6-tier memory, Agent Packs marketplace, zero telemetry. Python 3.12+, Apache 2.0.

7K 138 21
Topdu
openocr-python

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications that integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.

6K 1K 127
opendatalab
mineru-selfhosted-mcp

MCP bridge for a self-hosted MinerU API

5K 64K 5K
zoharbabin
dd-agents

Find what gets buried in the data room. 13 AI agents analyze every contract across 9 domains (Legal, Finance, Commercial, Tech, Cyber, HR, Tax, Regulatory, ESG), cross-reference findings, and trace each to exact page & quote. Interactive chat, Excel/Word export, knowledge that compounds across runs.

4K 21 7
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
ispras
dedoc

Extract content and logical tree structure from textual documents

2K 681 56
LATIS-DocumentAI-Group
documentai-std

The main standards for Latis Document AI project

2K 3 0
tiroq
mdify-cli

Convert PDFs and document images into structured Markdown for LLM workflows

2K 0 0
AdemBoukhris457
doctra

📄🔍 Parse, extract, and analyze documents with ease 📄🔍

897 205 33
UiForm
uiform

The developer starter pack for document processing

827 44 2
lazyFrogLOL
llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

548 271 8
Retab-dev
k-llms

The developer starter pack for document processing

488 44 2
nanonets
docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

433 2K 143
xyntopia
pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

365 87 14
ahmetkumass
contract-analyzer

Open-source tool for extracting and analyzing key information from legal contracts and documents with ease.

349 12 1
acsenrafilho
cucaracha

Mr. Franz Cucaracha will be glad to assist you with your document analysis and processing routine

345 1 1
jiahuidegit
doc-mcp-server

Let AI understand any complex document: a universal MCP server that addresses AI context-limit problems

227 2 0
olaflaitinen
thulium-htr

Thulium - State-of-the-Art Multilingual Handwriting Text Recognition for Python

215 8 0
ZeroBone
officialeye

An advanced AI-powered generic document-analysis tool

199 7 3
FitLayout
flclient

Python client library for the FitLayout REST API

151 0 0
opendatalab
xh-pdf-parser

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

141 64K 5K
Data from PyPI, GitHub, ClickHouse, and BigQuery