PyRank

Document AI Python Packages

Python packages tagged with the GitHub topic document-ai, sorted by relevance and shown with stars and monthly downloads.
deepdoctection
deepdoctection

A Repo For Document AI

8K 3K 191
harumiWeb
exstruct

Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.

4K 145 22
deepdoctection
dd-core

A Repo For Document AI

4K 3K 191
ocrqueen
ocrqueen

Official Python SDK for the OCRQueen document extraction API

3K 0 0
deepdoctection
dd-datasets

A Repo For Document AI

3K 3K 191
nanonets
nanoindex

Agentic RAG Harness for long documents, Tree and Graph based reasoning. Cited answers down to the pixel

2K 55 5
tiroq
mdify-cli

Convert PDFs and document images into structured Markdown for LLM workflows

2K 0 0
clovaai
donut-python

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

2K 7K 560
fahmiaziz98
docvision

Production-ready document parsing with Vision Language Models

895 1 0
mo-tunn
tokenpack-rag

TokenPack packs long documents, codebases, PDFs, and folders into compact, evidence-dense LLM context using local embeddings, evidence scoring, and budget-aware selection.

887 7 0
OpenDCAI
flash-mineru

Fast Inference Architecture for MinerU

465 53 7
ChrBoebel
optical-context-mcp

MCP server that compresses OCR-heavy PDFs into dense packed images for AI agent workflows.

435 1 0
Keyvanhardani
german-ocr

High-performance German document OCR - Local & Cloud with GPU/CPU support

389 107 6
gregorymulla
grepctl

BigQuery Semantic Search Orchestrator

293 4 0
GramosoftAI
gdoczai

GDocz by Gramosoft is an open-source Intelligent Document Processing platform that turns raw PDFs and images into clean, structured JSON — powered by multi-engine OCR and AI-driven schema extraction.

205 6 1
harumiWeb
iflow-mcp-harumiweb-exstruct

Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.

110 145 22
fahmiaziz98
doc-vision-parser

Python library for intelligent document parsing using Vision Language Models. Extract structured text and markdown from PDFs and images with self-correcting AI workflows. Supports OpenAI-compatible APIs.

3 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery