PyRank

Document Extraction Python Packages

Python packages tagged with the GitHub topic document-extraction, sorted by relevance and listed with their stars and monthly downloads.
iterationlayer
iterationlayer

Official Python SDK for the Iteration Layer API — document extraction, image transformation, image generation, document generation, and sheet generation.

2K 0 0
iyulab
undoc

High-performance Rust library for extracting content from DOCX, XLSX, and PPTX into Markdown, plain text, or JSON with CJK support and .NET/Python bindings.

1K 16 3
xyntopia
pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

365 87 14
QuartzUnit
docpick

Lightweight OCR + Local LLM → Schema-based Structured JSON Extraction

328 0 0
docuglean-ai
docuglean-ocr

Intelligent document processing. Extract structured data like JSON, Markdown and HTML from documents using AI.

135 115 2
docuglean-ai
docuglean

An SDK for intelligent document processing using SOTA VLLM models

95 115 2
Data from PyPI, GitHub, ClickHouse, and BigQuery