PyRank

PDF Extraction Python Packages

Python packages with the GitHub topic pdf-extraction. Sorted by relevance, with stars and monthly downloads.
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

174K 8K 488
opendataloader-project
opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

134K 21K 2K
bzsanti
oxidize-pdf

Python bindings for oxidize-pdf — generate, parse, split, merge & manipulate PDFs with native Rust performance. No C deps, no Java, no subprocesses.

19K 0 0
pytr-org
pytr

Use TradeRepublic in terminal and mass download all documents

8K 733 143
zoharbabin
dd-agents

Find what gets buried in the data room. 13 AI agents analyze every contract across 9 domains (Legal, Finance, Commercial, Tech, Cyber, HR, Tax, Regulatory, ESG), cross-reference findings, and trace each to exact page & quote. Interactive chat, Excel/Word export, knowledge that compounds across runs.

4K 21 7
NameetP
pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

3K 63 7
opendataloader-project
langchain-opendataloader-pdf

A LangChain integration for OpenDataLoader PDF

3K 33 4
ocrqueen
ocrqueen

Official Python SDK for the OCRQueen document extraction API

3K 0 0
iterationlayer
iterationlayer

Official Python SDK for the Iteration Layer API — document extraction, image transformation, image generation, document generation, and sheet generation.

2K 0 0
heleninsights-dot
phd-deepread-workflow

A professional CLI workflow for PhD students to extract, analyze, and visualize academic papers into structured Markdown and Obsidian Canvas.

1K 38 2
madhav921
stmtforge

Open-source Python tool to parse credit card PDF statements from Indian banks (HDFC, ICICI, SBI, Axis + 5 more) into structured data. Offline, privacy-first, Streamlit dashboard. pip install stmtforge

441 2 0
Kyros-Groupe-Ltd
pdfstructx

Intelligent PDF parser with font-aware structure detection, table extraction, and multi-column support

217 0 0
GramosoftAI
gdoczai

GDocz by Gramosoft is an open-source Intelligent Document Processing platform that turns raw PDFs and images into clean, structured JSON — powered by multi-engine OCR and AI-driven schema extraction.

205 6 1
Goldziher
mseep-kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

169 8K 487
ZhuJiaxin2
ragtable-extract

PDF table extraction for RAG — convert to clean HTML. Fast, local, no GPU.

147 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery