PyRank

PDF Parser Python Packages

Python packages tagged with the GitHub topic pdf-parser, sorted by relevance and shown with GitHub stars and monthly downloads.
py-pdf
pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

65.5M 10K 2K
py-pdf
pypdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.2M 10K 2K
PaddlePaddle
paddleocr

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

2.2M 78K 10K
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

305K 64K 5K
opendataloader-project
opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

134K 21K 2K
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

127K 756 82
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

76K 64K 5K
run-llama
liteparse

A fast, helpful, and open-source document parser

37K 5K 341
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

34K 2K 95
codereverser
casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

20K 194 81
bzsanti
oxidize-pdf

Python bindings for oxidize-pdf — generate, parse, split, merge & manipulate PDFs with native Rust performance. No C deps, no Java, no subprocesses.

19K 0 0
opendatalab
mineru-selfhosted-mcp

MCP bridge for a self-hosted MinerU API

5K 64K 5K
yfedoseev
pdf-oxide-fips

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

4K 756 82
titipata
scipdf-parser

Python PDF parser for scientific publications: content and figures

3K 452 66
opendataloader-project
langchain-opendataloader-pdf

A LangChain integration for OpenDataLoader PDF

3K 33 4
ispras
dedoc

Extract content and logical tree structure from textual documents

2K 681 56
iamarunbrahma
vision-parse

Parse PDFs into markdown using Vision LLMs

2K 475 66
michelcrypt4d4mus
pdfalyzer

Analyze PDFs with colors (and YARA)

2K 369 25
nanonets
docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 132
axzml
pdfmark-ai

Convert PDF files to high-quality Markdown using LLM vision models

1K 1 0
ashutoshvarma
pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

1K 44 17
ozefe
ytcc-pipeline

A synchronous Python library that converts an academic-thesis PDF into a structured JSON document plus a tar bundle of cropped figures, tables, and formulas.

1K 0 0
ENDEVSOLS
longparser

Privacy-first document intelligence engine — parse PDFs, DOCX, PPTX, XLSX & CSV into AI-ready chunks for RAG pipelines. Includes HITL review, 3-layer memory chat, and a production FastAPI server.

1K 26 2
AdemBoukhris457
doctra

📄🔍 Parse, extract, and analyze documents with ease 📄🔍

897 205 33
Data from PyPI, GitHub, ClickHouse, and BigQuery.