PyRank

PDF Parsing Python Packages

Python packages with the GitHub topic pdf-parsing. Sorted by relevance, with stars and monthly downloads.
py-pdf
pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

65.5M 10K 2K
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

30.9M 10K 882
py-pdf
pypdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.2M 10K 2K
jstockwin
py-pdf-parser

A Python tool to help extract information from structured PDFs.

25K 429 49
harubi
bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

12K 1 0
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
marcpinet
citracer

💬 Trace citation chains for any concept across research papers and render them as an interactive graph.

2K 20 0
IQDM
iqdmpdf

A collection of PDF data mining scripts for various IMRT QA vendors

1K 13 2
OpenDCAI
flash-mineru

Fast Inference Architecture for MinerU

465 53 7
py-pdf
pypdf-fork

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

461 10K 2K
jsvine
pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

287 10K 882
jeremiahbohr
literature-mapper

Transform academic PDFs into a Knowledge Graph with typed claims, temporal analysis, bibliometric tools, and grounded LLM synthesis that cites only your corpus.

260 9 3
meldonization
depdf

PDF table & paragraph extractor

228 11 0
Halolegend94
pdf4py

A PDF parser written in Python 3 with no external dependencies.

195 56 3
J-sephB-lt-n
pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

157 6 5
ZhuJiaxin2
ragtable-extract

PDF table extraction for RAG — convert to clean HTML. Fast, local, no GPU.

147 1 0
DQ-Zhang
refchaser

Written in Python, for checking reference lists in systematic reviews and literature reviews. Helps with backward and forward reference searching by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.

138 25 2
NahomAl
ethiobank-receipts

Fast and reliable Python library to extract and verify payment receipts from major Ethiopian banks (CBE, Dashen, Awash, BOA, Zemen, Telebirr).

127 9 2
Data from PyPI, GitHub, ClickHouse, and BigQuery