PyRank

PDF Processing Python Packages

Python packages with the GitHub topic pdf-processing. Sorted by relevance, with stars and monthly downloads.
pdftl
pdftl

PDF CLI pipeline: merge, split, crop, rotate, compress, extract images, add text and more. Modern pdftk replacement, powered by pikepdf/qpdf.

2K 6 1
mo-tunn
tokenpack-rag

TokenPack packs long documents, codebases, PDFs, and folders into compact, evidence-dense LLM context using local embeddings, evidence scoring, and budget-aware selection.

905 7 0
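The TokenPack entry doesn't describe how its "budget-aware selection" works. As a rough illustration of the general idea only, the sketch below greedily ranks hypothetical chunks by evidence score per token and keeps adding them until a token budget is spent. The `Chunk` type, the scores, and the token counts are all assumptions, not TokenPack's API or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # hypothetical evidence/relevance score
    tokens: int   # hypothetical token count for this chunk


def select_within_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedy selection by score density (score per token) under a token budget."""
    # Rank by how much evidence each chunk carries per token spent.
    ranked = sorted(chunks, key=lambda c: c.score / max(c.tokens, 1), reverse=True)
    chosen: list[Chunk] = []
    used = 0
    for chunk in ranked:
        # Skip any chunk that would overflow the budget; smaller ones may still fit.
        if used + chunk.tokens <= budget:
            chosen.append(chunk)
            used += chunk.tokens
    return chosen


demo = [
    Chunk("intro", score=0.2, tokens=300),
    Chunk("key table", score=0.9, tokens=200),
    Chunk("method", score=0.7, tokens=400),
]
print([c.text for c in select_within_budget(demo, budget=650)])
```

Greedy density ranking is a common heuristic for this knapsack-style problem; it is fast but not guaranteed to find the best-scoring subset.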
mcagriaksoy
safepdf

SafePDF is a privacy-focused offline tool for PDF manipulation. Merge, compress, split, and organize your PDF files securely. No internet is required, so your documents stay local and safe.

465 6 1
fujiba
llm-pdf-chunker

LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and downsample images for RAG/Bedrock.

410 0 0
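"Chunk PDFs by size" can be pictured as grouping consecutive pages so that each output file stays under a size limit. The sketch below assumes you already have a per-page byte size for each page (a hypothetical input); it is not llm-pdf-chunker's implementation. Real per-page sizes are only approximate, since pages often share resources such as fonts and images.

```python
def chunk_pages_by_size(page_sizes: list[int], max_bytes: int) -> list[list[int]]:
    """Group consecutive page indices so each group's total size stays within max_bytes.

    A page that alone exceeds max_bytes is placed in its own chunk.
    """
    chunks: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for index, size in enumerate(page_sizes):
        # Start a new chunk when adding this page would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


print(chunk_pages_by_size([400, 300, 500, 1200, 100], max_bytes=1000))
# → [[0, 1], [2], [3], [4]]
```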
hksorensen
diagram-detector

Production-ready diagram detection for academic papers using YOLO11.

394 1 0
Rekhet
revpdf

A triage-and-recovery toolkit for PDFs saved with incremental updates.

380 0 0
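For context on "incremental updates": a PDF saved incrementally keeps its original bytes and appends each new revision (new objects, cross-reference data, and a trailer) ending in its own `%%EOF` marker. The sketch below uses that fact as a heuristic to locate revision boundaries and approximate an earlier revision by truncation. It is not revpdf's method, and the function names are hypothetical. Treat the count as a hint: linearized files typically contain an extra `%%EOF`, and the marker could in principle appear inside stream data.

```python
import re

EOF_MARKER = re.compile(rb"%%EOF")


def revision_offsets(data: bytes) -> list[int]:
    """Return the byte offset just past each %%EOF marker (heuristically, one per revision)."""
    offsets = []
    for match in EOF_MARKER.finditer(data):
        end = match.end()
        # Include the end-of-line that usually follows %%EOF (CRLF, LF, or CR).
        if data[end:end + 2] == b"\r\n":
            end += 2
        elif data[end:end + 1] in (b"\n", b"\r"):
            end += 1
        offsets.append(end)
    return offsets


def earlier_revision(data: bytes, index: int) -> bytes:
    """Truncate after the index-th %%EOF to approximate that earlier revision."""
    return data[: revision_offsets(data)[index]]


# In practice, data would come from open(path, "rb").read().
sample = b"%PDF-1.7\n% original body\n%%EOF\n% appended update\n%%EOF\n"
print(len(revision_offsets(sample)))  # → 2
```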
PSPDFKit
nutrient-dws

Official Python client library for the Nutrient Document Web Services API: PDF processing, OCR, watermarking, and document manipulation, with automatic Office format conversion.

274 54 1
Prathamesh-Ghatole
entityxtract

An open-source tool to extract entities from documents.

221 1 1
MelinaNorton
journal-vetter

Python CLI & library for automated journal vetting: GPT-4.1 summarization, YAML configuration, reproducible analysis.

207 1 0
Kubenew
pdf2struct

`pdf2struct` extracts structured JSON from PDF documents.

195 1 0
Aleptonic
pdf-snip

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions in NLP, computer vision (CV), and other data-processing tasks. It eliminates repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

91 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery