PyRank

PDF Python Packages

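Python packages with the GitHub topic pdf, sorted by relevance. Each entry lists the GitHub owner, the package name, and the project description, followed by a stats line that begins with monthly downloads and GitHub stars.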
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

79.5M 10K 726
py-pdf
pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

65.5M 10K 2K
pdfminer
pdfminer-six

Community maintained fork of pdfminer - we fathom PDF

45.6M 7K 1K
pypdfium2-team
pypdfium2

Python bindings to PDFium, reasonably cross-platform.

42.9M 767 44
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

30.9M 10K 882
Kozea
weasyprint

The awesome document factory

26.3M 9K 815
py-pdf
pypdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.2M 10K 2K
CourtBouillon
pydyf

A low-level PDF creator

22.2M 147 11
lukasschwab
arxiv

Python wrapper for the arXiv API

16.3M 2K 153
Belval
pdf2image

A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects

16M 2K 212
Kozea
cairosvg

Convert your vector images

13.8M 924 160
py-pdf
fpdf2

Simple PDF generation for Python

10.7M 2K 345
JessicaTegner
pypandoc-binary

Thin wrapper for "pandoc" (MIT)

8.4M 1K 120
pikepdf
pikepdf

A Python library for reading and writing PDF, powered by QPDF

8M 3K 223
MatthiasValvekens
pyhanko

pyHanko: sign and stamp PDF files

7.6M 720 102
docling-project
docling

Get your documents ready for gen AI

7.2M 60K 4K
microsoft
markitdown

Python tool for converting files and office documents to Markdown.

6.1M 124K 8K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.4M 15K 1K
MatthiasValvekens
pyhanko-certvalidator

pyHanko: sign and stamp PDF files

5.1M 720 102
JessicaTegner
pypandoc

Thin wrapper for "pandoc" (MIT)

4.9M 1K 120
deeplook
svglib

Read SVG files and convert them to other formats.

4.8M 363 86
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.8M 10K 726
xhtml2pdf
xhtml2pdf

A library for converting HTML into PDFs using ReportLab

3.9M 2K 656
chezou
tabula-py

Simple wrapper of tabula-java: extract tables from PDFs into pandas DataFrames

2.8M 2K 303
Data from PyPI, GitHub, ClickHouse, and BigQuery