PyRank

Table Extraction Python Packages

Python packages tagged with the GitHub topic table-extraction, sorted by relevance, with GitHub stars and monthly downloads.
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

79.5M 10K 726
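In recent PyMuPDF releases, table detection is exposed through Page.find_tables(). The following is a minimal sketch that assumes such a release and a hypothetical input file. The names extract_pdf_tables and clean_rows are illustrative and not part of PyMuPDF. Only clean_rows runs without PyMuPDF installed.

```python
from typing import List, Optional

Row = List[Optional[str]]


def clean_rows(rows: List[Row]) -> List[List[str]]:
    """Replace None cells with "", strip whitespace, and drop rows that end up empty."""
    cleaned = []
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def extract_pdf_tables(path: str) -> List[List[List[str]]]:
    """Return every table PyMuPDF detects in the file, page by page (illustrative helper)."""
    import pymupdf  # older PyMuPDF releases are imported as `fitz` instead

    tables = []
    with pymupdf.open(path) as doc:
        for page in doc:
            finder = page.find_tables()
            for table in finder.tables:
                # Table.extract() gives a list of rows; each cell is a string or None
                tables.append(clean_rows(table.extract()))
    return tables
```

For example, extract_pdf_tables("report.pdf") (a hypothetical path) returns one cleaned list of rows for each detected table.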
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

30.9M 10K 882
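pdfplumber's Page.extract_tables() returns each detected table as a list of rows, and each cell is a string or None. The sketch below treats the first row of each table as a header and produces one dict per data row. It assumes default table settings and a hypothetical input file. The names rows_to_records and extract_records are illustrative only.

```python
from typing import Dict, List, Optional


def rows_to_records(rows: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and map each later row onto it."""
    if not rows:
        return []
    header = [(name or "").strip() for name in rows[0]]
    records = []
    for row in rows[1:]:
        values = [(cell or "").strip() for cell in row]
        # zip() stops at the shorter of header/row, so ragged rows do not raise
        records.append(dict(zip(header, values)))
    return records


def extract_records(path: str) -> List[Dict[str, str]]:
    """Collect records from every table pdfplumber finds with its default settings."""
    import pdfplumber

    records = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(rows_to_records(table))
    return records
```

Duplicate header names collapse into a single key. Tables that repeat or omit headers across pages would need extra handling.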
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.8M 10K 726
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.

174K 8K 488
xavctn
img2table

img2table is a Python library for identifying and extracting tables from PDFs and images, based on OpenCV image processing.

137K 863 119
harubi
bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

12K 1 0
ExtractTable
extracttable

Python library to extract tabular data from images and scanned PDFs

6K 286 35
tiroq
mdify-cli

Convert PDFs and document images into structured Markdown for LLM workflows

2K 0 0
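This listing does not say how mdify-cli lays out tables in its Markdown. The sketch below is only a generic illustration. It renders extracted rows as a GitHub-flavored Markdown pipe table. The function name to_markdown_table is hypothetical.

```python
from typing import List


def to_markdown_table(rows: List[List[str]]) -> str:
    """Render rows (first row = header) as a GitHub-flavored Markdown pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: List[str]) -> str:
        # Escape pipes, flatten newlines so each row stays on one line, pad ragged rows
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "|" + "---|" * width]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)
```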
Ganymede-Bio
gridgulp

Simplified intelligent spreadsheet ingestion framework with automatic table detection

994 12 1
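Below is one simple heuristic for spotting tables on a spreadsheet. It is not necessarily what gridgulp does. Each 4-connected block of non-empty cells is treated as a candidate table, and its bounding box is reported. The function detect_tables and the box layout are assumptions made for illustration.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (first_row, first_col, last_row, last_col), inclusive


def detect_tables(grid: List[List[Optional[str]]]) -> List[Box]:
    """Return the bounding box of each 4-connected region of non-empty cells."""
    height = len(grid)
    width = max((len(r) for r in grid), default=0)

    def filled(r: int, c: int) -> bool:
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    boxes: List[Box] = []
    for r in range(height):
        for c in range(width):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first flood fill from this cell, tracking the region's extent
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < height and 0 <= nx < width and (ny, nx) not in seen and filled(ny, nx):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Connectivity is 4-neighbour, so a table split by a fully blank row or column is reported as two regions. A practical detector would need extra rules to merge such gaps and to separate tables that touch.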
monchin
tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust

949 9 1
nanonets
docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

433 2K 143
jsvine
pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

287 10K 882
meldonization
depdf

PDF table & paragraph extractor

228 11 0
Kyros-Groupe-Ltd
pdfstructx

Intelligent PDF parser with font-aware structure detection, table extraction, and multi-column support

217 0 0
pymupdf
aqpymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

172 10K 726
kensho-technologies
grits-metric

GriTS metric for table extraction

169 2 0
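As we understand it, GriTS (Grid Table Similarity) scores a predicted table grid against a ground-truth grid with an F-score of the form 2·Σ f(cell pairs) / (|pred| + |true|). Here |·| counts cells, and the full metric searches for the best-matching two-dimensional substructure before summing. The sketch below is a much-simplified stand-in, not GriTS itself and not the grits-metric package's API. It assumes both grids already have the same shape and are aligned cell for cell, and it uses exact string match as the similarity f. The name grid_f_score is hypothetical.

```python
from typing import List


def grid_f_score(pred: List[List[str]], true: List[List[str]]) -> float:
    """Simplified GriTS-style F-score: 2 * sum(similarity) / (|pred| + |true|).

    Assumes both grids have the same shape and are aligned cell for cell;
    similarity is 1.0 for an exact (whitespace-stripped) match, else 0.0.
    """
    if len(pred) != len(true) or any(len(p) != len(t) for p, t in zip(pred, true)):
        raise ValueError("grids must have the same shape in this simplified sketch")
    n_pred = sum(len(row) for row in pred)
    n_true = sum(len(row) for row in true)
    if n_pred + n_true == 0:
        return 1.0
    matched = sum(
        1.0
        for p_row, t_row in zip(pred, true)
        for p, t in zip(p_row, t_row)
        if p.strip() == t.strip()
    )
    return 2 * matched / (n_pred + n_true)
```

With the alignment fixed and the shapes equal, the score reduces to cell accuracy. The full metric is useful precisely because it handles grids with extra, missing, or shifted rows and columns.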
Goldziher
mseep-kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.

169 8K 487
inquilabee
tablecv

TableCV: Table extraction from images made easy.

155 11 2
ZhuJiaxin2
ragtable-extract

PDF table extraction for RAG — convert to clean HTML. Fast, local, no GPU.

147 1 0
philgooch
pdftablr

A fork of Kyle Cronan's Python 2.5 pdftable library, now updated for Python 3

146 2 0
sergiocorreia
quipucamayoc

Tools to extract information from digitized historical documents

98 33 5
    • Data from PyPI, GitHub, ClickHouse, and BigQuery