PyRank

Extract Data Python Packages

Python packages with the GitHub topic extract-data. Sorted by relevance, with stars and monthly downloads.
pymupdf
pymupdf

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

79.5M 10K 726
pymupdf
pymupdfb

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.8M 10K 726
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.

305K 64K 5K
meltano
meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

279K 3K 238
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.

76K 64K 5K
yuanxu-li
html-table-extractor

Extract data from HTML tables.

29K 88 22
opendatalab
mineru-selfhosted-mcp

MCP bridge for a self-hosted MinerU API

5K 64K 5K
ayush571995
extract-zip

Extract all files from a zip archive, including files that are themselves zip archives, by running a single script.

2K 0 0
MeltanoLabs
tap-stackexchange

Singer tap for the StackExchange API

2K 3 1
AdemBoukhris457
doctra

📄🔍 Parse, extract, and analyze documents with ease 📄🔍

897 205 33
MeltanoLabs
tap-dbt

Singer tap for the dbt API v2, built with the Meltano SDK

814 12 8
Techcatchers
lyrics-extractor

Get lyrics for any song in less than 2 seconds by passing in the song name (spelled correctly or misspelled) to this Python library.

564 60 18
apurvasijaria
googleplaystorescrape

Python module to extract Google Play Store reviews and other information about any Android app.

428 4 0
usercando
pullcite

Evidence-backed structured extraction from documents

427 1 0
umLu
tubeframes

A Python package for retrieving YouTube data, including video statistics, captions, and channel information. TubeFrames outputs results in a user-friendly pandas DataFrame format, making it ideal for data analysis workflows — especially in Jupyter Notebooks.

410 3 0
Kubenew
pdf2struct

`pdf2struct` extracts structured JSON from PDF documents.

392 1 0
brunneis
bluebird

Unofficial Python client for Twitter

310 44 14
rodricios
wxpath

wxpath - declarative web crawling with XPath; a Web Query Language (WQL)

307 111 5
izikeros
todo-extract

Extract TODO items from text files

244 2 0
ammaryasirnaich
pyreqify

A lightweight Python module that generates a requirements.txt file. It streamlines dependency management by automatically extracting imported modules from Python or Jupyter files and generating their requirements.txt.

202 0 0
pymupdf
aqpymupdf

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

172 10K 726
KhalidCK
db2table

Convert a SQLite database file to a simple, web-friendly format

151 0 1
opendatalab
xh-pdf-parser

Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.

141 64K 5K
opendatalab
lazyllm-magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.

124 64K 5K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery