PyRank

Data Python Packages

Python packages tagged with the GitHub topic "data", sorted by relevance and listed with their monthly downloads and stars.
fatiando
pooch

A friend to fetch your data files

19.1M 720 87
mahmoud
glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

17.2M 2K 72
PrefectHQ
prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

14.2M 22K 2K
kayak
pypika

PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

13.4M 3K 330
run-llama
llama-index

LlamaIndex is the leading document agent and OCR platform

12.2M 50K 7K
run-llama
llama-index-core

LlamaIndex is the leading document agent and OCR platform

9.1M 50K 7K
PrefectHQ
prefect-aws

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

6.9M 22K 2K
dlt-hub
dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

5.7M 5K 508
run-llama
llama-index-instrumentation

LlamaIndex is the leading document agent and OCR platform

4.2M 50K 7K
thombashi
dataproperty

A Python library for extracting properties from data.

3.3M 17 5
akfamily
akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

2.8M 19K 3K
capitalone
datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

2.6M 643 160
iterative
dvc-data

DVC's data management subsystem

2.5M 18 28
run-llama
llama-index-legacy

LlamaIndex is the leading document agent and OCR platform

1.8M 50K 7K
PrefectHQ
prefect-docker

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.8M 22K 2K
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

1.7M 5K 2K
lk-geimfari
mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

1.7M 5K 359
PrefectHQ
prefect-gcp

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.6M 22K 2K
octoenergy
tentaclio

Single repository regrouping IO connectors used in the data world.

1.6M 30 2
foxglove
mcap

MCAP is a modular, performant, and serialization-agnostic container file format, useful for pub/sub and robotics applications.

1.5M 941 205
PrefectHQ
prefect-dbt

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.3M 22K 2K
PrefectHQ
prefect-ray

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.2M 22K 2K
datafold
collate-data-diff

Compare tables within or across databases

974K 3K 305
smarie
pytest-cases

Separate test code from test cases in pytest.

945K 375 41
    • Data from PyPI, GitHub, ClickHouse, and BigQuery