PyRank

Datasets Python Packages

Python packages tagged with the GitHub topic "datasets", sorted by relevance. Each entry shows monthly downloads and stars.
huggingface
datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

127.8M 22K 3K
akfamily
akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

2.8M 19K 3K
Arize-ai
arize-phoenix

AI Observability & Evaluation

2.4M 10K 882
Arize-ai
arize-phoenix-otel

AI Observability & Evaluation

1.8M 10K 882
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, JAX, ...

1.7M 5K 2K
Arize-ai
arize-phoenix-client

AI Observability & Evaluation

952K 10K 882
Arize-ai
arize-phoenix-evals

AI Observability & Evaluation

765K 10K 882
colour-science
colour-science

Colour Science for Python

453K 3K 289
Mozilla-Data-Collective
datacollective

Python library for easily accessing Mozilla Data Collective datasets

393K 22 6
tensorflow
tfds-nightly

TFDS is a collection of datasets ready to use with TensorFlow, JAX, ...

203K 5K 2K
torchgeo
torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

157K 4K 551
Coloquinte
torchsr

Super Resolution datasets and models in PyTorch

148K 213 22
simonw
datasette

An open source multi-tool for exploring and publishing data

123K 11K 835
HumanSignal
label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

107K 27K 4K
Nixtla
datasetsforecast

Datasets for time series forecasting

103K 126 12
mims-harvard
pytdc

Therapeutics Commons (TDC): Multimodal Foundation for Therapeutic Science

103K 1K 213
snap-stanford
ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

101K 2K 407
ibm
unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

77K 212 67
MinishLab
semhash

Fast Multimodal Semantic Deduplication & Filtering

73K 924 56
cleanlab
cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

62K 11K 893
JovianML
opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

53K 347 143
Farama-Foundation
minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities

27K 511 63
autogluon
fev

Forecast evaluation library

26K 161 17
open-edge-platform
datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

26K 671 154
    • Data from PyPI, GitHub, ClickHouse, and BigQuery