PyRank

Dataset Python Packages

Python packages tagged with the GitHub topic "dataset", sorted by relevance. Each entry shows its monthly downloads and GitHub stars.
joke2k
faker

Faker is a Python package that generates fake data for you.

71.9M 19K 2K
tensorflow
tensorflow-io-gcs-filesystem

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

6.6M 736 309
ashvardanian
stringzilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

4.9M 3K 125
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

1.7M 5K 2K
pytorch
torchtext

Models, data loaders and abstractions for language processing, powered by PyTorch

980K 4K 809
smarie
pytest-cases

Separate test code from test cases in pytest.

945K 375 41
pydata
pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

855K 3K 693
allenai
ir-datasets

Provides a common interface to many IR ranking datasets.

645K 390 52
mosaicml
mosaicml-streaming

A Data Streaming Library for Efficient Neural Network Training

487K 2K 191
colour-science
colour-science

Colour Science for Python

453K 3K 289
fastai
fastdownload

Easily download, verify, and extract archives

448K 47 12
tensorflow
tensorflow-io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

416K 736 309
quandl
quandl

Package for quandl API access

291K 1K 339
tensorflow
tfds-nightly

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

203K 5K 2K
scipp
scipp

Multi-dimensional data arrays with labeled dimensions

139K 143 22
palewire
cpi

Quickly adjust U.S. dollars for inflation using the Consumer Price Index (CPI)

111K 142 23
HumanSignal
label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

107K 27K 4K
tensorflow
tensorflow-io-nightly

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

102K 736 309
cvat-ai
cvat-sdk

Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling services, for image, video, and 3D annotation with AI-assisted labeling, quality assurance, team collaboration, analytics, and developer APIs.

89K 16K 4K
mlmed
torchxrayvision

TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.

85K 1K 249
datalad
datalad

Keep code, data, containers under control with git and git-annex

77K 638 128
joke2k
fake-factory

Faker is a Python package that generates fake data for you.

71K 19K 2K
rom1504
img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

61K 4K 374
neuspell
neuspell

NeuSpell: A Neural Spelling Correction Toolkit

51K 711 106
Data from PyPI, GitHub, ClickHouse, and BigQuery