PyRank

Dataset Creation Python Packages

Python packages with the GitHub topic dataset-creation. Sorted by relevance, with stars and monthly downloads.
ynop
audiomate

Audiomate is a library for working with audio datasets.

641 138 25
JaonHax
scpscraper

A Python library designed for scraping data from the SCP wiki.

495 16 4
tanaos
synthex

Generate high-quality, large-scale synthetic datasets 📊🧪

345 1 1
shubhamchaudhary
aesthetics

Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader

331 231 53
michaelscutari
protclust

Python tools for protein sequence clustering and dataset splitting

250 4 0
wxCvRoot
wxcvannotator

A professional AI-assisted computer vision image annotation tool built with wxPython and OpenCV. Features a cross-platform GUI, robust C++ rendering engine support, and comprehensive multi-language (i18n) capabilities.

160 1 1
LeviBorodenko
dortmund2array

Tool to convert datasets from "Benchmark Data Sets for Graph Kernels" (K. Kersting et al., 2016) into a format suitable for deep learning research.

109 2 0
michaelscutari
mmseqspy

protclust is a Python library for protein sequence analysis that integrates MMseqs2 for fast clustering and provides tools for creating robust machine learning datasets. It offers cluster-aware data splitting to prevent sequence similarity bias in model evaluation, along with comprehensive protein embedding capabilities for feature generation.

79 4 0
bazukas
soyla

Simple terminal application to record speech datasets

68 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery