PyRank

Dataset Generation Python Packages

Python packages with the GitHub topic dataset-generation. Sorted by relevance, with stars and monthly downloads.
nfstream
nfstream

NFStream: a Flexible Network Data Analysis Framework.

12K 1K 143
hearmeneigh
datasetrising

Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs

12K 18 1
lightning-rod-labs
lightningrod-ai

Python SDK for dataset generation on LightningRod platform ⚡

5K 47 3
Kiln-AI
kiln-ai

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

4K 5K 366
HZYAI
ragscore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.

2K 31 5
colddsam
modeyolo

ModeYOLO: Elevate image processing with this Python package. Seamlessly perform color space transformations, simplify dataset modification for deep learning, and leverage OpenCV and NumPy. Ideal for YOLO projects, computer vision tasks, and efficient machine learning workflows.

2K 0 0
DIYer22
bpycv

Computer vision utils for Blender.

2K 501 60
Kiln-AI
kiln-server

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

1K 5K 366
StarlangSoftware
nlptoolkit-datagenerator

Classification dataset generator library for high-level NLP tasks

1K 3 0
Antrikshy
rackfocus

Schedulable command-line utility to download and compile IMDb datasets into a highly browsable SQLite database file

1K 9 1
scalexi
scalexi

The scalexi package is a versatile open-source Python library that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs). It extends beyond its initial OpenAI models integration, offering a scalable framework for various LLMs.

1K 13 2
radi-cho
datasetgpt

A command-line interface to generate textual and conversational datasets with LLMs.

982 298 19
StarlangSoftware
nlptoolkit-datagenerator-cy

Classification dataset generator library for high-level NLP tasks

970 0 0
Superuser666-Sigil
human-eval-rust

SigilDERG Data Production is an enterprise-grade Rust pipeline that crawls crates, runs rigorous scans (Clippy, Geiger, license checks), and generates instruction-style JSONL shards. It features semantic chunking, configurable splits, observability, and seamless SigilDERG ecosystem integration.

936 0 1
SimGus
chatette

A dataset generator for Rasa NLU

886 315 53
christiangarcia0311
data-seed-ph

A Python library for generating realistic, synthetic Philippine-based datasets.

821 8 0
MatteoGuadrini
pyreports

pyreports is a Python library that allows you to create complex reports from various sources

737 113 9
Ahmad8864
autosynth

Agentic synthetic-data generation framework inspired by Meta FAIR's Autodata / Agentic Self-Instruct.

717 0 0
facebookresearch
stopes

Large-Scale Translation Data Mining.

671 306 47
OOXXXXOO
d-arth

DATASETS FOR WHOLE E-ARTH

594 9 7
fabiobove-dr
bio-dataset-manager

This tool facilitates the encoding of these sequences into tensors, which can then be used for AI computations and complex model implementations

562 1 0
dot-css
tempdataset

A lightweight Python library for generating realistic temporary datasets for testing and development. Generate 40+ different dataset types including business, financial, IoT, healthcare, and technology data!

553 1 0
C-you-know
ks-llm-ranker

Ranking Large Language Models using the Principle of Least Action! Built during my time at Knit Space, Hubbali under the guidance of Prof. Prakash Hegade.

528 5 0
johnazedo
financial-scraper

Get data of stocks and funds that compose the Brazilian stock market.

521 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery