PyRank

Data Generation Python Packages

Python packages with the GitHub topic data-generation. Sorted by relevance, with stars and monthly downloads.
Stranger6667
hypothesis-graphql

Generate arbitrary queries matching your GraphQL schema, and use them to verify your backend implementation.

1.8M 48 4
databrickslabs
dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can be used to generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

287K 465 94
sdv-dev
copulas

A library to model multivariate data using copulas.

189K 645 120
sdv-dev
sdv

Synthetic data generation for tabular data

133K 3K 417
sdv-dev
ctgan

Conditional GAN for generating synthetic tabular data.

132K 2K 330
sdv-dev
deepecho

Synthetic Data Generation for mixed-type, multivariate time series.

116K 123 17
tabularis-ai
be-great

A novel approach for synthesizing tabular data using pretrained large language models

6K 361 59
sqllocks
sqllocks-spindle

Multi-domain, schema-aware synthetic data generator for Microsoft Fabric. 13 domains, billion-row scale, statistically calibrated. Lakehouse · Warehouse · SQL DB · Eventhouse writers.

6K 0 0
avsolatorio
realtabformer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

5K 243 30
DexForce
embodichain

An end-to-end, GPU-accelerated, and modular platform for building generalized Embodied Intelligence.

2K 156 15
wwhenxuan
s2generator

A series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.

1K 18 3
rasinmuhammed
misata

High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized NumPy for deterministic, scalable generation.

1K 55 3
Mukhopadhyay
pyfake

A flexible and extensible fake data generator based on Pydantic models.

1K 4 0
jaehyeon-kim
dynamic-des

Real-time SimPy control plane to dynamically update parameters and stream outputs via external systems like Kafka, Redis, or Postgres. Built for event-driven digital twins.

1K 3 0
eriknovak
anonipy

The data anonymization package

1K 8 4
agonzalezla
pydni

Spanish identity utilities: validation and generation of DNI, NIE, CIF, NIF, names, emails and birthdates.

987 0 0
hypervectorio
hypervector-wrapper

Python wrapper for the Hypervector API. Better data tests.

785 9 7
Ahmad8864
autosynth

Agentic synthetic-data generation framework inspired by Meta FAIR's Autodata / Agentic Self-Instruct.

717 0 0
burning-cost
insurance-datasets

Synthetic UK motor insurance datasets with a known data-generating process (DGP) for model validation

715 0 0
0xdps
fakestack

Full-stack fake data generator - Generate databases and APIs from JSON schemas

597 1 0
apiverve
apiverve-colorpalettegenerator

Color Palette is a powerful tool for generating harmonious color palettes. Generate color schemes (mono, contrast, triade, tetrade, analogic) with accessibility data, CSS exports, and palette images.

588 0 0
apiverve
apiverve-companynamegenerator

Company Name Generator is a simple tool for generating company names. It returns a list of company names based on the specified keyword.

587 0 0
apiverve
apiverve-cardgenerator

Card Generator is a simple tool for generating test/sample card numbers. It returns a list of card numbers for testing.

584 0 0
munichpavel
fake-data-for-learning

Sample interesting fake data for machine and human learning

574 8 0
Data from PyPI, GitHub, ClickHouse, and BigQuery