PyRank

Synthetic Dataset Generation Python Packages

Python packages tagged with the GitHub topic synthetic-dataset-generation, sorted by relevance and shown with stars and monthly downloads.
openlayer-ai
openlayer

The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈

191K 16 2
bespokelabsai
bespokelabs-curator

Synthetic data curation for post-training and structured data extraction

46K 2K 141
Red-Hat-AI-Innovation-Team
sdg-hub

Synthetic Data Generation Toolkit for LLMs

11K 139 55
sparkfish
augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

8K 536 61
tabularis-ai
be-great

A novel approach for synthesizing tabular data using pretrained large language models

6K 361 59
avsolatorio
realtabformer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

5K 243 30
inductiva
inductiva

Large scale simulations made simple.

4K 41 8
firstbatchxyz
dria

Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.

3K 28 8
kontextox
datasety

CLI tool for dataset preparation: resize, align, caption, shuffle, synthetic, and mask generation.

2K 2 0
mohossam01
plotsim

Plotsim generates multi-table synthetic datasets with behavioral trajectories, correlations, and causal lags. Config-driven, deterministic, no real data required.

2K 0 0
AlejandroBeldaFernandez
calm-data-generator

CALM-Data-Generator is a comprehensive Python library for synthetic data generation with advanced features

2K 5 1
rasinmuhammed
misata

High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized NumPy for deterministic, scalable generation.

1K 55 3
datadreamer-dev
datadreamer-dev

Prompt. Generate Synthetic Data. Train & Align Models.

1K 1K 59
RajeevAtla
supercongan

GAN trained on superconductivity data

1K 5 0
Clearbox-AI
clearbox-synthetic-kit

Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.

1K 44 1
clugen
pyclugen

Multidimensional cluster generation in Python

1K 10 0
SidharthMacherla
unreal

A synthetic data generator.

1K 1 1
georgeoshardo
symbac

A package for generating synthetic images of bacteria in phase contrast or fluorescence. Used for creating training data for machine learning segmentation and tracking algorithms.

916 22 11
Mohammed-Almekhlafi
historical2realtime

Transform static, tabular historical data into a live, real-time synthetic data stream.

773 2 2
alfurka
synloc

A Python package to create synthetic data from locally estimated distributions

724 3 0
AmanPriyanshu
dpsdv

Differentially private synthetic data generation for tabular, relational, and time-series data.

489 9 2
iteal
wormpose

WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans

433 56 19
laurawpaaby
educhateval

A structured pipeline and Python package for deploying and evaluating interactive LLM tutor systems in educational settings.

430 1 0
MaxvandenHoven
blenderline

A Blender pipeline for generating synthetic images of production lines

387 30 1
Data from PyPI, GitHub, ClickHouse, and BigQuery