ydata-synthetic is the go-to Python package for synthetic data generation for tabular and time-series data. It uses the latest Generative AI models to learn the properties of real data and create realistic synthetic data. This project was created to educate the community about synthetic data and its applications in real-world domains, such as data augmentation, bias mitigation, data sharing, and privacy engineering. To learn more about Synthetic Data and its applications, check this article.
🤖 Create Realistic Synthetic Data using Generative AI Models:
ydata-synthetic supports state-of-the-art generative adversarial networks (GANs) for data generation, namely Vanilla GAN, CGAN, WGAN, WGAN-GP, DRAGAN, Cramer GAN, CWGAN-GP, CTGAN, and TimeGAN. Learn more about the use of GANs for Synthetic Data generation.
📀 Synthetic Data Generation for Tabular and Time-Series Data: The package supports the synthesis of tabular and time-series data, covering a wide range of real-world applications. Learn how to leverage ydata-synthetic for tabular and time-series data.
💻 Best Generation Experience in Open Source: The package includes a guided UI for generating synthetic data, from reading the data to visualizing the synthetic output, all served by a Streamlit app.
Supported Data Types
Tabular data has no temporal dependence and can be organized in a table-like format, where features are represented in columns and observations correspond to rows.
Additionally, tabular data usually comprises both numeric and categorical features. Numeric features encode quantitative values, whereas categorical features represent qualitative measurements. Categorical features can be further divided into ordinal, binary or boolean, and nominal features.
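As a rough illustration of the numeric/categorical split described above, the sketch below sorts the columns of a small table by inspecting their values. The `split_columns` helper, the column names, and the sample rows are all hypothetical and are not part of ydata-synthetic; real datasets usually need domain knowledge as well (for example, an integer-coded category would be misclassified here).

```python
from numbers import Number


def split_columns(rows):
    """Split column names into numeric and categorical lists by value type.

    Hypothetical helper for illustration only.
    """
    num_cols, cat_cols = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly:
        # boolean features are categorical, not quantitative.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (num_cols if is_numeric else cat_cols).append(name)
    return num_cols, cat_cols


# Made-up observations: each dict is a row, each key is a feature.
rows = [
    {"age": 34, "income": 52000.0, "education": "bachelor", "smoker": False, "city": "Porto"},
    {"age": 51, "income": 87500.0, "education": "master", "smoker": True, "city": "Lisbon"},
    {"age": 27, "income": 31000.0, "education": "high school", "smoker": False, "city": "Braga"},
]

num_cols, cat_cols = split_columns(rows)
print("numeric:", num_cols)
print("categorical:", cat_cols)
```

In this made-up table, `education` would be an ordinal feature, `smoker` a boolean one, and `city` a nominal one.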
Time-series data exhibit a sequential, temporal dependency between records, and may present a wide range of patterns and trends, including seasonality (patterns that repeat at calendar periods such as days, weeks, or months; holiday sales, for instance) or periodicity (patterns that repeat over time).
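To make the idea of a repeating pattern concrete, here is a minimal sketch that estimates the period of a series from its sample autocorrelation. The series is a hypothetical noise-free daily signal with a weekly cycle, and the `autocorrelation` helper is written for this example only; it is not part of ydata-synthetic.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Hypothetical daily signal with a weekly cycle: ten full weeks, no noise.
PERIOD = 7
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(10 * PERIOD)]

# The lag with the highest autocorrelation is a simple estimate of the period.
candidate_lags = range(1, 21)
best_lag = max(candidate_lags, key=lambda lag: autocorrelation(series, lag))
print("estimated period:", best_lag)
```

Shuffling the records of this series would destroy the dependency that the autocorrelation picks up, which is the key difference from tabular data, where row order carries no information.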
Supported Generative AI Models
The following architectures are currently supported:
- GAN (Vanilla GAN)
- CGAN (Conditional GAN)
- WGAN (Wasserstein GAN)
- WGAN-GP (Wasserstein GAN with Gradient Penalty)
- DRAGAN (Deep Regret Analytic GAN)
- Cramer GAN (Cramer Distance Solution to Biased Wasserstein Gradients)
- CWGAN-GP (Conditional Wasserstein GAN with Gradient Penalty)
- CTGAN (Conditional Tabular GAN)
- TimeGAN (specifically for time-series data)
- DoppelGANger (specifically for time-series data)