
Synthetic data generation

Synthetic data is artificially generated data, produced by computer simulation or by algorithms, that stands in for real-world data. It can be used as an alternative or supplement to real-world data when that data is not readily available. It can also be used to improve the performance of machine learning models.
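As a toy illustration of the idea, the sketch below fits a single Gaussian to a small numeric column and samples new values from it. It uses only the Python standard library, not ydata-synthetic, and is far simpler than the generative models the package provides. The `real_ages` values and the helper names `fit_gaussian` and `sample_synthetic` are hypothetical and invented for this example.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(mean, stdev, n, seed=0):
    """Draw n synthetic values from the fitted Gaussian (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Hypothetical "real" measurements, invented for this illustration.
real_ages = [23, 35, 31, 44, 52, 29, 38, 41, 27, 36]

mean, stdev = fit_gaussian(real_ages)
synthetic_ages = sample_synthetic(mean, stdev, n=1000)

print(f"real mean: {mean:.1f}")
print(f"synthetic mean: {statistics.mean(synthetic_ages):.1f}")
```

The synthetic values follow the overall distribution of the real column without copying any individual record. A single Gaussian ignores relationships between columns and cannot represent categorical or multimodal data, which is where richer generative models come in.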

The ydata-synthetic package is an open-source Python package developed by YData's team that allows users to experiment with several generative models for synthetic data generation. The main goal of the package is to serve as a way for data scientists to get familiar with synthetic data and its applications in real-world domains, as well as the potential of Generative AI.

The ydata-synthetic package provides different methods for generating synthetic tabular and time-series data, such as Variational Autoencoders (VAE), Gaussian Mixture Models (GMM), and Conditional Tabular Generative Adversarial Networks (CTGAN). The package also includes a user-friendly UI that guides users through the steps and inputs needed to generate synthetic data samples.

The package also aims to facilitate the exploration and understanding of synthetic data generation methods and their limitations.

📄 Get started with synthetic data for tabular data with CTGAN

📈 Get started with synthetic data for time-series with TimeGAN