Synthesize tabular data
Outdated
Note that this example won't work with the latest version of ydata-synthetic. Please check ydata-sdk to see how to generate synthetic data.
Using GMMs to generate tabular synthetic data:
Real-world domains are often described by tabular data, i.e., data organized in a table-like format, where features (variables) are represented as columns and observations as rows.
Gaussian Mixture Models (GMMs) are probabilistic models that can be used to generate synthetic data. A GMM learns the distribution of the original data by fitting a mixture of Gaussian distributions to it; new synthetic records are then sampled from the fitted mixture.
- 📑 Blogpost: Generate synthetic data with Gaussian Mixture models
- Google Colab: Generate Adult census data with GMM
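The fit-then-sample idea described above can be illustrated with a minimal sketch. It assumes scikit-learn's `GaussianMixture` rather than the ydata-synthetic or ydata-sdk API, and the two-column toy table (`real`), the number of components, and the random seeds are all made up for illustration. It only covers numeric columns; categorical features would need encoding first.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical "real" table: 1000 rows, 2 numeric columns, drawn from two clusters.
rng = np.random.default_rng(0)
real = np.vstack([
    rng.normal(loc=[30.0, 40.0], scale=[5.0, 4.0], size=(500, 2)),
    rng.normal(loc=[55.0, 20.0], scale=[7.0, 3.0], size=(500, 2)),
])

# Fit a mixture of Gaussians to the original data distribution.
# n_components=2 is an assumption that matches how the toy data was built.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(real)

# Draw synthetic rows from the fitted mixture.
# sample() returns the rows and the index of the component that produced each row.
synthetic, component_ids = gmm.sample(n_samples=1000)

print("real column means:     ", real.mean(axis=0))
print("synthetic column means:", synthetic.mean(axis=0))
```

On real data the number of components is usually not known in advance; one common option is to fit several candidate values and compare them with `gmm.bic(real)`.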