Synthesize tabular data

Using DRAGAN to generate tabular synthetic data:

Real-world domains are often described by tabular data, i.e., data that can be organized in a table-like format where columns represent features (variables) and rows correspond to observations.
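As a small illustration of this layout, the sketch below stores each observation as a row and splits the columns into numerical and categorical groups, the same distinction a synthesizer's data processor needs. The `split_columns` helper and its `max_categories` threshold are hypothetical and not part of ydata-synthetic; the heuristic is only a starting point, and column roles should still be reviewed by hand.

```python
def split_columns(rows, max_categories=20):
    """Heuristically split columns into numerical and categorical lists.

    A column counts as numerical only if every value is an int or float
    and it has more than `max_categories` distinct values.
    """
    num_cols, cat_cols = [], []
    for col in rows[0]:
        values = [row[col] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric and len(set(values)) > max_categories:
            num_cols.append(col)
        else:
            cat_cols.append(col)
    return num_cols, cat_cols


# Toy table: each dict is one observation (row), each key is one feature (column).
rows = [
    {"age": 39, "hours-per-week": 40, "education-num": 13, "sex": "Male"},
    {"age": 50, "hours-per-week": 13, "education-num": 9, "sex": "Female"},
    {"age": 38, "hours-per-week": 45, "education-num": 13, "sex": "Male"},
    {"age": 28, "hours-per-week": 38, "education-num": 9, "sex": "Female"},
]

# A tiny threshold is used here only because the toy table has four rows.
num_cols, cat_cols = split_columns(rows, max_categories=2)
print(num_cols)  # ['age', 'hours-per-week']
print(cat_cols)  # ['education-num', 'sex']
```

Note that an integer-coded column such as `education-num` lands in the categorical group because it takes only a few distinct values.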

DRAGAN is a GAN variant that applies a gradient penalty to the discriminator around real data points to improve training stability and mitigate mode collapse.
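To make the gradient penalty concrete, here is a minimal sketch in plain Python. It assumes a toy logistic discriminator (so the gradient with respect to the input has a closed form), Gaussian perturbations of the real rows, and illustrative values for `noise_std`, `target`, and `lam`. None of these names come from ydata-synthetic, and real implementations compute the gradient by automatic differentiation and may use a different noise distribution.

```python
import math
import random


def discriminator(x, w, b):
    """Toy logistic discriminator D(x) = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def grad_norm(x, w, b):
    """L2 norm of dD/dx; for the logistic model it equals D(1 - D) * ||w||."""
    d = discriminator(x, w, b)
    return d * (1.0 - d) * math.sqrt(sum(wi * wi for wi in w))


def dragan_penalty(batch, w, b, noise_std=0.5, target=1.0, lam=10.0, seed=0):
    """Return lam * mean((||grad_x D(x + delta)|| - target)^2) over the batch.

    Each real row x is perturbed with Gaussian noise delta before the
    gradient norm is measured (assumed noise model, illustrative scale).
    """
    rng = random.Random(seed)
    total = 0.0
    for x in batch:
        x_hat = [xi + rng.gauss(0.0, noise_std) for xi in x]
        total += (grad_norm(x_hat, w, b) - target) ** 2
    return lam * total / len(batch)


real_batch = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.9]]
penalty = dragan_penalty(real_batch, w=[1.5, -2.0], b=0.1)
print(f"gradient penalty: {penalty:.4f}")
```

During training, a term like this is added to the discriminator loss, which pushes the norm of the discriminator's gradient near real data toward `target`.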

Here’s an example of how to synthesize tabular data with DRAGAN using the Adult Census Income dataset:

from pmlb import fetch_data

from ydata_synthetic.synthesizers.regular import RegularSynthesizer
from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

# Load data and define the data processor parameters
data = fetch_data('adult')
num_cols = ['age', 'fnlwgt', 'capital-gain', 'capital-loss', 'hours-per-week']
cat_cols = ['workclass','education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex',
            'native-country', 'target']

# DRAGAN training
# Define the training parameters of DRAGAN
noise_dim = 128
dim = 128
batch_size = 500

log_step = 100
epochs = 500+1
learning_rate = 1e-5
beta_1 = 0.5
beta_2 = 0.9
models_dir = '../cache'

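gan_args = ModelParameters(batch_size=batch_size,
                           lr=learning_rate,
                           betas=(beta_1, beta_2),
                           noise_dim=noise_dim,
                           layers_dim=dim)

train_args = TrainParameters(epochs=epochs,
                             sample_interval=log_step)

# Build the DRAGAN synthesizer, train it on the Adult data, and save the fitted model
synth = RegularSynthesizer(modelname='dragan', model_parameters=gan_args, n_discriminator=3)
synth.fit(data=data, train_arguments=train_args, num_cols=num_cols, cat_cols=cat_cols)
synth.save('adult_dragan_model.pkl')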

# Load a trained synthesizer and sample from it
synthesizer = RegularSynthesizer.load('adult_dragan_model.pkl')
synth_data = synthesizer.sample(1000)  # the number of rows is illustrative