
Synthesize time-series data

Outdated

Note that this example won't work with the latest version of ydata-synthetic.

Please check ydata-sdk to see how to generate synthetic time-series data.

Why YData Fabric vs TimeGAN for time-series data

YData Fabric offers advanced capabilities for time-series synthetic data generation and surpasses TimeGAN in flexibility, scalability, and ease of use. With YData Fabric, users can generate high-quality synthetic time-series data and use built-in data profiling tools to check the integrity and consistency of the data. Unlike TimeGAN, which is a single model for time-series data, YData Fabric is suitable for different types of datasets and behaviours. It is also designed for scalability, so it can handle large, complex time-series datasets. Its guided UI makes it easy to adapt to different time-series scenarios, from healthcare to financial data.

For more on YData Fabric versus synthetic data generation with TimeGAN, read this blog post.

Using TimeGAN to generate synthetic time-series data

Although tabular data may be the most frequently discussed type of data, a great number of real-world domains (from traffic and daily trajectories to stock prices and energy consumption patterns) produce time-series data, which introduces several aspects of complexity to synthetic data generation.

Time-series data is structured sequentially, with observations ordered chronologically based on their associated timestamps or time intervals. It explicitly incorporates the temporal aspect, allowing for the analysis of trends, seasonality, and other dependencies over time.
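To make the role of ordering concrete, here is a minimal sketch using pandas. The series is made-up illustrative data (a linear trend plus a weekly cycle), not the stock dataset used later, and all variable names are assumptions for the example. It shows that the timestamp index defines the sequence, and that trend and seasonality only become visible once the observations are in chronological order.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: a linear trend plus a weekly cycle (illustrative data only).
steps = np.arange(56)
index = pd.date_range("2024-01-01", periods=56, freq="D")
values = 0.5 * steps + 3.0 * np.sin(2 * np.pi * steps / 7)
series = pd.Series(values, index=index, name="value")

# Shuffle the rows, then restore chronological order using the timestamp index.
shuffled = series.sample(frac=1.0, random_state=0)
ordered = shuffled.sort_index()

# A centered 7-day rolling mean averages out the weekly cycle and exposes the trend.
trend = ordered.rolling(window=7, center=True).mean()

# After removing the trend, the lag-7 autocorrelation reflects the weekly seasonality.
detrended = (ordered - trend).dropna()
lag7_autocorr = detrended.autocorr(lag=7)
print(f"lag-7 autocorrelation: {lag7_autocorr:.3f}")
```

A synthesizer for time-series data has to reproduce this kind of dependency between time steps, not just the marginal distribution of each column.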

TimeGAN is a model that uses a Generative Adversarial Network (GAN) framework to generate synthetic time-series data by learning the underlying temporal dependencies and characteristics of the original data.

Here's an example of how to synthesize time-series data with TimeGAN using the Yahoo Stock Price dataset:

"""
    TimeGAN architecture example file
"""

# Importing necessary libraries
from os import path
from ydata_synthetic.synthesizers.timeseries import TimeSeriesSynthesizer
from ydata_synthetic.preprocessing.timeseries import processed_stock
from ydata_synthetic.synthesizers import ModelParameters, TrainParameters
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define model parameters
gan_args = ModelParameters(batch_size=128,
                           lr=5e-4,
                           noise_dim=32,
                           layers_dim=128,
                           latent_dim=24,
                           gamma=1)

train_args = TrainParameters(epochs=50000,
                             sequence_length=24,
                             number_sequences=6)

# Read the data
stock_data = pd.read_csv("../../data/stock_data.csv")
cols = list(stock_data.columns)

# Training the TimeGAN synthesizer
if path.exists('synthesizer_stock.pkl'):
    synth = TimeSeriesSynthesizer.load('synthesizer_stock.pkl')
else:
    synth = TimeSeriesSynthesizer(modelname='timegan', model_parameters=gan_args)
    synth.fit(stock_data, train_args, num_cols=cols)
    synth.save('synthesizer_stock.pkl')

# Generating new synthetic samples
stock_data_blocks = processed_stock(path='../../data/stock_data.csv', seq_len=24)
synth_data = synth.sample(n_samples=len(stock_data_blocks))
print(synth_data[0].shape)

# Plotting some generated samples. Both synthetic and original data are still scaled to values in [0, 1]
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 10))
axes = axes.flatten()

time = list(range(1,25))
obs = np.random.randint(len(stock_data_blocks))

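for j, col in enumerate(cols):
    # Compare one real sequence with the synthetic sequence at the same position, one feature per subplot
    df = pd.DataFrame({'Real': stock_data_blocks[obs][:, j],
                       'Synthetic': synth_data[obs].iloc[:, j]})
    # secondary_y must match the column name ('Synthetic') to put it on the right-hand axis
    df.plot(ax=axes[j],
            title=col,
            secondary_y='Synthetic', style=['-', '--'])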
fig.tight_layout()
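
The example above relies on `processed_stock` to turn the raw table into fixed-length blocks, and its plotting comment notes that values lie in [0, 1]. The sketch below illustrates that kind of preprocessing with plain NumPy: min-max scaling per column followed by overlapping sliding windows. It is an assumption about the general technique, not the library's actual implementation; `make_windows` and the random-walk toy data are hypothetical names and values introduced only for illustration.

```python
import numpy as np


def make_windows(data: np.ndarray, seq_len: int) -> np.ndarray:
    """Min-max scale each column to [0, 1], then cut overlapping windows of seq_len rows."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    scaled = (data - col_min) / np.where(col_range == 0, 1.0, col_range)
    windows = [scaled[i:i + seq_len] for i in range(len(scaled) - seq_len + 1)]
    return np.stack(windows)


# Hypothetical toy data: 100 time steps of 6 random-walk features.
rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 6)).cumsum(axis=0)

blocks = make_windows(toy, seq_len=24)
print(blocks.shape)  # (77, 24, 6): windows, time steps per window, features
```

Each block has the shape `(seq_len, n_features)`, which matches how the plotting loop indexes a real sequence with `stock_data_blocks[obs][:, j]`.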