I wrote an article titled “Generating Synthetic Data Using a Variational Autoencoder with PyTorch” in the May 2021 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/05/06/variational-autoencoder.aspx.
A variational autoencoder (VAE) is a deep neural system that can be used to generate synthetic data. Generating synthetic data is useful when you have imbalanced training data for a particular class. For example, in a dataset of tech company employee information, you might have many male developer employees but very few female employees. You could train a VAE on the female employees and use the VAE to generate synthetic women.
I illustrated how VAEs work by using a demo program. The demo generates synthetic images of handwritten “1” digits based on the UCI Digits dataset. Each image is 8 by 8 pixels, and each pixel is a grayscale value between 0 and 16. The demo uses image data but VAEs can generate synthetic data of any kind. The demo begins by loading 389 actual “1” digit images into memory. A typical “1” digit from the training data is displayed. Next, the demo trains a VAE model using the 389 images. The demo concludes by using the trained VAE to generate a synthetic “1” image and displays its 64 numeric values and its visual representation.
VAEs are quite complex. The diagram below shows the architecture of the 64-32-[4,4]-4-32-64 VAE used in the demo program. An input image x, with 64 values between 0 and 1, is fed to the VAE. A neural layer condenses the 64 values down to 32 values. The 32 values are condensed to two tensors, each with four values. The first tensor represents the mean of the distribution of the source data. The second tensor represents the standard deviation of the distribution. For technical reasons the standard deviation is stored as the log of the variance.
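To make the 64-32-[4,4] encoder half concrete, here is a minimal PyTorch sketch. The class name, layer names, and the ReLU activation are my assumptions for illustration and are not taken from the article. The random input is just a stand-in for one normalized image.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoder half of a 64-32-[4,4] VAE (hypothetical names)."""
    def __init__(self, n_in=64, n_hid=32, n_latent=4):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hid)          # 64 -> 32
        self.to_mean = nn.Linear(n_hid, n_latent)     # 32 -> 4 (mean)
        self.to_logvar = nn.Linear(n_hid, n_latent)   # 32 -> 4 (log-variance)

    def forward(self, x):
        h = torch.relu(self.hidden(x))  # activation choice is an assumption
        return self.to_mean(h), self.to_logvar(h)

torch.manual_seed(0)
enc = Encoder()
x = torch.rand(1, 64)            # stand-in for one image with values in [0, 1)
mean, logvar = enc(x)
std = torch.exp(0.5 * logvar)    # recover standard deviation from log-variance
```

Storing the log of the variance lets the layer output any real number while the recovered standard deviation is always positive.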
Although it’s not obvious at first, the key point is that a VAE learns the distribution of its source data rather than memorizing the source data. A data distribution is just a description of the data, given by its mean (average value) and standard deviation (measure of spread).
The mean and standard deviation (in the form of log-variance) are combined statistically to give a tensor with four values called the latent representation. These four values represent the core information contained in a digit image. The four values of the latent representation are expanded to 32 values, and those 32 values are expanded to 64 values called the reconstruction of the input.
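One common way to do the combining step is the so-called reparameterization trick: latent = mean + std * noise, where the noise is drawn from a standard normal distribution. The sketch below assumes that approach, plus a ReLU hidden activation and a sigmoid output so the 64 reconstructed values land between 0 and 1. The function and class names are hypothetical.

```python
import torch
import torch.nn as nn

def reparameterize(mean, logvar):
    """Combine mean and log-variance into a latent sample."""
    std = torch.exp(0.5 * logvar)   # log-variance -> standard deviation
    eps = torch.randn_like(std)     # standard normal noise
    return mean + eps * std

class Decoder(nn.Module):
    """Decoder half, 4-32-64 (hypothetical names)."""
    def __init__(self, n_latent=4, n_hid=32, n_out=64):
        super().__init__()
        self.hidden = nn.Linear(n_latent, n_hid)   # 4 -> 32
        self.out = nn.Linear(n_hid, n_out)         # 32 -> 64

    def forward(self, z):
        h = torch.relu(self.hidden(z))       # activation is an assumption
        return torch.sigmoid(self.out(h))    # keeps outputs in (0, 1)

torch.manual_seed(0)
mean = torch.zeros(1, 4)      # placeholder encoder outputs
logvar = torch.zeros(1, 4)
z = reparameterize(mean, logvar)   # four-value latent representation
dec = Decoder()
recon = dec(z)                     # 64-value reconstruction
```

With a trained decoder, feeding it a latent tensor drawn with torch.randn(1, 4) is one plausible way to generate a brand new synthetic item.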
The trickiest part of VAEs is training them. Most of my article explains how VAE training works.
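To give a rough idea of why training is tricky, here is a sketch of a loss function in the form commonly used for VAEs: a reconstruction term plus a Kullback-Leibler term that nudges the learned distribution toward a standard normal. I'm assuming binary cross entropy for the reconstruction term and a beta weight of 1.0. The function name and the placeholder tensors are hypothetical, and this is not necessarily the exact loss used in the article.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, x, mean, logvar, beta=1.0):
    """Reconstruction error plus KL divergence (assumed form)."""
    # how far the 64 reconstructed values are from the 64 input values
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL divergence between N(mean, var) and N(0, 1), summed over latent values
    kld = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return bce + beta * kld

# placeholder tensors, just to show the call
x = torch.full((1, 64), 0.5)
recon = torch.full((1, 64), 0.5)
mean = torch.zeros(1, 4)
logvar = torch.zeros(1, 4)
loss = vae_loss(recon, x, mean, logvar)
```

The two terms pull in different directions: the first rewards accurate reconstructions and the second keeps the latent values well behaved so that random samples decode to plausible data.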
Although I didn’t mention it in my article, I’ve been exploring the idea of using a VAE for advanced data anomaly detection.
Some science fiction movies generate synthetic women (without using a VAE). Left: Ava and Kyoko from “Ex Machina” (2014). An OK, but not great film. Center: “Street girl” and Mariette from “Blade Runner 2049” (2017). An excellent film. Right: “Other saloon girl” and Arlette from “Westworld” (1973). A so-so film.