I wrote an article titled “Generating Synthetic Data Using a Generative Adversarial Network (GAN) with PyTorch” in the June 2021 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/06/02/gan-pytorch.aspx.
A generative adversarial network (GAN) is a deep neural system that can be used to generate synthetic data. GANs are somewhat similar to variational autoencoders (VAEs) in the sense that both systems generate synthetic data, but GANs are significantly more complex than VAEs.
Generating synthetic data is useful in several machine learning scenarios. One use case is when you have imbalanced training data for a particular class. For example, in a dataset of elementary school teacher information, you might have many females but very few males. You could train a GAN on the male teachers and then use the GAN to generate synthetic male data items.
In the article I present a complete example where the source data consists of crude handwritten ‘2’ digits from the UCI Digits dataset. Each image is 8×8 pixels, where each pixel is a grayscale value between 0 and 16.
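The article has its own data-loading code. What follows is only a rough sketch of the preparation step, and it rests on two assumptions. First, each line of the UCI file holds 64 comma-delimited pixel values followed by a class label. Second, pixels are scaled from [0, 16] to [0, 1], which is an arbitrary choice on my part. The function name extract_twos is hypothetical.

```python
import numpy as np

def extract_twos(rows) -> np.ndarray:
    """Keep only the items labeled 2 and scale their pixels from [0, 16] to [0, 1].

    Assumes each row is 64 grayscale pixel values followed by one class label
    (the hypothetical layout described above).
    """
    rows = np.asarray(rows, dtype=np.float32)
    pixels = rows[:, :64]                  # the 8x8 image, flattened
    labels = rows[:, 64].astype(np.int64)  # the digit class, 0 through 9
    twos = pixels[labels == 2]             # boolean mask selects the '2' items
    return twos / 16.0                     # 16 is the maximum grayscale value
```

Assuming a local copy of the training file named optdigits.tra, usage might look like `twos = extract_twos(np.loadtxt("optdigits.tra", delimiter=","))`. Calling `twos.reshape(-1, 8, 8)` recovers the 8×8 images if you want to display them.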
A GAN consists of two separate deep neural networks. A Generator creates synthetic data items. A Discriminator classifies an image as fake (from the Generator) or real (from the training data). The GAN system alternates between updating the Generator so that it produces better fake images (meaning images more likely to fool the Discriminator) and updating the Discriminator so that it is better at distinguishing fake images from real images.
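The alternating update scheme can be sketched in PyTorch. This is a minimal sketch under my own assumptions and is not the code from the article. The layer sizes, the noise size LATENT_DIM = 20, the Adam learning rate, and the use of binary cross-entropy loss are all hypothetical choices. Each image is treated as a flat vector of 64 values scaled to [0, 1], which is why the Generator ends with a sigmoid.

```python
import torch
import torch.nn as nn

LATENT_DIM = 20  # hypothetical size of the random noise vector fed to the Generator

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 32), nn.Tanh(),
            nn.Linear(32, 64), nn.Sigmoid())  # 64 outputs = one fake 8x8 image in [0, 1]

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(64, 32), nn.Tanh(),
            nn.Linear(32, 1), nn.Sigmoid())  # estimated probability that the input is real

    def forward(self, x):
        return self.net(x)

def train_step(gen, dis, gen_opt, dis_opt, real_batch):
    """One round of the alternating update: Discriminator first, then Generator."""
    loss_fn = nn.BCELoss()
    n = real_batch.shape[0]
    real_labels = torch.ones(n, 1)
    fake_labels = torch.zeros(n, 1)

    # 1. Update the Discriminator so it outputs 1 for real images and 0 for fakes.
    dis_opt.zero_grad()
    fake_batch = gen(torch.randn(n, LATENT_DIM)).detach()  # detach: no Generator update here
    dis_loss = loss_fn(dis(real_batch), real_labels) + loss_fn(dis(fake_batch), fake_labels)
    dis_loss.backward()
    dis_opt.step()

    # 2. Update the Generator so the Discriminator is more likely to call its fakes real.
    gen_opt.zero_grad()
    fake_batch = gen(torch.randn(n, LATENT_DIM))
    gen_loss = loss_fn(dis(fake_batch), real_labels)
    gen_loss.backward()
    gen_opt.step()  # only Generator weights change; Discriminator grads are cleared next step
    return dis_loss.item(), gen_loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    gen, dis = Generator(), Discriminator()
    gen_opt = torch.optim.Adam(gen.parameters(), lr=0.0002)
    dis_opt = torch.optim.Adam(dis.parameters(), lr=0.0002)
    real = torch.rand(16, 64)  # stand-in for a batch of scaled '2' images
    for epoch in range(3):
        d_loss, g_loss = train_step(gen, dis, gen_opt, dis_opt, real)
        print(f"epoch {epoch}: dis_loss={d_loss:.4f} gen_loss={g_loss:.4f}")
    synthetic = gen(torch.randn(4, LATENT_DIM)).detach()  # four synthetic items
```

After training, generating new synthetic items only requires the Generator: feed it random noise vectors and reshape each 64-value output to 8×8. The Discriminator is needed only during training.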
Although generating synthetic data items is useful and interesting, I suspect that GANs can be used for other purposes, such as anomaly detection.