One morning, I had just written a PyTorch program that used a neural autoencoder to reduce MNIST 28-by-28 digit images from 784 dimensions down to 2 dimensions, so that each image could be plotted on an x-y graph. It was an interesting experiment. Since I already had all the data wrangling code handy, I figured I'd try using a variational autoencoder (VAE) for dimensionality reduction and visualization. This was a stretch because VAEs are designed to generate synthetic data, not to perform dimensionality reduction.

The bottom line is that I don’t think the idea worked very well — a VAE does not appear to be well suited for dimensionality reduction for visualization. But like many things related to deep neural networks, there were as many questions raised as there were questions answered.

For my source data, I used the first 10,000 of the 60,000 MNIST training images. See https://jamesmccaffrey.wordpress.com/2021/03/15/converting-mnist-binary-files-to-text-files/

My demo VAE had a 784-400-[2,2]-2-400-784 architecture. In a preliminary attempt, I reduced the images down to a [1,1] core, where the first value was the data distribution mean and the second value was the distribution log-variance. I soon realized that this approach didn't make sense because I'd be plotting a mean against a log-variance. So, I increased the core representation to [2,2], where the first component is a mean vector with two values and the second component is a log-variance vector with two values. To graph the reduced form of each MNIST image, I used just the two mean values.
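The [2,2] core and the reparameterization step can be sketched in isolation. The minimal NumPy sketch below uses made-up mean and log-variance values for a single image; the names u, logvar, and the specific numbers are hypothetical illustrations, not values from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical encoder output for one image: a 2-value mean
# vector and a 2-value log-variance vector
u = np.array([0.5, -1.2])        # mean
logvar = np.array([-2.0, -3.0])  # log-variance

# reparameterization: z = u + eps * sigma, where sigma = exp(0.5 * logvar)
sigma = np.exp(0.5 * logvar)     # standard deviation
eps = rng.standard_normal(2)     # random noise
z = u + eps * sigma              # the 2-value code fed to the decoder

# for visualization, only the mean is used as the (x, y) plot point
point = (u[0], u[1])
```

The noise term is what makes a VAE generative; discarding it and plotting only the mean is exactly why the VAE core can be treated as a (lossy) 2D reduction.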

VAEs are tricky. They use a custom error/loss function. All the examples I've seen on the Internet use binary cross entropy loss plus Kullback-Leibler divergence. I'm skeptical of the binary cross entropy component, so I used mean squared error instead.
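The MSE-plus-KLD loss described above can be sketched as a standalone function. This is a minimal NumPy version for illustration; vae_loss is a hypothetical name (the full program below calls its version cus_loss_func).

```python
import numpy as np

def vae_loss(recon_x, x, u, logvar, beta=1.0):
    # reconstruction term: mean squared error (instead of BCE)
    mse = np.mean((recon_x - x) ** 2)
    # KL divergence of N(u, sigma^2) from the standard normal N(0, 1):
    # KLD = -0.5 * sum(1 + log(sigma^2) - u^2 - sigma^2)
    kld = -0.5 * np.sum(1.0 + logvar - u ** 2 - np.exp(logvar))
    return mse + beta * kld
```

When the latent distribution already matches the standard normal (u = 0, logvar = 0) and reconstruction is perfect, both terms are zero, so the loss is zero.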

The resulting visualization concentrated all of the images in one giant cluster but did not produce sub-clusters for each of the ten digit types. This is a good thing for generating synthetic data, because a pair of random seed values will likely land near some image representation and so will probably produce a realistic-looking synthetic image. But the lack of sub-clustering isn't good for visualization, because no patterns emerge.

Anyway, it was an interesting experiment. In the back of my mind, I'm thinking about the idea of using a VAE for anomaly detection. Regular autoencoders are quite good at anomaly detection if you use reconstruction error. But regular autoencoders tend to overfit the data, so you get lots of false positive detections. VAEs tend to underfit, as this experiment showed, so a VAE anomaly detection system would likely give false negatives. My idea is to chain together a regular autoencoder with a VAE for anomaly detection, but I haven't worked out the details of the proposed model yet.
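The reconstruction-error idea for anomaly detection can be sketched independently of any particular autoencoder. The NumPy sketch below assumes you already have the original images and their reconstructions as row-per-item arrays; anomaly_scores, flag_anomalies, and the 95th-percentile threshold are hypothetical choices for illustration.

```python
import numpy as np

def anomaly_scores(x, recon_x):
    # per-item reconstruction error: mean squared error over each row
    return np.mean((x - recon_x) ** 2, axis=1)

def flag_anomalies(scores, pct=95.0):
    # flag items whose reconstruction error exceeds the
    # pct-th percentile of all errors
    threshold = np.percentile(scores, pct)
    return scores > threshold
```

An overfitted autoencoder reconstructs even unusual items well (scores stay low, so anomalies are missed); an underfitted one reconstructs everything poorly (scores stay uniformly high), which is the false-negative risk described above for VAEs.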

*Artist Abdulrahman Eid created an incredibly detailed model diorama of a street in 1950s Jeddah, Saudi Arabia.*

Code below (long).

# mnist_vae_viz.py
# PyTorch variational autoencoder for MNIST visualization
# compress each 28x28 MNIST digit to 2 values then plot
# use custom generated text MNIST rather than
# the built-in torchvision MNIST
# PyTorch 1.8.0-CPU Anaconda3-2020.02  Python 3.7.6
# CPU, Windows 10

import numpy as np
import torch as T
import matplotlib.pyplot as plt
import torchvision as tv  # to visualize fakes

device = T.device("cpu")

# -----------------------------------------------------------

class MNIST_Dataset(T.utils.data.Dataset):
  # for an Autoencoder (not a classifier)
  # assumes data has been converted to tab-delim text files:
  # 784 pixel values (0-255) (tab) label (0-9)
  # [0] [1] . . [783]  [784]

  def __init__(self, src_file):
    tmp_x = np.loadtxt(src_file, usecols=range(0,784),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_y = np.loadtxt(src_file, usecols=[784],
      delimiter="\t", comments="#", dtype=np.int64)
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.x_data /= 255.0  # normalize pixels
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # don't normalize digit labels

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    pixels = self.x_data[idx]
    label = self.y_data[idx]
    return (pixels, label)

# -----------------------------------------------------------

class VAE(T.nn.Module):  # [784-400-[2,2]-2-400-784]
  def __init__(self):
    super(VAE, self).__init__()
    self.fc1 = T.nn.Linear(784, 400)  # no labels
    self.fc2a = T.nn.Linear(400, 2)   # u (mean)
    self.fc2b = T.nn.Linear(400, 2)   # log-var
    self.fc3 = T.nn.Linear(2, 400)
    self.fc4 = T.nn.Linear(400, 784)

  def encode(self, x):  # 784-400-[2,2]
    z = T.relu(self.fc1(x))
    z1 = self.fc2a(z)  # activation here ??
    z2 = self.fc2b(z)
    return (z1, z2)  # (u, log-var)

  def decode(self, x):  # 2-400-784
    z = T.relu(self.fc3(x))
    z = T.sigmoid(self.fc4(z))  # in [0, 1]
    return z

  def forward(self, x):
    (u, logvar) = self.encode(x)
    stdev = T.exp(0.5 * logvar)
    noise = T.randn_like(stdev)
    z = u + (noise * stdev)  # [2]
    oupt = self.decode(z)
    return (oupt, u, logvar)

# -----------------------------------------------------------

def cus_loss_func(recon_x, x, u, logvar):
  # https://arxiv.org/abs/1312.6114
  # KLD = 0.5 * sum(1 + log(sigma^2) - u^2 - sigma^2)
  # bce = T.nn.functional.binary_cross_entropy(recon_x, \
  #   x.view(-1, 784), reduction="sum")
  # mse = T.nn.functional.mse_loss(recon_x, x.view(-1, 784))
  mse = T.nn.functional.mse_loss(recon_x, x)
  kld = -0.5 * T.sum(1 + logvar - u.pow(2) - \
    logvar.exp())
  BETA = 1.0
  return mse + (BETA * kld)

# -----------------------------------------------------------

def train(vae, ds, bs, me, lr, le):
  # train autoencoder vae with dataset ds using batch size bs,
  # with max epochs me, learn rate lr, log_every le
  data_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  # loss_func = T.nn.MSELoss()  # use custom loss instead
  opt = T.optim.SGD(vae.parameters(), lr=lr)
  print("Starting training")
  for epoch in range(0, me):
    for (b_idx, batch) in enumerate(data_ldr):
      opt.zero_grad()
      X = batch[0]  # don't use Y labels to train
      recon_x, u, logvar = vae(X)
      loss_val = cus_loss_func(recon_x, X, u, logvar)
      loss_val.backward()
      opt.step()

    if epoch != 0 and epoch % le == 0:
      print("epoch = %6d" % epoch, end="")
      print("  curr batch loss = %7.4f" % \
        loss_val.item(), end="")
      print("")

      # save and view sample images as sanity check
      num_images = 64
      rinpt = T.randn(num_images, 2).to(device)
      with T.no_grad():
        fakes = vae.decode(rinpt)
      fakes = fakes.view(num_images, 1, 28, 28)
      tv.utils.save_image(fakes,
        ".\\Fakes\\fakes_" + str(epoch) + ".jpg",
        padding=4, pad_value=1.0)  # no overwrite
  print("Training complete ")

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin MNIST VAE visualization ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset object
  print("\nCreating MNIST Dataset ")
  fn = ".\\Data\\mnist_train_10000.txt"
  data_ds = MNIST_Dataset(fn)

  # 2. create and train VAE model
  print("\nCreating VAE \n")
  vae = VAE()  # 784-400-[2,2]-2-400-784
  vae.train()  # set mode
  bat_size = 10
  max_epochs = 40
  lrn_rate = 0.01
  log_every = int(max_epochs / 10)
  train(vae, data_ds, bat_size, max_epochs, \
    lrn_rate, log_every)

  # 3. TODO: save trained VAE

  # 4. use model encoder to generate (x,y) pairs
  vae.eval()
  all_pixels = data_ds[0:10000][0]  # all pixel values
  all_labels = data_ds[0:10000][1]
  with T.no_grad():
    u, logvar = vae.encode(all_pixels)  # mean, log-var
  print("\nImages reduced to 2 values: ")
  print(u)

  # 5. graph the reduced-form digits in 2D
  print("\nPlotting reduced-dim MNIST images")
  plt.scatter(u[:,0], u[:,1],
    c=all_labels, edgecolor='none', alpha=0.9,
    cmap=plt.cm.get_cmap('nipy_spectral', 11),
    s=20)  # s=20 orig, alpha=0.9
  plt.xlabel('mean[0]')
  plt.ylabel('mean[1]')
  plt.colorbar()
  plt.show()

  print("\nEnd MNIST VAE visualization")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()
