How to Create and Use a PyTorch DataLoader

I wrote an article titled “How to Create and Use a PyTorch DataLoader” in the September 2020 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2020/09/10/pytorch-dataloader.aspx.

In order to train a PyTorch neural network you must write code to read training data into memory, convert the data to PyTorch tensors, and serve the data up in batches. This task is not trivial.

In the early days of PyTorch, you had to write completely custom code for data loading. Now, however, the vast majority of PyTorch systems use the PyTorch Dataset and DataLoader interfaces to serve up training or test data. Briefly, a Dataset object loads training or test data into memory, and a DataLoader object fetches data from a Dataset and serves the data up in batches.
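To make the division of labor concrete, here is a minimal sketch that uses the built-in TensorDataset class rather than a custom Dataset. The toy tensors (6 items, 3 features each) and the variable names are made up for illustration and are not from my article.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy data: 6 items with 3 features each, plus integer labels.
x = torch.arange(18, dtype=torch.float32).reshape(6, 3)
y = torch.tensor([0, 1, 2, 0, 1, 2], dtype=torch.int64)

ds = TensorDataset(x, y)                            # holds the data in memory
ldr = DataLoader(ds, batch_size=4, shuffle=False)   # serves the data in batches

batch_shapes = []
for (bx, by) in ldr:
    # Each iteration yields one batch of predictors and one batch of labels.
    batch_shapes.append((tuple(bx.shape), tuple(by.shape)))
    print(bx.shape, by.shape)
```

With 6 items and a batch size of 4, the loader produces one full batch of 4 items and a final partial batch of 2 items. Passing drop_last=True to the DataLoader would discard the partial batch instead.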

In my article I present and explain a complete example of creating and using a PyTorch Dataset + DataLoader. I use a tiny 8-item dummy set of data to keep the main ideas clear:

1 0  0.171429  1 0 0  0.966805  0
0 1  0.085714  0 1 0  0.188797  1
1 0  0.000000  0 0 1  0.690871  2
1 0  0.057143  0 1 0  1.000000  1
0 1  1.000000  0 0 1  0.016598  2
1 0  0.171429  1 0 0  0.802905  0
0 1  0.171429  1 0 0  0.966805  1
1 0  0.257143  0 1 0  0.329876  0
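
A custom Dataset for data like this could look roughly like the sketch below. This is not the exact code from my article. It assumes that the first seven values on each line are predictors and that the last value is an integer class label, and it reads the data from an in-memory string instead of a file so that the example is self-contained. The names RAW and DummyDataset are hypothetical.

```python
import torch
from torch.utils.data import Dataset, DataLoader

# The 8-item dummy data, embedded as text so no file is needed.
RAW = """\
1 0  0.171429  1 0 0  0.966805  0
0 1  0.085714  0 1 0  0.188797  1
1 0  0.000000  0 0 1  0.690871  2
1 0  0.057143  0 1 0  1.000000  1
0 1  1.000000  0 0 1  0.016598  2
1 0  0.171429  1 0 0  0.802905  0
0 1  0.171429  1 0 0  0.966805  1
1 0  0.257143  0 1 0  0.329876  0
"""

class DummyDataset(Dataset):
    def __init__(self, text):
        # Parse whitespace-delimited lines into rows of floats.
        rows = [[float(v) for v in line.split()]
                for line in text.strip().splitlines()]
        # Assumption: columns 0-6 are predictors, column 7 is the label.
        self.x = torch.tensor([r[0:7] for r in rows], dtype=torch.float32)
        self.y = torch.tensor([int(r[7]) for r in rows], dtype=torch.int64)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (predictors, label) pair.
        return self.x[idx], self.y[idx]

torch.manual_seed(0)  # make the shuffle order repeatable
ds = DummyDataset(RAW)
ldr = DataLoader(ds, batch_size=3, shuffle=True)

for b, (bx, by) in enumerate(ldr):
    print(b, bx.shape, by)
```

A Dataset only needs to implement __len__() and __getitem__(). The DataLoader takes care of shuffling and batching, so with 8 items and a batch size of 3 it serves batches of 3, 3, and 2 items.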

In my article I point out that once you know how to create a custom PyTorch Dataset + DataLoader for any kind of data, you also gain insight into how to use the many pre-built Dataset objects, such as those for MNIST (handwritten digits), CIFAR-10 (small images of 10 different objects), IMDB (movie reviews), and so on.

Three more or less random images from an Internet search for “data loader”. I have no idea why the first two images were given as results, but the third image looks like MNIST data.