Binary Classification Using PyTorch: Preparing Data

I wrote an article titled “Binary Classification Using PyTorch: Preparing Data” in the October 2020 edition of the online Microsoft Visual Studio Magazine.

The article is the first in a four-part series that presents a complete end-to-end example of how to do binary classification using PyTorch. This topic illustrates one benefit of an online article vs. a print article. When I write for a print magazine, there are always severe limitations on page length, because print is very expensive. This results in articles having to take many shortcuts and sometimes important details must be omitted. In an online article I can go into detail when necessary.

Left: This is a screenshot of the entire binary classification program that is the subject of the four articles. Right: This is a screenshot of a test of the code that implements a Dataset and uses it in a DataLoader to serve up batches of data.

The running demo I use throughout this article and the next three is the well-known Banknote Authentication problem. The dataset has 1,372 data items. Each item represents a banknote (think euro or dollar bill) which is either authentic (class = 0) or a forgery (class = 1). There are four numeric predictor values that were obtained from a digital image of each banknote. The raw data looks like:

3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.46210, 0
. . .
-2.5419, -0.65804, 2.6842, 1.1952, 1
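To make the layout concrete, here is a minimal sketch of splitting rows like these into four predictor values and a class label. It uses only the three rows shown above, and the helper name parse_line is hypothetical, not something taken from the article.

```python
# The three sample rows shown above, as they might appear in a text file.
raw_lines = [
    "3.6216, 8.6661, -2.8073, -0.44699, 0",
    "4.5459, 8.1674, -2.4586, -1.46210, 0",
    "-2.5419, -0.65804, 2.6842, 1.1952, 1",
]

def parse_line(line):
    """Split one comma-delimited row into (predictors, label).

    Hypothetical helper: assumes exactly five fields per row,
    four numeric predictors followed by a 0/1 class label.
    """
    fields = [f.strip() for f in line.split(",")]
    predictors = [float(v) for v in fields[:4]]
    label = int(fields[4])
    return predictors, label

parsed = [parse_line(ln) for ln in raw_lines]
for predictors, label in parsed:
    kind = "authentic" if label == 0 else "forgery"
    print(predictors, "->", kind)
```

In practice the full 1,372-item file would more likely be read with a library routine, but the per-row logic is the same.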

In this first article, I explain in detail how to implement a PyTorch Dataset object and then how to use it in a DataLoader object. A Dataset object stores all the training data, and a DataLoader object uses the Dataset to serve up batches of training data. Writing code to serve up batches of training data isn’t the most interesting task in machine learning, but it’s obviously a critical part of creating an ML prediction model.
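The Dataset and DataLoader relationship can be sketched roughly as follows. This is not the code from the article: the class name BanknoteDataset, the in-memory list of rows, and the choice to store the label as a float32 column are all assumptions made for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BanknoteDataset(Dataset):
    # Hypothetical class name; keeps every item in memory as tensors.
    def __init__(self, rows):
        # rows: list of [x0, x1, x2, x3, label]
        data = torch.tensor(rows, dtype=torch.float32)
        self.x = data[:, 0:4]                # four predictor values
        self.y = data[:, 4].reshape(-1, 1)   # label as a column (assumed layout)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (predictors, label) pair.
        return self.x[idx], self.y[idx]

# Only the three sample rows shown earlier, not the full dataset.
rows = [
    [3.6216, 8.6661, -2.8073, -0.44699, 0],
    [4.5459, 8.1674, -2.4586, -1.46210, 0],
    [-2.5419, -0.65804, 2.6842, 1.1952, 1],
]

ds = BanknoteDataset(rows)
loader = DataLoader(ds, batch_size=2, shuffle=False)

batches = list(loader)
for xb, yb in batches:
    print(xb.shape, yb.shape)
```

The Dataset only needs to report its length and return one item by index; the DataLoader handles grouping items into batches. During training, shuffle=True would normally be used so the batches differ from epoch to epoch.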

Before the invention of digital photography, creating a fake photograph was very difficult. But now anyone with basic Photoshop skills can make very authentic-looking forgeries. But some forgeries are easy to spot. Left: I seriously doubt this squirrel-gorilla hybrid exists in nature. Creepy. Center: It looks like someone cruelly put a hideous digital clown dress on former First Lady Michelle Obama. Very mean. Right: I don’t think bears are this friendly in nature. But they might wear hats.
