## Logistic Regression Using PyTorch

The PyTorch code library is intended for creating neural networks, but you can use it to create logistic regression models too. One approach, in a nutshell, is to create a neural network with one fully connected layer that has a single node, and apply logistic sigmoid activation. You encode the training labels as 0 or 1, and you train with BCELoss (binary cross entropy) and SGD (stochastic gradient descent) optimization.
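The idea can be sketched in just a few lines. This is a minimal illustration, not the demo program itself; the dummy data and the 4-input dimension are assumptions for the sketch.

```python
import torch as T

# one fully connected layer with a single node,
# followed by logistic sigmoid activation
model = T.nn.Sequential(
  T.nn.Linear(4, 1),
  T.nn.Sigmoid())

loss_obj = T.nn.BCELoss()  # targets must be 0.0 or 1.0
opt = T.optim.SGD(model.parameters(), lr=0.01)

x = T.rand(8, 4)                      # dummy batch of 8 items
y = T.randint(0, 2, (8, 1)).float()   # dummy 0/1 labels

for _ in range(10):   # a few SGD steps
  opt.zero_grad()
  loss = loss_obj(model(x), y)
  loss.backward()
  opt.step()
```

Because the sigmoid output is a value in [0.0, 1.0], it can be interpreted directly as the probability of class 1.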

I coded up a demo program to illustrate. The most difficult part is preparing the data. I used the Banknote Authentication dataset, which has 1,372 data items. Each item represents a digital image of a banknote (think euro or dollar bill). There are four predictor values followed by a 0 (authentic) or a 1 (forgery).

I fetched the raw data from archive.ics.uci.edu/ml/datasets/banknote+authentication. I added ID numbers from 1 to 1372 (not necessary — just to track items). Then I randomly split the data into a 1097-item set for training and a 275-item set for testing. I divided all four predictor values by 20 to normalize them so that they’d be between -1.0 and +1.0.
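The normalization is just a divide-by-constant. A small sketch, using made-up raw values in place of the actual dataset files:

```python
import numpy as np

# two hypothetical raw items: variance, skewness, kurtosis, entropy
raw = np.array([[3.6216, 8.6661, -2.8073, -0.44699],
                [4.5459, 8.1674, -2.4586, -1.46210]],
               dtype=np.float32)

norm = raw / 20  # k=20 normalization: values now in [-1.0, +1.0]
```

Dividing by a single constant preserves the sign and relative magnitude of each predictor, which is convenient when you later want to interpret the model weights.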

Next I wrote code to load the data into a PyTorch Dataset. As usual, data preparation took over 90% of the time required for the demo.

The class that defines a PyTorch logistic regression model is:

```
class LogisticReg(T.nn.Module):
  # a 4-1-1 simulates logistic regression using NN style
  # a 4-1 architecture also works and is simpler

  def __init__(self):
    super(LogisticReg, self).__init__()
    self.fc = T.nn.Linear(4, 1)
    T.nn.init.uniform_(self.fc.weight, -0.01, 0.01)
    T.nn.init.uniform_(self.fc.bias, -0.01, 0.01)

  def forward(self, x):
    z = T.sigmoid(self.fc(x))   # logistic regression def.
    return z
```

I wrote code to train the model, evaluate the accuracy of the model, save the model, and use the model to make a prediction.

The results were pretty good, with 97% accuracy on the training data and 98% accuracy on the held-out test data, but the Banknote dataset isn't a very difficult problem.

I ran the data through a deep 4-(8-8)-1 neural network and got 99% accuracy on both training and test data. And I ran the data through a scikit LogisticRegression model and got 98% accuracy on both training and test data.
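For reference, the scikit comparison is only a few lines of code. This sketch uses synthetic data in place of the Banknote files, which aren't reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic stand-in: 200 items, 4 predictors, 0/1 labels
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 0).astype(np.int64)  # linearly separable labels

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)   # mean classification accuracy
```

The scikit version handles the sigmoid, loss, and optimization internally, so there's much less code, but you give up the fine-grained control over training that the PyTorch version provides.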

One possible use for logistic regression with PyTorch is in a hybrid system, where data is used to create a logistic regression model and those results are then used in a second part of the system.

In essence, a PyTorch logistic regression model is a scaled-down neural network in disguise.

The Venice Carnival runs for roughly the two weeks before Lent, the roughly 40-day period that precedes Easter, so it usually falls in February. The Carnival has featured beautiful masks and costumes since the 12th century. Historically, the masks disguised people, allowing them to hide their identities during naughty behavior.

Code below.

```
# banknote_logreg.py
# Banknote classification using Logistic Regression
# PyTorch 1.8.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10

import numpy as np
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

#        1         2         3         4         5         6
# 3456789012345678901234567890123456789012345678901234567890
# ----------------------------------------------------------
# predictors and label in same file
# archive.ics.uci.edu/ml/datasets/banknote+authentication
# IDs 0001 to 1372 added
# data has been k=20 normalized (all four columns)
# ID  variance  skewness  kurtosis  entropy  class
# [0]    [1]      [2]       [3]       [4]     [5]
#  (0 = authentic, 1 = forgery)
# train: 1097 items (80%), test: 275 items (20%)

class BanknoteDataset(T.utils.data.Dataset):

  def __init__(self, src_file, num_rows=None):
    all_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(1,6), delimiter="\t", skiprows=0,
      dtype=np.float32)  # strip IDs off

    self.x_data = T.tensor(all_data[:,0:4],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,4],
      dtype=T.float32).to(device)
    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx,:]  # idx rows, all 4 cols
    lbl = self.y_data[idx,:]    # idx rows, the 1 col
    sample = (preds, lbl)       # tuple approach
    return sample

# ---------------------------------------------------------

def accuracy(model, ds):
  # ds is a PyTorch Dataset object
  n_correct = 0; n_wrong = 0

  bat_size = 1
  ldr = T.utils.data.DataLoader(ds, batch_size=bat_size,
    shuffle=False)

  for (bix, batch) in enumerate(ldr):
    X = batch[0]       # predictors
    target = batch[1]  # target 0 or 1
    oupt = model(X)    # computed in 0.0 to 1.0

    # avoid 'target == 1.0'
    if target < 0.5 and oupt < 0.5:
      n_correct += 1
    elif target >= 0.5 and oupt >= 0.5:
      n_correct += 1
    else:
      n_wrong += 1

  return (n_correct * 1.0) / (n_correct + n_wrong)

# ----------------------------------------------------------

class LogisticReg(T.nn.Module):
  # a 4-1-1 simulates logistic regression using NN style
  # a 4-1 architecture also works and is simpler

  def __init__(self):
    super(LogisticReg, self).__init__()
    self.fc = T.nn.Linear(4, 1)
    T.nn.init.uniform_(self.fc.weight, -0.01, 0.01)
    T.nn.init.uniform_(self.fc.bias, -0.01, 0.01)

  def forward(self, x):
    z = T.sigmoid(self.fc(x))   # logistic regression def.
    return z

# ----------------------------------------------------------

def main():
  # 0. get started
  print("\nBanknote using PyTorch logistic regression \n")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("Creating train and test Datasets ")

  train_file = ".\\Data\\banknote_k20_train.txt"
  test_file = ".\\Data\\banknote_k20_test.txt"

  train_ds = BanknoteDataset(train_file)  # all rows
  test_ds = BanknoteDataset(test_file)

  # 2. create logistic regression NN
  print("Creating 4-1 binary LR-NN classifier ")
  model = LogisticReg().to(device)

  # 3. train network
  print("\nPreparing training")

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  model.train()  # set training mode
  lrn_rate = 0.01
  loss_obj = T.nn.BCELoss()  # binary cross entropy
  # loss_obj = T.nn.MSELoss()  # mean squared error
  opt = T.optim.SGD(model.parameters(),
    lr=lrn_rate)
  max_epochs = 500
  ep_log_interval = 50
  print("Loss function: " + str(loss_obj))
  print("Optimizer: SGD")
  print("Learn rate: 0.01")
  print("Batch size: " + str(bat_size))
  print("Max epochs: " + str(max_epochs))

  print("\nStarting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0.0            # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]         # [10,4]  inputs
      Y = batch[1]         # [10,1]  targets
      oupt = model(X)      # [10,1]  computeds

      loss_val = loss_obj(oupt, Y)   # a tensor
      epoch_loss += loss_val.item()  # accumulate
      opt.zero_grad()                # reset gradients
      loss_val.backward()            # compute gradients
      opt.step()                     # update all weights

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %0.4f" % \
        (epoch, epoch_loss))
  print("Done ")

# ----------------------------------------------------------

  # 4. evaluate model
  model.eval()
  acc_train = accuracy(model, train_ds)
  print("\nAccuracy on train data = %0.2f%%" % \
    (acc_train * 100))
  acc_test = accuracy(model, test_ds)
  print("Accuracy on test data = %0.2f%%" % \
    (acc_test * 100))

  # 5. save model
  print("\nSaving trained logistic regression model \n")
  path = ".\\Models\\banknote_logreg_model.pth"
  T.save(model.state_dict(), path)

  # 6. make a prediction
  raw_inpt = np.array([[4.4, 1.8, -5.6, 3.2]],
    dtype=np.float32)
  norm_inpt = raw_inpt / 20
  unknown = T.tensor(norm_inpt,
    dtype=T.float32).to(device)

  print("Setting normalized inputs to:")
  for x in norm_inpt[0]:
    print("%0.3f " % x, end="")

  model.eval()
  raw_out = model(unknown)    # a Tensor
  pred_prob = raw_out.item()  # scalar, [0.0, 1.0]

  print("\nPrediction prob = %0.6f " % pred_prob)
  if pred_prob < 0.5:
    print("Prediction = authentic")
  else:
    print("Prediction = forgery")

  print("\nEnd Banknote logistic regression demo")

if __name__ == "__main__":
  main()
```