Logistic Regression Using PyTorch With L-BFGS Optimization

The PyTorch code library was designed to enable the creation of deep neural networks. But you can use PyTorch to create simple logistic regression models too. Logistic regression models predict one of two possible discrete values, such as the sex of a person (male or female).
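Under the hood, a logistic regression model computes a weighted sum of the input values, adds a bias, and applies the logistic sigmoid function to squash the result into a value between 0.0 and 1.0 that can be interpreted as the probability of the class encoded as 1. Here is a minimal sketch of the computation, where the input, weight, and bias values are arbitrary made-up numbers:

import numpy as np

def log_reg_output(x, w, b):
  z = np.dot(x, w) + b             # weighted sum plus bias
  return 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid

x = np.array([0.30, 0.00, 0.00, 1.00])   # made-up inputs
w = np.array([0.50, -0.20, 0.10, 0.40])  # made-up weights
b = 0.15                                 # made-up bias
p = log_reg_output(x, w, b)  # 0.6682; p >= 0.5 so predicted class is 1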

Training a neural network is the process of finding good values for the weights and biases (numeric constants such as -1.2345) that define the behavior of the network. By far the most common way to train a neural network is to use stochastic gradient descent (SGD) combined with either MSE (mean squared error) or BCE (binary cross entropy) loss. If you create a logistic regression model using PyTorch, you can treat the model as a highly simplified neural network and train it using SGD. But it's also possible to train a PyTorch logistic regression model using an older technique called L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno).

The advantages and disadvantages of using SGD are: it works with simple or complex neural architectures, and it can train in batches, which allows very large datasets. But SGD requires tuning the learning rate and batch size parameters, which can be difficult and time-consuming.

The advantages and disadvantages of L-BFGS are: it converges in very few iterations and so is blazingly fast, and parameter tuning is usually not necessary. But all the training data must be stored in memory, so L-BFGS doesn't work with very large datasets (there are some complex work-arounds to this however).
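To make the tuning difference concrete, here is a minimal sketch of how each optimizer is created in PyTorch. The lr value shown is an arbitrary assumption that would need tuning; the max_iter value of 20 is the PyTorch default for LBFGS:

import torch as T
# model is any T.nn.Module object
opt_sgd = T.optim.SGD(model.parameters(), lr=0.01)          # lr must be tuned
opt_lbfgs = T.optim.LBFGS(model.parameters(), max_iter=20)  # defaults often OK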

I set out to extend my knowledge of PyTorch by creating a logistic regression model and training it using L-BFGS. There are several differences between using SGD and using L-BFGS. The most important difference is that to use L-BFGS you must define a closure() function. Loosely speaking, a closure() function is a function defined inside another function that captures variables from the enclosing scope. The closure() function computes the loss and is called by L-BFGS to update the model weights and biases. It would have taken me many hours to figure this out by myself, but luckily the PyTorch documentation had an example code fragment that put me on the right path.
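As a plain-Python illustration of the closure idea, unrelated to PyTorch, here is a function defined inside another function that captures a variable from the enclosing scope:

def make_counter():
  count = 0           # captured by the inner function
  def increment():    # increment() is a closure
    nonlocal count
    count += 1
    return count
  return increment

ctr = make_counter()
print(ctr())  # 1
print(ctr())  # 2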

I wrote a demo program. Here is the key code that trains the logistic regression model:

def train(log_reg, ds, bs, mi):
  # model, dataset, batch_size (must be all), max iterations
  loss_func = T.nn.BCELoss()  # binary cross entropy
  opt = T.optim.LBFGS(log_reg.parameters(), max_iter=mi)
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=False)  # shuffle irrelevant

  print("\nStarting L-BFGS training")

  for itr in range(0, mi):
    itr_loss = 0.0            # for one iteration
    for (_, all_data) in enumerate(train_ldr):  # b_ix irrelevant
      X = all_data['predictors']  # all inputs
      Y = all_data['sex']         # all targets

      # -------------------------------------------
      def closure():
        opt.zero_grad()
        oupt = log_reg(X)
        loss_val = loss_func(oupt, Y)
        loss_val.backward()
        return loss_val
      # -------------------------------------------

      opt.step(closure)  # get loss, use to update wts
     
      with T.no_grad():  # monitor loss without tracking gradients
        oupt = log_reg(X)
        loss_val = loss_func(oupt, Y)
      itr_loss += loss_val.item()
    print("iteration = %4d   loss = %0.4f" % (itr, itr_loss))

  print("Done ")

There is a lot going on here. L-BFGS uses gradients, but in a different way than SGD does: a single call to opt.step(closure) can invoke the closure() function several times as the optimizer searches for a good update. And because a logistic regression model has no layers such as dropout or batch normalization, you don't have to deal with setting the eval() and train() modes. There are other differences too, so if you want to use L-BFGS yourself, be prepared to spend a few hours with the PyTorch documentation.
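For comparison, here is a minimal sketch of what the inner training loop would look like using SGD instead of L-BFGS, with the same model, loss function, and DataLoader as above. The learning rate value is an arbitrary assumption. Notice there is no closure, and step() takes no argument:

opt = T.optim.SGD(log_reg.parameters(), lr=0.01)  # assumed lr
for (_, all_data) in enumerate(train_ldr):
  X = all_data['predictors']
  Y = all_data['sex']
  opt.zero_grad()                # reset accumulated gradients
  oupt = log_reg(X)              # forward pass
  loss_val = loss_func(oupt, Y)
  loss_val.backward()            # compute gradients
  opt.step()                     # update weights and bias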

Naming the local function closure() isn’t very descriptive — perhaps loss_closure() would be better — but the PyTorch documentation used “closure()” so I used that name too.

My demo program creates a model that predicts the sex of a hospital patient based on their age, county of residence (one of three), blood monocyte count, and hospitalization history (minor, moderate, major). The prediction accuracy of the model trained with L-BFGS was about the same as the best results I got from a model trained using SGD, but I had to spend quite some time tuning the SGD-trained model, whereas the model trained using L-BFGS gave pretty good results immediately.
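Based on the demo code, each patient is encoded as eight predictor values: a normalized age, a one-of-three (one-hot) county, the monocyte count, and a one-of-three hospitalization history. For example, a 30-year-old patient from carson county with a monocyte count of 0.40 and a moderate hospitalization history is encoded as shown below. (The age normalization by dividing by 100 is my assumption, inferred from the demo's prediction code.)

# age 30           -> 0.30 (assumed age / 100)
# county carson    -> 0, 0, 1  (one-hot: austin, bailey, carson)
# monocyte count   -> 0.40
# history moderate -> 0, 1, 0  (one-hot: minor, moderate, major)
x = [0.30,  0,0,1,  0.40,  0,1,0]  # the 8 predictor values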

My conclusion: In scenarios where you create a logistic regression model using PyTorch, if your training data can fit into memory, using L-BFGS instead of SGD is a good approach. There are many small differences when using L-BFGS. For example, because each batch holds the entire training dataset rather than a small subset, the shuffle parameter in DataLoader is irrelevant and can be left set to False.


Left: The shuffle dance is a joyful style that was invented in Australia. It reminds me a bit of Irish clog dancing. Here are two girls who shuffle dance up a set of stairs in unison. Very cool.

Center: Shuffle Master machines dominate the automatic card shuffling market. I’ve seen the inner workings of these machines and they’re quite remarkable.

Right: When I worked on a cruise ship as an Assistant Cruise Director years ago, one of my duties was to organize and referee the daily shuffleboard tournament. It was very popular with passengers. I’m wearing the flashy red pants and concentrating — the participants took the game seriously and were often very competitive (in a good way).


Code below (very long)

# patients_sex_logreg.py
# Logistic Regression using PyTorch with L-BFGS optimization
# predict sex from age, county, monocyte, history
# PyTorch 1.8.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import numpy as np
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

# ----------------------------------------------------------

class PatientDataset(T.utils.data.Dataset):
  # sex age   county    monocyte  hospitalization history
  # sex: 0 = male, 1 = female
  # county: austin, bailey, carson
  # history: minor, moderate, major

  def __init__(self, src_file, num_rows=None):
    all_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,9), delimiter="\t", skiprows=0,
      comments="#", dtype=np.float32)  # read all 9 columns

    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)

    self.y_data = self.y_data.reshape(-1,1)  # 2D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx,:]  # idx rows, all 8 cols
    sex = self.y_data[idx,:]    # idx rows, the only col
    sample = { 'predictors' : preds, 'sex' : sex }
    return sample
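
# Usage sketch for PatientDataset, using the demo's training file:
#   ds = PatientDataset(".\\Data\\patients_train.txt")
#   print(len(ds))              # number of data rows
#   print(ds[0]['predictors'])  # first row's 8 input values
#   print(ds[0]['sex'])         # first row's target, [0.0] or [1.0]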

# ---------------------------------------------------------

def accuracy(model, ds):
  # ds is an iterable Dataset of Tensors
  n_correct = 0; n_wrong = 0

  for i in range(len(ds)):
    inpts = ds[i]['predictors']
    target = ds[i]['sex']    # float32  [0.0] or [1.0]
    with T.no_grad():
      oupt = model(inpts)

    # avoid 'target == 1.0'
    if target "lt" 0.5 and oupt "lt" 0.5:  # .item() not needed
      n_correct += 1
    elif target "gte" 0.5 and oupt "gte" 0.5:
      n_correct += 1
    else:
      n_wrong += 1

  return (n_correct * 1.0) / (n_correct + n_wrong)

# ---------------------------------------------------------

class LogisticReg(T.nn.Module):
  def __init__(self):
    super(LogisticReg, self).__init__()
    self.fc = T.nn.Linear(8, 1)

    T.nn.init.uniform_(self.fc.weight, -0.01, 0.01) 
    T.nn.init.zeros_(self.fc.bias)

  def forward(self, x):
    z = self.fc(x)
    p = T.sigmoid(z) 
    return p
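
# Design note: an alternative approach (not used in this demo) is to return
# the raw z value from forward() and train with T.nn.BCEWithLogitsLoss(),
# which applies the sigmoid internally and is more numerically stable.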

# ----------------------------------------------------------

def train(log_reg, ds, bs, mi):
  loss_func = T.nn.BCELoss()  # binary cross entropy
  opt = T.optim.LBFGS(log_reg.parameters(), max_iter=mi)
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=False)  # shuffle irrelevant

  print("\nStarting L-BFGS training")

  for itr in range(0, mi):
    itr_loss = 0.0            # for one iteration
    for (_, all_data) in enumerate(train_ldr):  # b_ix irrelevant
      X = all_data['predictors']  # all rows, 8 input cols
      Y = all_data['sex']         # all rows, 1 target col

      # -------------------------------------------
      def closure():
        opt.zero_grad()
        oupt = log_reg(X)
        loss_val = loss_func(oupt, Y)
        loss_val.backward()
        return loss_val
      # -------------------------------------------

      opt.step(closure)  # get loss, use to update wts
     
      with T.no_grad():  # monitor loss without tracking gradients
        oupt = log_reg(X)
        loss_val = loss_func(oupt, Y)
      itr_loss += loss_val.item()
    print("iteration = %4d   loss = %0.4f" % (itr, itr_loss))

  print("Done ")

# ----------------------------------------------------------

def main():
  # 0. get started
  print("\nPatient gender logisitic regression L-BFGS PyTorch ")
  print("Predict gender from age, county, monocyte, history")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("\nCreating Patient train and test Datasets ")

  train_file = ".\\Data\\patients_train.txt"
  test_file = ".\\Data\\patients_test.txt"

  train_ds = PatientDataset(train_file)  # read all rows
  test_ds = PatientDataset(test_file)

  # 2. create model
  print("Creating 8-1 logistic regression model ")
  log_reg = LogisticReg().to(device)

  # 3. train network
  print("\nPreparing L-BFGS training")
  bat_size = len(train_ds)  # use all
  max_iterations = 4
  print("Loss function: BCELoss ")
  print("Optimizer: L-BFGS ")
  train(log_reg, train_ds, bat_size, max_iterations)

# ----------------------------------------------------------

  # 4. evaluate model
  acc_train = accuracy(log_reg, train_ds)
  print("\nAccuracy on train data = %0.2f%%" % \
    (acc_train * 100))
  acc_test = accuracy(log_reg, test_ds)
  print("Accuracy on test data = %0.2f%%" % \
    (acc_test * 100))

  # 5. examine model
  wts = log_reg.fc.weight
  print("\nModel weights: ")
  print(wts.data)
  bias = log_reg.fc.bias
  print("Model bias: ")
  print(bias.data)

  # 6. save model
  # print("\nSaving trained model state_dict \n")
  # path = ".\\Models\\patients_LR_model.pth"
  # T.save(log_reg.state_dict(), path)

  # 7. make a prediction 
  print("Predicting sex for age = 30, county = carson, ")
  print("monocyte count = 0.4000, ")
  print("hospitization history = moderate ")
  inpt = np.array([[0.30, 0,0,1, 0.40, 0,1,0]],
    dtype=np.float32)
  inpt = T.tensor(inpt, dtype=T.float32).to(device)

  with T.no_grad():
    oupt = log_reg(inpt)    # a Tensor
  pred_prob = oupt.item()   # scalar, [0.0, 1.0]
  print("\nComputed output: ", end="")
  print("%0.4f" % pred_prob)

  if pred_prob "less-than" 0.5:  # replace
    print("Prediction = male")
  else:
    print("Prediction = female")

  print("\nEnd Patient gender demo")

if __name__== "__main__":
  main()