I consider Logistic Regression (LR) to be the Hello World of machine learning (ML). In LR, the goal is to predict a value that can be only one of two things. For example, you might want to predict which of two sports teams will win an upcoming contest (Team A wins = 0, Team B wins = 1), based on predictor variables such as current winning percentage, average point differential, and so on.
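To make the idea concrete, here's a minimal sketch of the LR prediction computation using raw Python and NumPy. The predictor values, weights, and bias are made-up numbers for illustration only:

```python
import numpy as np

def predict(x, w, b):
    # logistic regression: p = sigmoid(w . x + b)
    z = np.dot(x, w) + b
    return 1.0 / (1.0 + np.exp(-z))  # probability that class = 1 (Team B wins)

# hypothetical predictors: winning percentage, average point differential
x = np.array([0.60, 3.5])
w = np.array([1.2, 0.1])  # assumed already-trained weights
b = -0.5                  # assumed bias
p = predict(x, w, b)      # if p > 0.5, predict class 1 (Team B wins)
```

If the computed probability is greater than 0.5, the prediction is class 1; otherwise, class 0.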
I’ve studied LR for many years, but every time I code up an LR implementation, I discover something new. Just for hoots, I coded up an LR demo using nothing but raw Python and the NumPy library.
I got some good insights on the LR learning rule (gradient ascent derived from maximizing the log-likelihood). Also, during training, you always want to monitor error so you can detect if learning has run amok (typically due to a poor choice of learning rate) or has stalled out. I noticed that when monitoring training error, the choice between mean binary cross entropy error and mean squared error doesn't matter in practical terms, even though binary cross entropy error is a bit more principled.
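The ideas above can be sketched in a compact training loop. This is not my polished demo code, just an illustration with tiny synthetic data (the data generation, learning rate, and epoch count are all assumptions for the sketch). The gradient ascent update moves the weights in the direction of (y - p), and both error metrics are computed side by side:

```python
import numpy as np

rng = np.random.default_rng(0)

# tiny synthetic dataset: 8 items, 2 predictors (for illustration only)
X = rng.normal(size=(8, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(np.float64)  # class depends on sum of predictors

w = np.zeros(2)
b = 0.0
lr = 0.10  # learning rate (a poor choice here can make learning run amok)

for epoch in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    # gradient ascent on the log-likelihood: step in the direction of (y - p)
    w += lr * (X.T @ (y - p)) / len(y)
    b += lr * np.mean(y - p)
    if epoch % 20 == 0:
        eps = 1.0e-12  # avoid log(0)
        bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        mse = np.mean((y - p) ** 2)
        print(f"epoch {epoch:3d}  BCE = {bce:.4f}  MSE = {mse:.4f}")
```

In runs like this, the two monitored error values decrease together, which is the practical sense in which the choice between them doesn't much matter.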
Well, my little investigation was more interesting and informative than I thought it'd be. I think I'll tighten up my demo code a bit and add an explanation for a Visual Studio Magazine article in a month or two, when I have some time.