## The Logit (Log-Odds) Function in Machine Learning

This week I was working with the logit function, also known as the log-odds function. There are plenty of deep math explanations of the logit function, but I think most descriptions miss the main point.

The probability of an event, p, is a number between 0 and 1 that measures how likely the event is. The bottom line is that the result of the logit function is (almost always) a number between about -4 and +4 that also measures how likely an event is. I say “almost” because in theory a logit value can be anything from -infinity to +infinity, but in most situations the result lies between about -4 and +4 (probabilities of roughly 0.018 to 0.982), and in the majority of those situations it lies between -2 and +2 (probabilities of roughly 0.12 to 0.88).
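To see which probabilities those logit bands cover, the sketch below converts a few typical logit values back to probabilities using the logistic function, which is the inverse of logit. The helper name `prob_from_logit` is just an illustrative name, not a library function.

```python
import math

def prob_from_logit(z: float) -> float:
    """Convert a logit (log-odds) value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Print the probability associated with a few typical logit values.
for z in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(f"logit {z:+.1f} -> probability {prob_from_logit(z):.3f}")
```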

In other words, probability and logit values describe how likely an event is.

The definition of the logit function is

logit(p) = log(p / (1-p))

where log is the natural logarithm (base e) and p is strictly between 0 and 1.
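Here is a minimal Python version of that definition, assuming the natural log (which is what makes logit the inverse of the logistic function). Rejecting p = 0 and p = 1 is a design choice for this sketch, since the true values there are -infinity and +infinity.

```python
import math

def logit(p: float) -> float:
    """Return the log-odds of probability p, using the natural log."""
    if not 0.0 < p < 1.0:
        # logit(0) and logit(1) are -infinity and +infinity, so reject them.
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

print(logit(0.5))   # 0.0
print(logit(0.75))  # about 1.0986
print(logit(0.25))  # about -1.0986
```

Notice that probabilities equally far from 0.5 give logits of the same size with opposite signs.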

Notice that the only input to the logit function is a probability, so a logit value cannot carry more information than the probability it came from. The p / (1-p) term is the odds of the event. For example, if the probability of some event is 0.75, then the odds of the event are 0.75 / (1 - 0.75) = 3 / 1, or “three to one” odds. So logit is just the log of a probability expressed as odds, hence the name log-odds, which was shortened to “logit”.
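The two-step path from probability to odds to log-odds can be made concrete with a small table. The `odds` helper below is a hypothetical name used only for this illustration.

```python
import math

def odds(p: float) -> float:
    """Odds in favor of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)

# Probability -> odds -> log-odds for a few probabilities.
for p in [0.10, 0.25, 0.50, 0.75, 0.90]:
    o = odds(p)
    print(f"p = {p:.2f}   odds = {o:6.3f}   log-odds = {math.log(o):+.3f}")
```

Odds run from 0 to infinity with 1 (even odds) in the middle, while taking the log makes the scale symmetric around 0: probabilities 0.25 and 0.75 give log-odds of about -1.099 and +1.099.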

Here’s what the logit function looks like (the tails go off to -infinity as p approaches 0 and to +infinity as p approaches 1):

So, why use the logit function at all? There are two reasons why it might be used. First, a negative logit value corresponds to a probability less than 0.5, a positive logit value to a probability greater than 0.5, and a logit of 0 to exactly 0.5, so for some problems logit values are easy to interpret by eye. Second, because of the properties of the log function, two logit values can sometimes be easier to compare than the two associated probabilities: the difference between two logits is the log of the ratio of their odds, and the logit spreads out probabilities that are bunched up near 0 or 1. To be honest, I don’t really buy either reason; I prefer to use probabilities.
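As a rough illustration of the comparison argument, assuming “easier to compare” refers to how logit stretches the regions near 0 and 1, the sketch below takes two pairs of probabilities that are each 0.09 apart and compares their logit differences. The helper names are illustrative only.

```python
import math

def logit(p: float) -> float:
    """Log-odds of probability p (natural log), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def odds(p: float) -> float:
    """Odds in favor of an event with probability p."""
    return p / (1.0 - p)

# Two pairs of probabilities that each differ by 0.09.
pairs = [(0.50, 0.59), (0.90, 0.99)]
for a, b in pairs:
    diff = logit(b) - logit(a)
    # The logit difference equals the log of the odds ratio.
    log_ratio = math.log(odds(b) / odds(a))
    print(f"{a:.2f} vs {b:.2f}: logit diff = {diff:.3f}, log(odds ratio) = {log_ratio:.3f}")
```

Both pairs are 0.09 apart as probabilities, but the logit gap is about 0.36 for the first pair and about 2.40 for the second (the odds go from 9 to 99, a factor of 11). Whether that makes comparisons easier is the judgment call discussed above.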

Final notes: the logit function is the math inverse of the logistic sigmoid function:

logistic(z) = 1.0 / (1.0 + e^(-z))
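A quick way to check the inverse relationship is to round-trip values through both functions. This is a minimal sketch using plain `math` calls; it is not meant for extreme inputs.

```python
import math

def logit(p: float) -> float:
    """Log-odds of probability p (natural log), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(z: float) -> float:
    """Logistic sigmoid. math.exp(-z) raises OverflowError when -z is
    larger than about 709, so this simple version suits ordinary z."""
    return 1.0 / (1.0 + math.exp(-z))

# Probability -> logit -> logistic should give back the original probability.
for p in [0.1, 0.5, 0.75, 0.9]:
    z = logit(p)
    print(f"p = {p:.2f}  logit = {z:+.4f}  logistic(logit) = {logistic(z):.4f}")
```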

The logistic sigmoid function has many uses in machine learning, for example in logistic regression. It is also closely related to tanh, the hyperbolic tangent function, another common ML function, especially in neural networks. The relationship between logistic and tanh is:

tanh(z) = 2 * logistic(2z) - 1

logistic(z) = (tanh(z/2) + 1) / 2
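Both identities are easy to spot-check numerically against Python's built-in `math.tanh`. The sketch below evaluates each side at a handful of arbitrary z values.

```python
import math

def logistic(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

for z in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    tanh_from_logistic = 2.0 * logistic(2.0 * z) - 1.0   # tanh(z) = 2 * logistic(2z) - 1
    logistic_from_tanh = (math.tanh(z / 2.0) + 1.0) / 2.0  # logistic(z) = (tanh(z/2) + 1) / 2
    print(f"z = {z:+.1f}: tanh {math.tanh(z):+.6f} vs {tanh_from_logistic:+.6f}, "
          f"logistic {logistic(z):.6f} vs {logistic_from_tanh:.6f}")
```

In other words, tanh is just the logistic sigmoid stretched horizontally and rescaled from the range (0, 1) to the range (-1, 1).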

In short, the logit, logistic sigmoid, and tanh functions are all related to each other and are conceptually based on probability.