Logistic Regression, Iteratively Reweighted Least Squares, and Newton-Raphson

Recently I was working on some logistic regression code, and during my research I was a bit confused by the terminology. Logistic regression is a technique that generates a magic equation which can be used to predict an outcome that is either 0 or 1. For example, the Wikipedia entry on logistic regression has a medical example where independent variables x1 = age, x2 = sex (male/female), and x3 = cholesterol level are used to predict the dependent variable death (0 or 1). Logistic regression assumes the data follows an equation death = 1 / (1 + exp(-z)), where z = b0 + b1*x1 + b2*x2 + b3*x3.

If you have a set of training data, the problem boils down to finding the values of b0, b1, b2, b3 that best fit your data, meaning the values that produce the least error. There are many ways to go about finding the bi values. One of the most common is the method of iteratively reweighted least squares (IRLS). IRLS defines error as a weighted sum of the squared differences between the actual dependent variable values (in the training data) and the predicted values (from the equation), where the weights are recomputed on each iteration, hence "reweighted."

There are several specific numerical algorithms that can be used to solve an IRLS problem. One of the most common is the Newton-Raphson method, or just Newton's method. Newton-Raphson involves finding the calculus derivative of a function. When used with multiple parameters, as in logistic regression, this involves computing a matrix of partial derivatives and finding its inverse. So, to summarize, iteratively reweighted least squares is sort of a conceptual approach for finding the best parameters for logistic regression, and Newton-Raphson is a specific numeric algorithm that can be used to carry out IRLS.
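To make the relationship concrete, here is a minimal sketch of IRLS via Newton-Raphson in Python with NumPy. The function name `irls_logistic`, the iteration count, and the tiny example data are all my own illustrative choices, not from any particular library; the update rule is the standard Newton step b = b + (X'WX)^-1 X'(y - p), where W holds the per-row weights p*(1-p) that get recomputed each pass:

```python
import numpy as np

def sigmoid(z):
    # The logistic function: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def irls_logistic(X, y, iterations=10):
    # X: (n, p) matrix of independent variables; y: (n,) vector of 0/1 outcomes.
    # Prepend a column of ones so b[0] plays the role of the intercept b0.
    Xb = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        p = sigmoid(Xb @ b)            # current predicted probabilities
        w = p * (1.0 - p)              # weights, recomputed each iteration
        H = Xb.T @ (w[:, None] * Xb)   # X'WX, the matrix Newton-Raphson inverts
        g = Xb.T @ (y - p)             # gradient: X'(y - p)
        b = b + np.linalg.solve(H, g)  # Newton-Raphson step
    return b

# Hypothetical toy data: one independent variable, overlapping classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 0, 1])
b = irls_logistic(X, y)
probs = sigmoid(np.column_stack([np.ones(len(X)), X]) @ b)
```

Note that `np.linalg.solve(H, g)` is used instead of explicitly forming the inverse of X'WX; solving the linear system directly is the usual, numerically safer way to apply the inverse-matrix step the text describes.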
