The Taylor Series and Machine Learning

Somewhat by happenstance, I have a fairly strong background in both mathematics and machine learning. Many of my colleagues who are software developers are scrambling to get up to speed with ML. They have a fair amount of math background, mostly from the undergraduate math classes they took in college, but they typically don’t have a solid grasp of exactly which math topics are needed for ramping up with ML. The Taylor Series approximation is, without question, a key math topic for ML programming.

The Taylor Series can be used to approximate a function, f(x). The Taylor approximation has a beautifully symmetric definition that uses the first, second, third, and so on, derivatives, and also the factorial function. The equation is:
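In standard notation, the Taylor approximation of order n looks like the following. The symbol c, the point the series is expanded around, is an assumed naming choice here; it matches the c used in the worked example later in this post.

```latex
f(x) \approx f(c) + \frac{f'(c)}{1!}(x - c) + \frac{f''(c)}{2!}(x - c)^2 + \frac{f'''(c)}{3!}(x - c)^3 + \cdots + \frac{f^{(n)}(c)}{n!}(x - c)^n
```

Each term pairs the k-th derivative at c with k! and the k-th power of (x - c), which is where the symmetry comes from.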

In some ML scenarios, working directly with some function f(x) is very difficult, but working with the Taylor approximation to f(x), let’s call the approximation P(x), is easier.

Every now and then, I hear my software developer colleagues talking about listing the key math topics for ML. They usually think in high-level terms, like “linear algebra”, classifying topics according to typical college classes. The problem with this approach is that a typical college class has too much information. For example, the Taylor Series approximation is clearly a key math topic for ML, but a software developer needs to know when and how to use it, not necessarily how it is derived (even though the derivation is, in my opinion, one of the prettiest in mathematics).

Don’t get me wrong – the more you know about anything the better. But for a software developer who is learning the math of ML, the derivation of the Taylor Series expansion should come later in their path to full knowledge – it’s just extra noise for beginners.

Here’s an example of approximating the function 1.0 over the square root of x plus 1.0, f(x) = (x+1)^(-1/2), at x = 3.1 using a second-order Taylor series. You let c = 3 (it’s near x). The value of f(3.1) computed directly is 0.4939. The value of f(3.1) computed using a Taylor Series with three terms is also 0.4939 to four decimal places (0.493867, versus 0.493865 for the direct calculation).
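The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not code from the original post: the function names f and taylor2 are made up for illustration, and the two derivatives are worked out by hand from f(x) = (x+1)^(-1/2).

```python
import math

def f(x):
    # The function being approximated: 1 / sqrt(x + 1)
    return (x + 1.0) ** -0.5

def taylor2(x, c):
    # Second-order (three-term) Taylor approximation of f around c.
    # The derivatives are hand-derived for this particular f:
    #   f'(x)  = -1/2 * (x + 1)^(-3/2)
    #   f''(x) =  3/4 * (x + 1)^(-5/2)
    f0 = (c + 1.0) ** -0.5
    f1 = -0.5 * (c + 1.0) ** -1.5
    f2 = 0.75 * (c + 1.0) ** -2.5
    dx = x - c
    return (f0
            + f1 * dx / math.factorial(1)
            + f2 * dx ** 2 / math.factorial(2))

x, c = 3.1, 3.0
direct = f(x)
approx = taylor2(x, c)
print(f"direct: {direct:.6f}")   # direct: 0.493865
print(f"taylor: {approx:.6f}")   # taylor: 0.493867
```

With c = 3 the three terms are 0.5, -0.00625, and about 0.000117, so the approximation differs from the direct value by roughly 0.000002.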

By the way, the Maclaurin Series is a special, simplified case of the Taylor Series, where the value of c is always zero. The Maclaurin and Taylor series expansions pop up in several areas of ML, notably numerical optimization for ML training algorithms.
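To make the special case concrete, setting c = 0 gives the Maclaurin form. The second line, the familiar expansion of e^x, is included only as an illustration and is not an example from the original post.

```latex
f(x) \approx f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots

e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots
```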

Series of steps from the fantastic 2006 movie “The Fall”

1 Response to The Taylor Series and Machine Learning

Venkatesh Gopalarathnam says:

August 30, 2017 at 9:04 am

“The problem with this approach is that a typical college class has too much information.” Rightly said. I wish we were taught, or allowed to learn, only topics that we would use in our livelihood. Thanks for this post.