Several machine learning algorithms use something called a kernel function. The ideas behind kernel functions are extremely deep, so I think the best way to understand them is to start with concrete examples.

There are several commonly used kernel functions. The most common (at least in my experience) is the radial basis function (RBF) kernel. Expressed as an equation, the RBF kernel is:

$$K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\sigma^2}\right)$$

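Here K stands for the RBF kernel. The x and x’ are vectors, where x’ is a “point of reference” vector. The exp() function means “the math constant e raised to a power.” The ||x - x’||^2 term is the squared Euclidean distance between x and x’. The sigma is a free parameter that controls how quickly the kernel value falls off as that distance grows.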
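To make the equation concrete, here is a minimal Python sketch of it. It assumes the vectors are plain sequences of floats of equal length; the function name `rbf_kernel` and its parameter names are my own choices for illustration, not taken from any particular library.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], x_ref: Sequence[float], sigma: float) -> float:
    """Radial basis function kernel: exp(-||x - x_ref||^2 / (2 * sigma^2))."""
    if len(x) != len(x_ref):
        raise ValueError("x and x_ref must have the same length")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_ref))
    # Divide by 2 * sigma^2, negate, and exponentiate.
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(round(rbf_kernel([2.50, -3.25, 1.00], [2.0, -3.0, 1.0], 1.5), 4))  # 0.9329
```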

OK, but what does all that mean?

Suppose x’ = (2.0, -3.0, 1.0) and x = (2.50, -3.25, 1.00). First, the squared distance ||x - x’||^2 is calculated:

```
= (2.50 - 2.0)^2 + (-3.25 - (-3.0))^2 + (1.00 - 1.0)^2
= (0.50)^2 + (-0.25)^2 + (0.00)^2
= 0.2500 + 0.0625 + 0.0000
= 0.3125
```
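The same first step in Python, as a small sketch. The helper name `squared_distance` is hypothetical; it pairs up components with `zip`, so it assumes both vectors have the same length.

```python
def squared_distance(x, x_ref):
    # Sum of squared component-wise differences, i.e. ||x - x_ref||^2.
    return sum((a - b) ** 2 for a, b in zip(x, x_ref))


x_ref = (2.0, -3.0, 1.0)
x = (2.50, -3.25, 1.00)
print(squared_distance(x, x_ref))  # 0.3125
```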

Next, divide that result by 2 * sigma squared. Suppose sigma = 1.5.

```
= 0.3125 / (2 * (1.5)^2)
= 0.3125 / 4.50
= 0.0694
```
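The second step can be sketched the same way. The helper name `scaled_distance` is hypothetical. The loop at the end is an extra illustration of the role of sigma: a larger sigma gives a smaller scaled distance for the same pair of vectors.

```python
def scaled_distance(sq_dist, sigma):
    # Divide the squared distance by 2 * sigma^2.
    return sq_dist / (2.0 * sigma ** 2)


sq_dist = 0.3125  # ||x - x'||^2 from the previous step
print(round(scaled_distance(sq_dist, 1.5), 4))  # 0.0694

# A larger sigma shrinks the scaled distance, so the kernel value stays closer to 1.0.
for sigma in (0.5, 1.5, 3.0):
    print(sigma, round(scaled_distance(sq_dist, sigma), 4))
```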

Last, take the negative of that result and apply exp():

```
= exp(-0.0694)
= e^(-0.0694)  (note: e is approx. 2.71828183)
= 0.9329
```
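The last step in Python, as a sketch using only the standard `math` module. Note that the code keeps the unrounded intermediate value (about 0.069444), which is what yields 0.9329; plugging in the rounded 0.0694 instead gives about 0.93295, a difference only in the fifth decimal place.

```python
import math

# Unrounded result of the previous step: 0.3125 / 4.5, about 0.069444.
scaled = 0.3125 / (2.0 * 1.5 ** 2)

# Negate and raise e to that power.
k = math.exp(-scaled)
print(round(k, 4))  # 0.9329
```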

The RBF kernel function always returns a value greater than 0.0 and at most 1.0. If x and x’ are the same, RBF gives 1.0. The farther apart x and x’ are, the smaller the RBF value, approaching but never quite reaching 0.0. Therefore, the RBF kernel is a measure of similarity.
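One way to see the similarity behavior is to move x steadily away from x’ and watch the kernel value shrink. This is a sketch that reuses a compact, hypothetical `rbf_kernel` helper with sigma = 1.5 and shifts only the first coordinate; the offsets are arbitrary choices for illustration.

```python
import math


def rbf_kernel(x, x_ref, sigma):
    # exp(-||x - x_ref||^2 / (2 * sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_ref))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x_ref = (2.0, -3.0, 1.0)
# Move x progressively farther from x_ref along the first coordinate.
for offset in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
    x = (x_ref[0] + offset, x_ref[1], x_ref[2])
    print(f"offset={offset:4.1f}  K={rbf_kernel(x, x_ref, sigma=1.5):.6g}")
```

At offset 0.0 the kernel is exactly 1.0, and each larger offset gives a smaller but still positive value. (In floating-point arithmetic, a very large distance can eventually underflow to 0.0, even though the mathematical value never reaches it.)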