I was hosting a technical talk recently and had a few minutes between sessions, so I challenged myself to give a blitz talk explaining k-NN (k-nearest neighbors) classification in under three minutes.
I started by saying that k-NN is one of many classification algorithms, and arguably the simplest, but one that a surprising number of people don’t fully understand.
Next, I pulled up a graph to explain how the algorithm works. The graph had 33 data points, each one of three colors (red, yellow, green) representing the three classes to predict, based on two predictor variables (the x0 and x1 coordinates in the graph). The graph also had a single blue dot at (x0 = 5.25, x1 = 1.75) representing an unknown item to classify.
In k-NN you pick k, say k = 4. Then you find the 4 data points nearest to the unknown point, and you use some sort of voting mechanism (usually majority rule) to predict the class. In the diagram, the blue dot's four nearest neighbors were one red dot, two yellow dots, and one green dot, so the prediction is class “yellow”.
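The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind my graph: the function name knn_predict, the use of Euclidean distance, and the seven training points are all assumptions I made up for the example. Only the unknown point (5.25, 1.75) and k = 4 come from the talk.

```python
import math
from collections import Counter

def knn_predict(train, unknown, k):
    """Predict a class label for `unknown` from its k nearest training points.

    train: list of ((x0, x1), label) pairs
    unknown: (x0, x1) tuple
    """
    # Sort training items by Euclidean distance to the unknown point
    # (Euclidean distance is an assumption; other metrics are possible).
    by_distance = sorted(train, key=lambda item: math.dist(item[0], unknown))
    # Count the class labels of the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    # Return the label with the most votes (ties are not handled specially here).
    return votes.most_common(1)[0][0]

# Made-up training data, NOT the 33 points from the graph.
train = [
    ((5.0, 2.0), "yellow"),
    ((5.5, 1.5), "yellow"),
    ((5.0, 1.0), "red"),
    ((6.0, 2.0), "green"),
    ((1.0, 1.0), "red"),
    ((2.0, 6.0), "green"),
    ((8.0, 7.0), "yellow"),
]

unknown = (5.25, 1.75)
prediction = knn_predict(train, unknown, k=4)
print(prediction)
```

With this made-up data the four nearest neighbors are two yellow points, one red point, and one green point, which mirrors the vote in the diagram.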
I finished my mini-micro-talk by pointing out a few pros and cons of k-NN classification. Pros: it's very simple, it can easily deal with any number of possible classes, it can handle very bizarre data patterns, there's only one parameter to tune (the value for k), and results are somewhat interpretable. Cons: it works well only when all predictor variables are numeric (because you must compute distance), ties can easily occur, and it doesn't scale well to huge training datasets.
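To make the tie problem concrete: with k = 4, two red neighbors and two yellow neighbors give a 2-2 vote. The sketch below shows one possible way to break such a tie, by preferring the tied class whose neighbors have the smallest total distance. The function name vote_with_tiebreak, the tie-breaking rule, and the distances are all assumptions for illustration; I didn't cover any particular rule in the talk.

```python
from collections import Counter, defaultdict

def vote_with_tiebreak(neighbors):
    """neighbors: list of (distance, label) pairs for the k nearest points."""
    counts = Counter(label for _, label in neighbors)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        # Clear winner, no tie to break.
        return tied[0]
    # Tie: sum the distances per class and pick the tied class
    # whose neighbors are closest overall (an assumed rule).
    total = defaultdict(float)
    for dist, label in neighbors:
        total[label] += dist
    return min(tied, key=lambda label: total[label])

# Made-up distances: a 2-2 tie between red and yellow.
neighbors = [(0.4, "red"), (0.5, "yellow"), (0.9, "yellow"), (1.2, "red")]
winner = vote_with_tiebreak(neighbors)
print(winner)
```

Here the yellow neighbors have a total distance of 1.4 versus 1.6 for red, so the tie goes to yellow. Another common workaround is simply to try a different value of k.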
I asked someone to time me, and I finished the talk in 2 minutes and 37 seconds. It was a fun challenge for me.