When I encounter a technology or concept that’s new to me, the best and worst sides of my character emerge. The good side is that I’m very persistent and will investigate the new topic until I really understand it. But this is my bad side too, because I typically get obsessed and just can’t leave the new topic alone, even when the topic isn’t super important. Self-organizing maps are a good example.
There’s a close relationship between obsession and passion.
So, this morning I set out to do an end-to-end creation of a self-organizing map (SOM), from scratch, using Python.
Conceptually, SOMs aren’t difficult to grasp, but as always, when implementing, all kinds of details pop up. Well, after a bit of work, I’m satisfied I really, really understand SOMs.
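To give a feel for the kind of details involved, here is a minimal sketch of a SOM training loop using NumPy. It is one common formulation and not necessarily the code behind this post. The function names, the linear decay of the learning rate and neighborhood range, the Manhattan-distance neighborhood, and every default parameter value are assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows, cols, steps_max=5000, lr_max=0.5, seed=0):
    """Train a rows x cols SOM on data (n_items x dim).

    Returns the node weights as an array of shape (rows, cols, dim).
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    som = rng.random((rows, cols, dim))   # random initial weights in [0, 1)
    range_max = rows + cols               # starting neighborhood range (assumed)
    rr, cc = np.indices((rows, cols))     # grid coordinates of every node

    for s in range(steps_max):
        pct_left = 1.0 - s / steps_max
        curr_range = int(pct_left * range_max)  # neighborhood shrinks over time
        curr_lr = pct_left * lr_max             # learning rate decays linearly

        x = data[rng.integers(n)]               # pick one random data item

        # best matching unit (BMU): the node whose weights are closest to x
        dists = np.linalg.norm(som - x, axis=2)
        bmu_r, bmu_c = np.unravel_index(np.argmin(dists), dists.shape)

        # move the BMU and all nodes within the current Manhattan range toward x
        near = (np.abs(rr - bmu_r) + np.abs(cc - bmu_c)) <= curr_range
        som[near] += curr_lr * (x - som[near])

    return som

def quantization_error(som, data):
    """Mean distance from each data item to its closest map node."""
    flat = som.reshape(-1, som.shape[2])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

For the digits data, `data` would be the 1,797 × 64 matrix of pixel values, scaled to [0, 1] so that it matches the range of the random initial weights. The map size is a free choice that has to be tuned by eye.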
I used the UCI Digits Dataset, which has 1,797 crude 8×8 images of handwritten digits, ‘0’ through ‘9’. After creating the SOM for the data, I generated a U-Matrix. There’s a ton of not-entirely-correct information about U-Matrices on the Internet. The idea is that each cell shows how different a map node is from its neighboring nodes, so black areas represent similar data items and white areas indicate borders between groups. But interpreting a U-Matrix is very subjective.
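A U-Matrix can be computed in several slightly different ways, which is part of why the online descriptions disagree. The sketch below shows one simple variant: each cell holds the average Euclidean distance between a node and its up, down, left, and right neighbors. The function name `u_matrix` and the choice of four neighbors (rather than eight) are assumptions for illustration. The input is any array of node weights with shape (rows, cols, dim).

```python
import numpy as np

def u_matrix(som):
    """Return a (rows, cols) array of mean distances to adjacent nodes.

    som has shape (rows, cols, dim). Small values mean a node is similar
    to its neighbors; large values suggest a border between groups.
    """
    rows, cols, _ = som.shape
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            # up, down, left, right neighbors (assumed; some versions use 8)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(som[r, c] - som[nr, nc]))
            # a 1x1 map has no neighbors, so leave its single cell at 0
            um[r, c] = np.mean(dists) if dists else 0.0
    return um
```

Displaying the result with a grayscale colormap, for example Matplotlib’s `imshow(um, cmap="gray")`, gives the convention described above: small distances come out dark and large distances come out light.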
Because my source data has labels, it was possible to generate a second visualization. This second graph shows relationships between the different digit classes. For example, the 1s (in orange) are similar to the 4s (in dark green) because those two colors are close together on the map. This makes sense because 1s and 4s have a similar vertical stroke.
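One plausible way to build a label-based view like this is to let each data item vote for its closest map node and then color each node by its most common label. The sketch below does that. It is an assumption about how such a graph can be produced, not a description of the exact method used here, and the name `label_map` and the use of -1 for nodes that attract no data items are both illustrative choices.

```python
import numpy as np

def label_map(som, data, labels):
    """Assign each map node the most common label among items closest to it.

    som: (rows, cols, dim) node weights; data: (n, dim); labels: n ints >= 0.
    Returns a (rows, cols) int array; nodes with no items get -1.
    """
    labels = np.asarray(labels, dtype=int)
    rows, cols, _ = som.shape
    votes = np.zeros((rows, cols, labels.max() + 1), dtype=int)

    for x, y in zip(data, labels):
        # find the closest node for this item and record one vote for its label
        d = np.linalg.norm(som - x, axis=2)
        r, c = np.unravel_index(np.argmin(d), d.shape)
        votes[r, c, y] += 1

    result = votes.argmax(axis=2)           # winning label per node
    result[votes.sum(axis=2) == 0] = -1     # mark nodes that got no votes
    return result
```

Mapping each label to a fixed color and drawing the resulting grid shows which digit classes end up next to each other on the map.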
When I get some free time, I’ll clean up the Python code and publish it, either here on my blog site or in Visual Studio Magazine online.