A Support Vector Machine (SVM) is a supervised machine learning algorithm used mostly for classification. SVMs were all the rage a few years ago, but they’ve fallen out of favor a bit recently (deep neural networks are the current craze).
With most ML algorithms, I like to code my own implementation so I have total control. A custom implementation is also typically at least an order of magnitude smaller than a library or tool.
However, SVMs are brutally difficult to code from scratch. The basic ideas are (relatively) simple, but there are a huge number of implementation details.
I thought I’d take a look at the svm() function in the R language. It’s really, really good.
First I created a text file with the classic Iris data set. Then I installed the weirdly named “e1071” R package that contains the svm() function. I loaded my data into a data frame:
mydf = read.table("IrisData.txt", header = TRUE, sep = ",")
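As an aside, R ships with a built-in copy of the Iris data, so a text file isn’t strictly necessary. Here’s a minimal alternative sketch (the lowercase column names are my assumption, chosen to match the mydf$species usage below; they may not match the text file exactly):

```r
# Alternative to the text file: copy R's built-in iris data frame
# and rename its columns (Sepal.Length, ..., Species) to lowercase.
# The new names here are assumed, not from the original post.
mydf = iris
colnames(mydf) = c("sepal.len", "sepal.wid", "petal.len", "petal.wid", "species")
head(mydf)  # peek at the first six rows
```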
Then I created the SVM model using the default radial kernel with its default parameter values:
library(e1071)  # provides svm()
x = subset(mydf, select = -species)  # predictor variables
y = mydf$species  # categorical values to predict
mymodel = svm(x, y)  # use all defaults
mymodel
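For reference, if I’m reading the e1071 documentation correctly, the all-defaults call above is roughly equivalent to spelling out the main radial-kernel hyperparameters explicitly (treat the exact values as an assumption and check ?svm for your installed version):

```r
# Roughly what svm(x, y) does by default (per my reading of ?svm):
# radial basis kernel, cost = 1, gamma = 1 / (number of predictors),
# and predictors scaled to zero mean and unit variance.
mymodel = svm(x, y, kernel = "radial", cost = 1,
              gamma = 1 / ncol(x), scale = TRUE)
```

Making the defaults explicit like this is handy later, when you want to tune cost and gamma.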
Here x holds the predictor values (“all but species”) and y holds the categorical values to predict (Setosa, Versicolor, Virginica). Then I evaluated the predictive accuracy of the model:
mypred = predict(mymodel, x)  # generate predicted species
table(mypred, y)  # show predictions as a confusion matrix
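The table() call gives a confusion matrix; if you want a single overall accuracy number, it’s a one-liner on top of that. A small sketch, continuing with the mypred and y variables from above:

```r
# Overall accuracy on the training data: the fraction of
# predicted species that match the true species.
acc = mean(mypred == y)
cat("training accuracy =", acc, "\n")
```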
Very slick. The R language is often like this: some very difficult tasks are easy to perform (but some easy tasks are very tricky to perform).