In the late 1990s and early 2000s, the support vector machine (SVM) technique was popular for machine learning classification problems. But now, among my colleagues at least, SVMs are never used except for legacy systems.
Briefly, the five main reasons why I don’t use SVMs are:
1.) The SVM technique is extremely complex, so much so that almost all implementations (including scikit) are just wrappers over an old C++ implementation called libsvm.
2.) The SVM technique natively handles only linearly separable data for binary classification. Therefore, in practice, you must use the hacks of the kernel trick (for realistic data) and one-versus-rest (OVR, for multi-class problems); see the sketch after this list.
3.) The SVM technique has many parameters that are extremely unintuitive and difficult to tune.
4.) The SVM technique doesn’t really lead to any generalizable skills, unlike neural techniques that generalize to many modern forms of machine learning (autoencoders, Transformers, etc.)
5.) For me, most importantly, the SVM technique just doesn’t feel right.
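To make reason #2 concrete, here's a minimal sketch on made-up two-dimensional data (not the demo dataset below). A plain linear SVM can't deal with a circular class boundary, so you bring in the RBF kernel trick, and for three or more classes you bolt a wrapper such as scikit's OneVsRestClassifier on top:

# sketch only: synthetic data, not the people dataset
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                  # 200 items, 2 features
y = (np.sum(X * X, axis=1) > 1.0).astype(int)  # class = outside the unit circle

acc_linear = LinearSVC(max_iter=10000).fit(X, y).score(X, y)  # linear boundary
acc_rbf = SVC(kernel='rbf').fit(X, y).score(X, y)             # kernel trick
print("linear = %0.4f  rbf = %0.4f" % (acc_linear, acc_rbf))

# for 3+ classes, an explicit one-versus-rest wrapper looks like:
multi_model = OneVsRestClassifier(SVC(kernel='rbf'))

On data like this, the RBF version will typically score far higher than the linear version.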
And while I’m at it and being cranky, why the term “machine” in SVM? A machine is a mechanical or electrical physical device. Wouldn’t “support vector classifier” make more sense? Or maybe “support vector machine learning”? It’s possible the term “support vector machine” was influenced by the seminal 1958 research paper “A Learning Machine: Part I” by R. Friedberg.
I put together a scikit SVM demo. I used one of my standard datasets for a multi-class example. The data looks like:
 1   0.24   1 0 0   0.2950   2
-1   0.39   0 0 1   0.5120   1
 1   0.63   0 1 0   0.7580   0
-1   0.36   1 0 0   0.4450   1
. . .
Each line of data represents a person. The fields are sex (male = -1, female = 1), age (normalized by dividing by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), annual income (divided by 100,000), and politics type (0 = conservative, 1 = moderate, 2 = liberal). The goal is to predict politics type from sex, age, state, income. There are 200 training items and 40 test items.
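To make the encoding concrete, here's a tiny sketch (using the first training item shown above) that decodes one line back into readable form:

# sketch: decode the first training item shown above
import numpy as np
item = np.array([1, 0.24, 1, 0, 0, 0.2950, 2], dtype=np.float32)
sex = "male" if item[0] == -1 else "female"
age = int(item[1] * 100)                         # un-normalize age
state = ["Michigan", "Nebraska", "Oklahoma"][int(np.argmax(item[2:5]))]
income = int(item[5] * 100_000 + 0.5)            # un-normalize income
politics = ["conservative", "moderate", "liberal"][int(item[6])]
print(sex, age, state, income, politics)  # female 24 Michigan 29500 liberal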
The SVC constructor signature is crazy:
# SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale',
#   coef0=0.0, shrinking=True, probability=False, tol=0.001,
#   cache_size=200, class_weight=None, verbose=False,
#   max_iter=-1, decision_function_shape='ovr',
#   break_ties=False, random_state=None)
For my demo, I used these parameters:
params = { 'C' : 1.0, 'kernel' : 'rbf',
  'gamma' : 1.0/6, 'shrinking' : True,
  'probability' : True, 'tol' : 1.0e-3,
  'cache_size' : 200.0, 'max_iter' : -1,
  'decision_function_shape' : 'ovr' }
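A side note on the gamma value: 1/6 is 1 over the number of predictor features, which (if I recall the scikit documentation correctly) is what gamma='auto' computes, while the default gamma='scale' also divides by the variance of the training data:

# gamma notes (assuming the six predictor columns of the demo data)
n_features = 6
gamma_auto = 1.0 / n_features                          # same as the 1.0/6 above
# gamma_scale = 1.0 / (n_features * train_x.var())     # needs train_x from the demo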
The results were terrible, as is often the case with a first stab using an SVM. I could have spent several hours fiddling with the parameters, but I wasn’t in the mood.
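If I had been in the mood, the usual way to fiddle with the parameters is a grid search over C and gamma rather than doing it by hand. A minimal sketch, where the grid values are just guesses on my part (not recommendations) and train_x, train_y are the arrays from the demo below:

# hyperparameter tuning sketch -- grid values are guesses, not recommendations
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid = { 'C' : [0.1, 1.0, 10.0, 100.0],
  'gamma' : [0.01, 0.1, 1.0/6, 1.0],
  'kernel' : ['rbf'] }
search = GridSearchCV(SVC(), grid, cv=5)   # 5-fold cross-validation
search.fit(train_x, train_y)               # train_x, train_y from the demo below
print(search.best_params_)
print("best CV accuracy = %0.4f " % search.best_score_)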
The moral of the story is that even technical fields like machine learning are subject to fads and fashion. SVMs were wildly over-hyped for several years until their weaknesses were acknowledged.
Movie actresses can influence women’s fashion. Here are three winners of the Academy Award for Best Actress. To my eye, their clothing style is subdued and elegant. Left: Janet Gaynor (1928), the first winner. Center: Audrey Hepburn (1953). Right: Faye Dunaway (1976).
Demo code below. The data can be found at https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_politics_svm_sckit.py
# predict politics (0 = con, 1 = mod, 2 = lib)
# from sex, age, state, income

# sex  age    state    income   politics
# -1   0.27   0 1 0    0.7610   2
#  1   0.19   0 0 1    0.6550   0
# sex: -1 = male, 1 = female
# state: michigan = 100, nebraska = 010, oklahoma = 001
# politics: conservative, moderate, liberal

# Anaconda3-2022.10  Python 3.9.13  scikit 1.0.2
# Windows 10/11

import numpy as np
from sklearn.svm import SVC
import warnings
warnings.filterwarnings('ignore')  # early-stop warnings

# ---------------------------------------------------------

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# ---------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin scikit support vector machine example ")
  print("Predict politics from sex, age, State, income ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True)

  # sex  age    state    income   politics
  # -1   0.27   0 1 0    0.7610   2
  #  1   0.19   0 0 1    0.6550   0

  # 1. load data
  print("\nLoading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,7),
    delimiter="\t", comments="#", dtype=np.float32)
  train_x = train_xy[:,0:6]
  train_y = train_xy[:,6].astype(int)

  test_file = ".\\Data\\people_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,7),
    delimiter="\t", comments="#", dtype=np.float32)
  test_x = test_xy[:,0:6]
  test_y = test_xy[:,6].astype(int)

  print("\nTraining data:")
  print(train_x[0:4])
  print(". . . \n")
  print(train_y[0:4])
  print(". . . ")

  # ---------------------------------------------------------

  # 2. create network
  # SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale',
  #   coef0=0.0, shrinking=True, probability=False, tol=0.001,
  #   cache_size=200, class_weight=None, verbose=False,
  #   max_iter=-1, decision_function_shape='ovr',
  #   break_ties=False, random_state=None)

  params = { 'C' : 1.0, 'kernel' : 'rbf',
    'gamma' : 1.0/6, 'shrinking' : True,
    'probability' : True, 'tol' : 1.0e-3,
    'cache_size' : 200.0, 'max_iter' : -1,
    'decision_function_shape' : 'ovr' }

  print("\nCreating radial basis function SVM-SVC model ")
  model = SVC(**params)

  # ---------------------------------------------------------

  # 3. train
  print("\nTraining parameters: \n")
  print(params)
  print("\nStarting training ")
  model.fit(train_x, train_y)
  print("Done ")

  # 4. evaluate
  acc_train = model.score(train_x, train_y)
  print("\nAccuracy on train = %0.4f " % acc_train)
  acc_test = model.score(test_x, test_y)
  print("Accuracy on test = %0.4f " % acc_test)

  from sklearn.metrics import confusion_matrix
  y_predicteds = model.predict(test_x)
  cm = confusion_matrix(test_y, y_predicteds)
  # print("\nConfusion matrix raw: ")
  # print(cm)
  print("\nConfusion: ")
  show_confusion(cm)  # with formatted labels

  # 5. use model
  print("\nPredict for: M 35 Nebraska $55K ")
  X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
    dtype=np.float32)
  probs = model.predict_proba(X)
  print("\nPrediction pseudo-probs: ")
  print(probs)

  politic = model.predict(X)  # 0,1,2
  lbls = ["conservative", "moderate", "liberal"]
  print("\nPredicted class: ")
  print(lbls[politic[0]])

  # 6. TODO: save model using pickle
  # import pickle
  # print("Saving trained model ")
  # path = ".\\Models\\svm_model.sav"
  # pickle.dump(model, open(path, "wb"))

  # load and use saved model
  # X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
  #   dtype=np.float32)
  # with open(path, 'rb') as f:
  #   loaded_model = pickle.load(f)
  # pa = loaded_model.predict_proba(X)
  # print(pa)

  print("\nEnd scikit SVM demo ")

if __name__ == "__main__":
  main()