Five Reasons Why I Don’t Use the scikit Support Vector Machine Classifier Module

In the late 1990s and early 2000s, the support vector machine (SVM) technique was popular for machine learning classification problems. But now, among my colleagues at least, SVMs are never used except for legacy systems.

Briefly, the five main reasons why I don’t use SVMs are:

1.) The SVM technique is extremely complex, so much so that almost all implementations (including scikit) are just wrappers over an old C++ implementation called libsvm.

2.) The SVM technique natively handles only binary classification of linearly separable data. Therefore, in practice, you must use the hacks of the kernel trick (for realistic, non-linearly-separable data) and one-versus-rest (OVR, for multi-class problems). See the sketch just after this list.

3.) The SVM technique has many parameters that are extremely unintuitive and difficult to tune.

4.) The SVM technique doesn't really lead to any generalizable skills, unlike neural techniques, which generalize to many modern forms of machine learning (autoencoders, Transformers, and so on).

5.) For me, most importantly, the SVM technique just doesn’t feel right.
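
To make reason #2 concrete, here is a minimal sketch of the two hacks. The rbf_kernel() function and the tiny toy dataset are illustrative assumptions of mine, not part of the demo program below. Scikit SVC applies OVR internally (via decision_function_shape='ovr'), but the explicit OneVsRestClassifier wrapper shows what is really going on:

  import numpy as np
  from sklearn.svm import SVC
  from sklearn.multiclass import OneVsRestClassifier

  # kernel trick: the RBF kernel implicitly maps items to a
  # higher-dimensional space where they may become linearly
  # separable: K(x, z) = exp(-gamma * ||x - z||^2)
  def rbf_kernel(x, z, gamma):
    return np.exp(-gamma * np.sum((x - z)**2))

  x = np.array([0.2, 0.5])
  z = np.array([0.3, 0.1])
  print(rbf_kernel(x, z, gamma=0.5))  # similarity in (0, 1]

  # one-versus-rest: train one binary classifier per class
  toy_x = np.array([[0.1,0.2], [0.4,0.5], [0.8,0.9], [0.3,0.7]])
  toy_y = np.array([0, 1, 2, 1])  # three classes
  ovr = OneVsRestClassifier(SVC(kernel='rbf', gamma='scale'))
  ovr.fit(toy_x, toy_y)
  print(len(ovr.estimators_))  # 3 underlying binary SVMs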

And while I’m at it and being cranky, why the term “machine” in SVM? A machine is a mechanical or electrical physical device. Wouldn’t “support vector classifier” make more sense? Or maybe “support vector machine learning”? It’s possible the term “support vector machine” was influenced by the seminal 1958 research paper “A Learning Machine: Part I” by R. Friedberg.

I put together a scikit SVM demo. I used one of my standard datasets for a multi-class example. The data looks like:

 1   0.24   1   0   0   0.2950   2
-1   0.39   0   0   1   0.5120   1
 1   0.63   0   1   0   0.7580   0
-1   0.36   1   0   0   0.4450   1
. . . 

Each line of data represents a person. The fields are sex (male = -1, female = 1), age (normalized by dividing by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), annual income (divided by 100,000), and politics type (0 = conservative, 1 = moderate, 2 = liberal). The goal is to predict politics type from sex, age, state, income. There are 200 training items and 40 test items.
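
For example, a 35-year-old male from Nebraska with a $55,000 annual income (the same hypothetical person used for prediction in the demo program below) has predictor values:

  -1   0.35   0   1   0   0.5500

with the politics label appended as the last column for training and test items.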

The SVC constructor signature is crazy:

  # SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale',
  #  coef0=0.0, shrinking=True, probability=False, tol=0.001,
  #  cache_size=200, class_weight=None, verbose=False,
  #  max_iter=-1, decision_function_shape='ovr',
  #  break_ties=False, random_state=None)

For my demo, I used these parameters:

  params = { 
    'C' : 1.0,
    'kernel' : 'rbf',
    'gamma' : 1.0/6,
    'shrinking' : True,
    'probability' : True,
    'tol' : 1.0e-3,
    'cache_size' : 200.0,
    'max_iter' : -1,
    'decision_function_shape' : 'ovr'
  }
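
The only values that differ from the defaults are gamma = 1/6 and probability = True. The gamma value is 1 over the number of predictor variables which, if I'm reading the documentation correctly, is exactly what the gamma='auto' option computes, so a roughly equivalent sketch is:

  model = SVC(gamma='auto', probability=True)  # 'auto' = 1 / n_features

The default gamma='scale' additionally divides by the variance of the training data.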

The results were terrible, as is often the case with a first stab using an SVM. I could have spent several hours fiddling with the parameters, but I wasn’t in the mood.
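
If I had been in the mood, the usual way to fiddle is a grid search over candidate parameter values. Here is a minimal sketch, assuming the train_x and train_y arrays from the demo program below; the particular grid values are arbitrary assumptions, not recommendations:

  from sklearn.svm import SVC
  from sklearn.model_selection import GridSearchCV

  grid = {
    'C' : [0.1, 1.0, 10.0],
    'gamma' : [0.01, 1.0/6, 1.0],
    'kernel' : ['rbf', 'poly']
  }
  search = GridSearchCV(SVC(), grid, cv=5)  # 5-fold cross-validation
  search.fit(train_x, train_y)
  print(search.best_params_)
  print("%0.4f" % search.best_score_)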

The moral of the story is that even technical fields like machine learning are subject to fads and fashion. SVMs were wildly over-hyped for several years until their weaknesses were acknowledged.



Movie actresses can influence women’s fashion. Here are three winners of the Academy Award for Best Actress. To my eye, their clothing style is subdued and elegant. Left: Janet Gaynor (1928), the first winner. Center: Audrey Hepburn (1953). Right: Faye Dunaway (1976).


Demo code below. The data can be found at https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.

# people_politics_svm_scikit.py

# predict politics (0 = con, 1 = mod, 2 = lib) 
# from sex, age, state, income

# sex  age    state    income   politics
# -1   0.27   0  1  0   0.7610   2
#  1   0.19   0  0  1   0.6550   0
# sex: -1 = male, 1 = female
# state: michigan = 100, nebraska = 010, oklahoma = 001
# politics: conservative, moderate, liberal

# Anaconda3-2022.10  Python 3.9.13  scikit 1.0.2
# Windows 10/11

import numpy as np 
from sklearn.svm import SVC
import warnings
warnings.filterwarnings('ignore')  # early-stop warnings

# ---------------------------------------------------------

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# ---------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin scikit support vector machine example ")
  print("Predict politics from sex, age, State, income ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True)

  # sex  age    state    income   politics
  # -1   0.27   0  1  0   0.7610   2
  #  1   0.19   0  0  1   0.6550   0

  # 1. load data
  print("\nLoading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,7),
    delimiter="\t", comments="#",  dtype=np.float32) 
  train_x = train_xy[:,0:6]
  train_y = train_xy[:,6].astype(int)

  test_file = ".\\Data\\people_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,7),
    delimiter="\t", comments="#",  dtype=np.float32) 
  test_x = test_xy[:,0:6]
  test_y = test_xy[:,6].astype(int)

  print("\nTraining data:")
  print(train_x[0:4])
  print(". . . \n")
  print(train_y[0:4])
  print(". . . ")
 
# ---------------------------------------------------------

  # 2. create model
  # SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale',
  #  coef0=0.0, shrinking=True, probability=False, tol=0.001,
  #  cache_size=200, class_weight=None, verbose=False,
  #  max_iter=-1, decision_function_shape='ovr',
  #  break_ties=False, random_state=None)

  params = { 
    'C' : 1.0,
    'kernel' : 'rbf',
    'gamma' : 1.0/6,
    'shrinking' : True,
    'probability' : True,
    'tol' : 1.0e-3,
    'cache_size' : 200.0,
    'max_iter' : -1,
    'decision_function_shape' : 'ovr'
  }
       
  print("\nCreating radial basis function SVM-SVC model ")
  model = SVC(**params)

# ---------------------------------------------------------

  # 3. train
  print("\nTraining parameters: \n")
  print(params)
  print("\nStarting training ")
  model.fit(train_x, train_y)
  print("Done ")

  # 4. evaluate
  acc_train = model.score(train_x, train_y)
  print("\nAccuracy on train = %0.4f " % acc_train)
  acc_test = model.score(test_x, test_y)
  print("Accuracy on test = %0.4f " % acc_test)

  from sklearn.metrics import confusion_matrix
  y_predicteds = model.predict(test_x)
  cm = confusion_matrix(test_y, y_predicteds) 
  # print("\nConfusion matrix raw: ")
  # print(cm)
  print("\nConfusion: ")
  show_confusion(cm)  # with formatted labels

  # 5. use model
  print("\nPredict for: M 35 Nebraska $55K ")
  X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
    dtype=np.float32)

  probs = model.predict_proba(X)
  print("\nPrediction pseudo-probs: ")
  print(probs)

  politic = model.predict(X)  # 0,1,2
  lbls = ["conservative", "moderate", "liberal"]
  print("\nPredicted class: ")
  print(lbls[politic[0]])

  # 6. TODO: save model using pickle
  # import pickle
  # print("Saving trained model ")
  # path = ".\\Models\\svm_model.sav"
  # pickle.dump(model, open(path, "wb"))

  # load and use saved model
  # X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
  #   dtype=np.float32)
  # with open(path, 'rb') as f:
  #   loaded_model = pickle.load(f)
  # pa = loaded_model.predict_proba(X)
  # print(pa)

  print("\nEnd scikit SVM demo ")

if __name__ == "__main__":
  main()