A Python – scikit – Machine Learning Demo

I don’t often use the scikit-learn library, so I thought I’d do a quick demo just to refresh my memory. The scikit-learn library is a collection of Python code modules that can do machine learning tasks.

I like Python, but the language has a lot of moving parts. For example, at a minimum you need base Python, plus the NumPy library for arrays and matrices, plus the SciPy library for numeric and scientific routines, and so on. Managing all these components can be a real pain, so I usually use the Anaconda distribution, which bundles all of these libraries together.


Anaconda comes with the Spyder IDE for Python, which I don’t really like that much. But it’s usable.

I culled demo code from various sources on the Internet. The idea is to create a classification model for the famous Fisher Iris Dataset. My demo script begins:

from sklearn import datasets
from sklearn import metrics
from sklearn.svm import SVC

# load the iris datasets
dataset = datasets.load_iris()
print(dataset.data)
print(dataset.target)
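
Printing the raw arrays works, but it can be easier to check the shape and labels first. Here is a minimal sketch of that, using only attributes that load_iris() returns; the variable names are my own for illustration:

```python
from sklearn import datasets

# load the iris data and look at its basic structure
iris = datasets.load_iris()

# 150 flowers (rows), 4 measurements each (columns)
n_samples, n_features = iris.data.shape
print(n_samples, n_features)

# names of the four measurements and the three species
print(iris.feature_names)
print(list(iris.target_names))
```

The target array holds the integers 0, 1 and 2, which index into target_names.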

Next I create the model and make predictions:

# fit an SVM model to the data
model = SVC()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions (on the same data the model was trained on)
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
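
Because the demo predicts on the same data it was trained on, the report is probably a bit optimistic. A rough sketch of a held-out evaluation follows. The 70/30 split, the random_state value and the variable names are my assumptions, not part of the original demo:

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# hold out 30% of the rows for testing (split size is an assumption);
# stratify keeps the three species balanced in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.3, random_state=0, stratify=iris.target)

# fit on the training rows only
model = SVC()
model.fit(X_train, y_train)

# score on rows the model has not seen
predicted = model.predict(X_test)
test_accuracy = metrics.accuracy_score(y_test, predicted)
print(test_accuracy)
print(metrics.confusion_matrix(y_test, predicted))
```

The exact accuracy depends on the split, so I would treat one run as a sanity check rather than a final number.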

The last part of the demo creates a Principal Component Analysis graph:

# project the iris data onto its first two principal components and plot it

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1],
                color=color, alpha=.8, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of IRIS dataset')

plt.show()
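
The graph shows the data squeezed from four dimensions down to two. To see how much information survives that squeeze, one option is to look at the explained_variance_ratio_ attribute of the fitted PCA object. This is a small sketch without the plotting; the ratios name is mine:

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# same two-component projection as in the plot
pca = PCA(n_components=2)
X_r = pca.fit_transform(iris.data)

# fraction of the total variance captured by each component
ratios = pca.explained_variance_ratio_
print(ratios)

# fraction captured by the two components together
print(ratios.sum())
```

For the iris data the first component should account for most of the variance, which is why the 2-D picture separates the species reasonably well.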

Compared to my usual programming language and environment, C# and Visual Studio, Python and Spyder are very primitive. But Python has a much better set of ML libraries.
