I don’t often use the scikit-learn library, so I thought I’d do a quick demo just to refresh my memory. The scikit-learn library is a collection of Python code modules that can do machine learning tasks.
I like Python, but the language has a lot of moving parts. For example, at a minimum you need base Python, plus the NumPy library for arrays and matrices, plus the SciPy library for scientific computing routines, and so on. Managing all these components can be a real pain, so I usually use the Anaconda distribution, which wraps all these libraries up.
Anaconda comes with the Spyder IDE for Python, which I don’t really like that much. But it’s usable.
I culled demo code from various sources on the Internet. The idea is to create a classification model for the famous Fisher Iris Dataset. My demo script begins:
from sklearn import datasets
from sklearn import metrics
from sklearn.svm import SVC

# load the iris dataset
dataset = datasets.load_iris()
print(dataset.data)
print(dataset.target)
Next I create the model and make predictions:
# fit an SVM model to the data
model = SVC()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
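Note that the report above scores the model on the same 150 items it was trained on, so the numbers are optimistic. Here is a minimal sketch of a hold-out evaluation. The 70/30 split, the stratification, and the fixed random seed are my own illustrative choices, not part of the demo code I found:

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

dataset = datasets.load_iris()

# Hold out 30% of the rows for testing. test_size, random_state and
# stratify are illustrative assumptions, not values from the original demo.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target,
    test_size=0.3, random_state=0, stratify=dataset.target)

# fit the SVM on the training rows only
model = SVC()
model.fit(X_train, y_train)

# score the model on rows it has never seen
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(metrics.confusion_matrix(y_test, predicted))
print("held-out accuracy = %.4f" % accuracy)
```

The accuracy on the held-out rows is usually a more honest estimate of how the model would do on new flowers.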
The last part of the demo creates a Principal Component Analysis graph:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# project the four features down to two principal components
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# one scatter series per species
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color,
                alpha=.8, lw=lw, label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of IRIS dataset')
plt.show()
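A two-dimensional PCA plot is only useful if the two components keep most of the information in the original four features. One way to check, as a rough sketch, is to look at the explained_variance_ratio_ attribute of the fitted PCA object. The variable name ratios is just mine:

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# fit the PCA and project the data in one step
pca = PCA(n_components=2)
X_r = pca.fit_transform(iris.data)

# fraction of the total variance captured by each component
ratios = pca.explained_variance_ratio_
print(ratios)
print("total = %.4f" % ratios.sum())
```

If the total is close to 1.0, the 2D graph is a reasonable picture of the full dataset.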
Compared to my usual programming language and environment, C# and Visual Studio, Python and Spyder are very primitive. But Python has a much better set of ML libraries.