Generating Non-Linearly Separable Test Data

This morning I was working on a kernel logistic regression (KLR) problem. Regular logistic regression (LR) is perhaps the simplest form of machine learning (ML). It’s used when the goal is to predict a binary value from two or more numeric predictor values. For example, you might want to predict whether a person is male (0) or female (1) based on height, weight, and annual income.
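For reference, standard LR computes the probability that the class is 1 as the logistic sigmoid of a weighted sum of the input values (the textbook form, stated here just for context):

p = \frac{1}{1 + e^{-(w \cdot x + b)}}

Here x is the vector of predictor values, w is a vector of learned weights, and b is a learned bias constant. The predicted class is 1 when p >= 0.5, otherwise 0.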

The problem with regular LR is that it only works with data that is linearly separable — if you graph the data, you must be able to draw a straight line that more or less separates the two classes you’re trying to predict.

Kernel logistic regression can handle non-linearly separable data. For example, the graph below might represent the predict-the-sex problem where there are just two input values, say, height and weight.

Well, anyway, in order to test my kernel logistic regression ML code, I needed some non-linearly separable data. I was about to start writing some C# code when, quite by accident, I came across a Python function named make_circles() that made the data shown in the graph above.

The code is simple:

# makeCircles.py
# generate non-linearly separable data: two concentric circles

import numpy as np
from sklearn.datasets import make_circles

np.random.seed(4)
numPoints = 20
X, y = make_circles(n_samples=numPoints,
  factor=.3, noise=.05)
X = 10 * X  # scale points away from the unit circle
for i in range(0, numPoints):
  print(str(X[i][0]) + ", " +
        str(X[i][1]) + ", " +
        str(y[i]))

I ran the script as

(prompt) python makeCircles.py > nonSepData.txt

Then I opened the comma-delimited file in Excel, sorted the data on the 0-or-1 column, and made a graph. By adjusting the print() function I can control the exact form of the output.
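As an alternative to Excel, a couple of lines of matplotlib (my addition here, not part of the original workflow) will graph the generated points directly, colored by their 0-or-1 class:

import matplotlib.pyplot as plt
# X and y are the arrays from makeCircles.py above
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()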

The downside of this technique is that it can only generate data with two dimensions.


Brain Damage, Idiot Savants, and Neural Networks

It’s been well-known for a long time that every now and then, someone who suffers a serious brain injury to the left hemisphere can lose much cognitive ability but gain superhuman powers in some areas. For example, a person can gain incredible artistic or musical abilities, but not be able to function in society.

There’s speculation that brain damage forces sensory inputs to take alternate, new pathways through the brain, exposing near-miraculous abilities that are latent within everyone. Put another way, we all have incredible talents that never come to the surface because of the way in which our brains are wired.

Some researchers actually experimented by applying a powerful magnetic field to the brains of normal people. In some cases, the people gained significant improvement in things like verbal memory and drawing! The changes were only temporary. I’m not quite sure I’d want a researcher applying a powerful magnetic field to my brain (or any other part of my body for that matter).

I think this phenomenon is related to the computer science technique of neural network dropout. In NN dropout, randomly selected processing nodes are temporarily turned off during each training pass, and in many situations the resulting NN performs much better than the same NN without dropout. Again, the speculation is that dropout forces the NN to find alternate pathways.
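Here’s a rough illustration, in plain NumPy rather than any particular NN library, of what a dropout layer does during training:

import numpy as np

def dropout(activations, drop_prob, rng):
  # zero each node with probability drop_prob, then scale
  # the survivors so the expected activation is unchanged
  mask = rng.uniform(size=activations.shape) >= drop_prob
  return (activations * mask) / (1.0 - drop_prob)

rng = np.random.default_rng(seed=1)
hidden = np.array([0.52, 0.33, 0.81, 0.17])
print(dropout(hidden, drop_prob=0.5, rng=rng))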

Weird. We’re at the very beginning of a new era of artificial intelligence. Currently, NNs can be constructed with perhaps tens of thousands of artificial neurons. When computer hardware expands to handle millions or billions of artificial neurons, AI will have almost unlimited potential, maybe even the ability to surpass humans.


The 2017 Interop ITX Conference is Coming Up Soon

I get quite a few invitations to speak at conferences, but I usually say “yes” only to the events where I think I can do good for my company and also learn something useful myself. One of these events is the 2017 Interop ITX Conference, May 15-19 in Las Vegas. See http://www.interop.com.

Interop is an interesting event. It’s hard to describe exactly what it’s all about because, like many things related to technology, Interop has changed a lot over the years. At last year’s Interop I saw talks and exhibits related to network hardware infrastructure, software applications, security, and all kinds of things in between. I expect the 2017 Interop to have about 3,000 attendees.

Interop is run by UBM, a company that also does Black Hat and publishes InformationWeek. This year’s Interop has been rebranded as Interop ITX. I’m not sure what “ITX” stands for – maybe “IT everything”, because years ago Interop was just about network infrastructure.

I’m going to give a short talk on deep neural networks. I’ll explain exactly what they are in simple terms, and describe how they’ve been responsible for recent breakthroughs such as speech recognition (Apple’s Siri and Microsoft’s Cortana) and self-driving automobiles. I’ll also speculate about the future of Artificial Intelligence.

One of the big challenges facing technical people like me is creativity. My colleagues and I know all sorts of fancy machine learning algorithms and techniques. But these algorithms are only useful if they’re applied to a real-world problem, and that requires a flash of inspiration and creativity.

Everyone knows that when you’re in day-to-day mode, you hardly have time to be creative because you’re too busy sending email messages and putting out the work-fires-of-the-moment. Going to a conference such as Interop gets you off the virtual work treadmill and out of your too-familiar environment, and can put you in a situation where creativity and inspiration have a chance to happen.


Support Vector Machine Classification and Kernels

Even among my engineering colleagues who work with machine learning quite often, the basic ideas behind support vector machine (SVM) classification are a bit hazy. I believe this mild confusion is due, in part, to the fact that SVM classification involves a large number of ideas. Each idea is simple by itself, but the sheer number of them leads to confusion.

If you’ve ever tried to read an article that explains SVMs, you were likely overwhelmed, because to truly present SVMs an author has to work through pages and pages of preliminaries.

This blog post is my effort to explain SVM classification in as few sentences as possible.

One possible form of the prediction equation for SVM classification is:

f(x) = \text{sign} \left( \sum_{j=1}^{s} a_j \, y_j \, K(x_j, x) + b \right)

Here x is the vector of predictor values for the item to classify. The x_j are the so-called support vectors, which are a subset of the training data. Each y_j is the class (-1 or +1) of support vector x_j. The a_j are constants, one for each x_j. The b is a single numeric constant. The letter s is the number of support vectors. K is a kernel function that accepts two vectors and returns a single number that is a measure of similarity between the two vectors, where 1.0 means identical and 0.0 means as different as possible.

Suppose you have 100 training items. Using part of the SVM algorithm, you determine that only s = 2 of the training items are the special support vectors:

(1) 5.0  3.0  2.0  +1
(2) 4.0  6.0  1.0  -1

And using the SVM algorithm, you somehow determine that the two a_j values are (0.8, 0.7), and that b = 0.75.

Suppose the kernel function used for all this (there are dozens of such kernel functions) is the radial basis function (RBF):

K(x, x') = \exp \left( -\frac{\lVert x - x' \rVert^2}{\gamma} \right)

The notation in the numerator means squared Euclidean distance. The gamma in the denominator is a constant; suppose it’s 1.0.

So if you want to predict the class (+1 or -1) of a new item that has predictor values (8.0, 5.0, 4.0), the calculations are:

j = 1:
(0.8)(+1) * K([5,3,2], [8,5,4]) =
(0.8) * 8.5 = 6.80

j = 2:
(0.7)(-1) * K([4,6,1], [8,5,4]) =
(-0.7) * 13.0 = -9.10

Then f = 6.80 + (-9.10) + 0.75 = -1.55 (the b constant is added just once, after the weighted kernel values are summed), which is negative, so the predicted class is -1.
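Here’s a minimal Python sketch of this prediction computation, with the kernel passed in as a function. (The kernel values 8.5 and 13.0 above are stand-in numbers for illustration; a true RBF kernel returns similarity values between 0.0 and 1.0.)

import numpy as np

def kernel_rbf(v1, v2, gamma=1.0):
  # RBF kernel: exp( -||v1 - v2||^2 / gamma )
  d2 = np.sum((np.array(v1) - np.array(v2)) ** 2)
  return np.exp(-d2 / gamma)

def svm_predict(x, support_vecs, ys, alphas, b, kernel):
  # f(x) = sign( sum over j of a_j * y_j * K(x_j, x) + b )
  f = b
  for xj, yj, aj in zip(support_vecs, ys, alphas):
    f += aj * yj * kernel(xj, x)
  return +1 if f >= 0.0 else -1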

So, SVM classification involves first choosing a kernel function and any parameters that function has (such as gamma for RBF), and then using a complex algorithm, usually SMO (“sequential minimal optimization”), to determine which training items are the special support vectors, the a_j value for each support vector, and the b constant.


Support Vector Machine Classification

Take a look at the graph below. Each of the nine data points belongs to one of three classes, red = 1, blue = 2, green = 3. The goal of a machine learning classifier is to create a prediction equation. For example, if we want to predict the class of a new point (4,5), we’d expect the classifier to respond with 2 (blue).

There are dozens of machine learning classification algorithms. For example, for this demo problem you could use multi-class logistic regression, or neural network classification, or SVM (support vector machine) classification.

The SVM algorithm is one of the most complex in machine learning, and writing an SVM from scratch isn’t practical, so you have to use a tool. I coded up an example using the “svm” module in the “sklearn” Python library:

# svm_demo.py

from sklearn import svm
import numpy as np

def get_points():
  return np.array([[2,3], [3,2], [4,3],
                   [3,6], [4,7], [5,6],
                   [6,4], [7,3], [7,5]])

def get_labels():
  return np.array([1,1,1, 2,2,2, 3,3,3])

# --------------------------------------

print("\nBegin SVM using sklearn demo \n")

print("Loading test data")
points = get_points()
labels = get_labels()

print("Creating SVM classifier \n")
classifier = svm.SVC(kernel='rbf', gamma=1.0, C=10.0)
classifier.fit(points, labels)

unknown = np.array([[4,5]])
print("Making prediction for: ")
print(unknown)
pred_class = classifier.predict(unknown)

print("\nPredicted class is: ")
print(pred_class)

print("\nEnd demo \n")

The SVC object (“support vector classifier”) accepts many parameters, but two are key. The first is the kernel function, which typically has one or more parameters of its own; I used “rbf” (radial basis function) with gamma = 1.0. The second is C, which controls how the SVC classifier deals with outlier data points; I set its value to 10.0.

I don’t use SVM classification very often. The values for the kernel function and its parameter(s), and the C constant, must be determined by trial and error. In general I prefer using neural network classification.
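If you do perform that trial and error, sklearn can automate much of it. Here’s a minimal sketch using GridSearchCV (assuming the points and labels arrays from the demo above):

from sklearn.model_selection import GridSearchCV
from sklearn import svm

# candidate values for gamma and C, evaluated with
# 3-fold cross-validation on the nine demo points
params = {'gamma': [0.1, 1.0, 10.0],
          'C': [1.0, 10.0, 100.0]}
search = GridSearchCV(svm.SVC(kernel='rbf'), params, cv=3)
search.fit(points, labels)
print(search.best_params_)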


Using the CNTK Built-In File Reader Functions

Microsoft CNTK is a very powerful code library for machine learning. The library is written in C++ but has a Python API for convenience.

I’ve been taking a very deep dive into CNTK v2.0 Release Candidate 1. Version 2.0 should be released to the public sometime in the next few months.

Yesterday I spent quite a bit of time with experiments to understand the built-in file reader functions. For example, suppose you have a data file like so:

5.0,3.5,1.3,0.3,1,0,0
4.5,2.3,1.3,0.3,1,0,0
5.5,2.6,4.4,1.2,0,1,0
6.1,3.0,4.6,1.4,0,1,0
6.2,3.4,5.4,2.3,0,0,1
5.9,3.0,5.1,1.8,0,0,1

This is part of the famous Iris Dataset. The first four numbers in each row are the predictor variables (sepal length, sepal width, petal length, petal width). The next three numbers represent the species: (1,0,0) = “setosa”, (0,1,0) = “versicolor”, (0,0,1) = “virginica”.

In order to use this data with CNTK you’d have to write a custom Python function that parses the data file into two matrices, one for the predictor values, one for the label values. Not too difficult, but quite time-consuming.
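For well-behaved data like this, a minimal version of such a parser is short; the time goes into error checking and handling messier formats. Here’s a sketch (assuming the comma-delimited layout above, with four predictor values followed by a three-value 1-of-N label):

import numpy as np

def load_iris_file(path):
  # each row: 4 predictor values, then a 1-of-3 label
  data = np.loadtxt(path, delimiter=",", dtype=np.float32)
  features = data[:, 0:4]
  labels = data[:, 4:7]
  return features, labels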

An alternative is to create a file that uses a special CNTK format, and then use built-in CNTK reader functions. The data above, in CNTK format, is:

|features 5.0 3.5 1.3 0.3 |labels 1 0 0
|features 4.5 2.3 1.3 0.3 |labels 1 0 0
|features 5.5 2.6 4.4 1.2 |labels 0 1 0
|features 6.1 3.0 4.6 1.4 |labels 0 1 0
|features 6.2 3.4 5.4 2.3 |labels 0 0 1
|features 5.9 3.0 5.1 1.8 |labels 0 0 1

Here the tags “features” and “labels” aren’t special, so I could have used “predictors” and “species”, for example, as long as the names in the file match the field names used by the reader function below.

Reading this data file would start with:

# reader_demo.py
# demo the CNTK built-in reader

import cntk as C
import numpy as np
from cntk.io import CTFDeserializer, MinibatchSource, \
  StreamDef, StreamDefs
from cntk.io import INFINITELY_REPEAT

def create_reader(path, is_training, input_dim, output_dim):
  return MinibatchSource(CTFDeserializer(path, StreamDefs(
    labels = StreamDef(field='labels', shape=output_dim,
      is_sparse=False),
    features = StreamDef(field='features', shape=input_dim,
      is_sparse=False)
  )), randomize = is_training,
    max_sweeps = INFINITELY_REPEAT if is_training else 1)

The program-defined create_reader function looks a bit messy but is essentially boilerplate. The calling code could be:

print("\nEnd CNTK reader demo \n")

input_dim = 4
output_dim = 3

input_Var = C.input(input_dim, np.float32) 
label_Var = C.input(output_dim, np.float32)

theFile = "dummyData_cntk.txt"
batch_size = 2
my_reader = create_reader(theFile, True,
  input_dim, output_dim)
my_input_map = {
  label_Var  : my_reader.streams.labels,
  input_Var  : my_reader.streams.features
}

for i in range(0, 5):
  print("Reading batch " + str(i))
  currBatch = my_reader.next_minibatch(batch_size,
    input_map = my_input_map)

print("\nEnd CNTK reader demo \n")

The input_Var and label_Var objects are kind of mysterious, and an explanation is outside the scope of this post. The my_reader object fetches chunks of the file and returns batches of feature and label data that can be passed to a CNTK training function.

Moral: Dealing with data is always rather annoying. Because CNTK is a low-level library, you can write Python code to parse a data file in whatever format you have, or you can create a special CNTK-format version of your data and then use the built-in reader functions.


A First Look at the CNTK v2.0 Release Candidate Machine Learning Library

Microsoft CNTK (Microsoft Cognitive Toolkit) is a powerful code library that can be used for many machine learning tasks. A few days ago, CNTK v2.0 Release Candidate 1 became available.

Version 2 is a huge change from version 1. The versions are so different from a developer’s perspective that I consider CNTK v2 to be an entirely different library. CNTK v2 is written in C++ but has a Python API, because nobody wants to torture themselves by writing C++ code unless necessary.

So, I rolled up my developer’s sleeves and dove in. Because I had an old CNTK v2 Beta, I first removed it by 1.) Using the Control Panel to uninstall the Anaconda Python distribution, 2.) Deleting all references to CNTK and repos from my System Environment Variables, and 3.) Deleting the old install directory (C:\local).

With a clean system, I first installed the required Anaconda version 4.1.1 64-bit with Python 3. After verifying Python 3.5 was installed, I used pip to install CNTK by opening a command shell and typing (the URL is really long so I put a space after each slash for readability):

pip install https://cntk.ai/ PythonWheel/ CPU-Only/ 
 cntk-2.0rc1-cp35-cp35m-win_amd64.whl

And the installation just worked. Nice! I verified CNTK was alive by typing:

python -c "import cntk; print(cntk.__version__)"

and CNTK responded by displaying its 2.0rc1 version.

Next I took a CNTK script that I’d written for CNTK v2 Beta and tried to run it. I immediately got lots of errors, but they weren’t too hard to fix: mostly package name changes.

My script creates a simple single-hidden-layer neural network and trains a model that can make predictions on the famous Iris Dataset.
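As a rough sketch of what the network definition looks like in the v2 Python API (the names here follow the rc1-era layers library, and the exact names are among the things that changed from the beta):

import cntk as C
import numpy as np

# 4 iris predictor values in, 5 hidden nodes, 3 species out
input_var = C.input(4, np.float32)
label_var = C.input(3, np.float32)

model = C.layers.Sequential([
  C.layers.Dense(5, activation=C.tanh),
  C.layers.Dense(3)])(input_var)  # raw output values

# softmax is applied inside the loss function
loss = C.cross_entropy_with_softmax(model, label_var)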

CNTK is a very powerful library, and it has a steep, multi-part learning curve. Because it works at a relatively low level, you must have a good grasp of concepts such as neural network architecture and back-propagation, and you must have intermediate or better Python skill. And then learning the library itself is quite difficult.

But the payoff is a very powerful, very fast machine learning library.
