I’ve been looking at the relatively new CNTK (Computational Network Toolkit). It’s a high-performance, command-line machine learning tool.
The most basic form of machine learning is binary classification using logistic regression. For example, you might want to predict whether a person will vote for political candidate A or political candidate B, based on the voter’s age and annual income.
In the image below, the first training input data is (3.854499, 4.163941) with a known output of 1.0. The two possible outputs are encoded as 0.0 or 1.0. The training file (not shown) has 1000 items and is used to create a magic prediction equation. There are 80 test items, which are used to evaluate the accuracy of the prediction model.
I created a CNTK configuration file with a .cntk extension that specifies how to make predictions (binary logistic regression as opposed to a neural network or something else) and where the data files are.
I ran the example from the command line:
C:\> cntk configfile=LogisticRegressionDemo.cntk
The magic prediction equation for logistic regression is p = 1 / (1 + e^-z) where z = b + (w0)(x0) + (w1)(x1). The output weights file tells me that b = -12.3975649, w0 = 2.40208318, w1 = 2.66412544.
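As a sanity check, you can evaluate the equation with those weights directly. Here’s a quick Python sketch using the first training item (3.854499, 4.163941) from above:

```python
import math

# weights reported in the CNTK output weights file
b, w0, w1 = -12.3975649, 2.40208318, 2.66412544

def predict(x0, x1):
    """Logistic regression: p = 1 / (1 + e^-z), where z = b + w0*x0 + w1*x1."""
    z = b + w0 * x0 + w1 * x1
    return 1.0 / (1.0 + math.exp(-z))

# first training item from the data shown above
p = predict(3.854499, 4.163941)
print(round(p, 6))  # 0.999649, matching the first entry in the .p file
```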
The .p file gives the p value for each pair of input values. They are 0.999649, 0.001298, and so on. If a p value is less than 0.5, the prediction is whatever is encoded as the 0.0 value. If a p value is greater than 0.5, the prediction is whatever is encoded as the 1.0 value.
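Turning the p values into 0.0/1.0 predictions is just a threshold at 0.5. A minimal sketch, using the first two entries from the .p file:

```python
# first two p values from the .p file
p_values = [0.999649, 0.001298]

# threshold at 0.5: above -> the class encoded as 1.0, below -> the class encoded as 0.0
predictions = [1.0 if p > 0.5 else 0.0 for p in p_values]
print(predictions)  # [1.0, 0.0]
```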
I verified the p values from the .p file by using Excel with the b, w0, w1 values for the first few data points. You can see the values in columns K and M are identical.
In the end, the important thing is the accuracy of the magic equation, meaning, what percentage of the test data it predicts correctly. I could not find an accuracy result in the CNTK output files, but I’m guessing I just didn’t know where to look.
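Computing accuracy yourself is easy once you have the thresholded predictions and the known test labels. A sketch (with made-up labels, since the full test file isn’t shown here):

```python
# hypothetical predicted and actual 0.0/1.0 labels, for illustration only
predicted = [1.0, 0.0, 1.0, 1.0, 0.0]
actual    = [1.0, 0.0, 0.0, 1.0, 0.0]

# count matches and divide by the number of test items
correct = sum(1 for p, a in zip(predicted, actual) if p == a)
accuracy = correct / len(actual)
print(f"{accuracy:.2%}")  # 80.00% on these made-up items
```

With the real 80-item test file, the same loop would give the accuracy figure the output files don’t seem to report directly.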
CNTK: very, very, cool.