I’ve been exploring the cool new CNTK (Computational Network Toolkit) program. It was designed for deep neural networks, but the tool can also do logistic regression classification.
I did a couple of short experiments that are best explained with an image. The goal is to classify an arbitrary thing coded as 0 or 1, using two numeric values. When CNTK runs, after it calculates a magic prediction model, it spits out the error term for the test data. In this case the “0.11971580 * 4” means the average error across the four test items was 0.1197.
But nowhere could I find an accuracy figure: how many predictions did the model get correct? So I wrote a Python script to calculate accuracy.
One of the outputs of CNTK is a .p file that gives predicted probabilities. If a probability is less than 0.5 the prediction is class 0. If p >= 0.5 the prediction is class 1.
My Python script reads the four test file items, peels off the actual class labels (0 or 1), and stores them in a list. Then it reads the four p-values and stores them in a second list. Finally, it walks through the two lists and counts the number of correct predictions (a prediction is correct if label == 0 and p < 0.5, or if label == 1 and p >= 0.5).
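The steps above can be sketched roughly as follows. This is a minimal illustration, not the original script, and the file layouts are assumptions: each test item is taken to be a whitespace-separated line with the class label in the last column, and the .p output is taken to hold one probability per line. The helper names (read_labels, read_probs, accuracy) and the sample values are made up for the example, and in-memory lists of lines stand in for the real files.

```python
def read_labels(lines, label_col=-1):
    """Peel the 0/1 class label off each non-blank test line.

    Assumption: values are whitespace-separated and the label sits in
    column label_col (the last column by default).
    """
    labels = []
    for line in lines:
        tokens = line.split()
        if tokens:
            labels.append(int(float(tokens[label_col])))
    return labels


def read_probs(lines):
    """Read predicted probabilities.

    Assumption: one probability per non-blank line.
    """
    return [float(line.split()[0]) for line in lines if line.strip()]


def accuracy(labels, probs, threshold=0.5):
    """Return the fraction of items whose thresholded prediction matches the label."""
    if not labels or len(labels) != len(probs):
        raise ValueError("need equal, non-zero numbers of labels and probabilities")
    num_correct = 0
    for label, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0  # p < 0.5 -> class 0, else class 1
        if predicted == label:
            num_correct += 1
    return num_correct / len(labels)


# Made-up stand-ins for the four test items and the four p-values.
test_lines = ["1.0 2.0 0", "3.0 4.0 1", "5.0 1.5 1", "2.5 0.5 0"]
prob_lines = ["0.12", "0.91", "0.43", "0.08"]

labels = read_labels(test_lines)
probs = read_probs(prob_lines)
acc = accuracy(labels, probs)
print("accuracy = %0.4f" % acc)  # prints "accuracy = 0.7500"
```

With these made-up values, three of the four predictions match their labels (the third item has label 1 but p = 0.43), so the accuracy is 0.75. To run against real files, the two lists of lines could come from open(path).readlines(), with the column index adjusted to match the actual test file layout.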
My Python was a bit rusty, and I noticed that my accuracy script used almost every key syntax item. As a useful side effect, the script will make an excellent Python quick-reminder for the next time I have to write a script after being away from Python for a while.