Iterating Through a CNTK Data File

The CNTK library is a powerful tool for advanced machine learning, but today I ran into an unusual scenario. CNTK supports a data file format called CTF (CNTK Text Format). For example:

|predictors 1.12 1.18 1.32 1.29 |passengers 1.21
|predictors 1.18 1.32 1.29 1.21 |passengers 1.35
|predictors 1.32 1.29 1.21 1.35 |passengers 1.48
|predictors 1.29 1.21 1.35 1.48 |passengers 1.48
|predictors 1.21 1.35 1.48 1.48 |passengers 1.36
|predictors 1.35 1.48 1.48 1.36 |passengers 1.19

This is a sample from a time series regression problem I was working on. Each line has four input (aka feature) predictor values followed by a single value to predict. Each field starts with a "|" character and a field name. The CTF format is very convenient when you want to train a neural network because CNTK has built-in support for reading it.
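To make the layout of a CTF line concrete, here is a minimal sketch that splits one line into its named fields using plain Python. The function name parse_ctf_line is my own invention for illustration and is not part of CNTK, and the sketch assumes the simple dense layout shown above (a "|" marker, a field name, then space-separated numbers).

```python
# ctf_parse_sketch.py
# Hypothetical helper (not a CNTK function): split one CTF line into fields.
# Assumes the dense layout shown above: |name v1 v2 ... |name v1 ...

import numpy as np

def parse_ctf_line(line):
    """Return a dict that maps each field name to a float32 array."""
    fields = {}
    for chunk in line.strip().split("|"):
        tokens = chunk.split()
        if not tokens:
            continue  # the piece before the first '|' is empty
        name = tokens[0]
        values = [float(t) for t in tokens[1:]]
        fields[name] = np.array(values, dtype=np.float32)
    return fields

row = parse_ctf_line("|predictors 1.12 1.18 1.32 1.29 |passengers 1.21")
print(row["predictors"])  # the four input values
print(row["passengers"])  # the single value to predict
```

This only shows how the fields are laid out. For training you would still use CNTK's own reader.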

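But after training I wanted to walk through the input data, one item at a time. Surprisingly, I found no easy way to do this with the built-in reader. One work-around is to read the file with the numpy loadtxt() function, treating the space character as the delimiter so that the field names land in columns of their own and can be skipped.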

# read_exp.py

import numpy as np
import cntk as C

the_file = "tsr_sample_cntk.txt"  # CNTK format

predictors = np.loadtxt(fname=the_file, dtype=np.float32,
 delimiter=" ", usecols=(1,2,3,4))
passengers = np.loadtxt(fname=the_file, dtype=np.float32,
 delimiter=" ", ndmin=2, usecols=[6]) # note!

input_dim = 4
hidden_dim = 12
output_dim = 1

input_Var = C.ops.input(input_dim, np.float32)
label_Var = C.ops.input(output_dim, np.float32)
# create and train the nnet object

np.set_printoptions(precision=2)
print("\n---- Predictions: ")
for i in range(len(predictors)):
  ipt = predictors[i]
  print("Inputs: ", end='')
  print(ipt, end='')
  # pred_passengers = nnet.eval( {input_Var: ipt} )
  pred_passengers = 1.0 + 0.12* i  # dummy prediction
  print("   Predicted: %0.2f \
   Actual: %0.2f" % (pred_passengers, passengers[i, 0]))  # [i, 0] gives a scalar
print("----")

print("\nEnd experiment \n")

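The code is a bit trickier than it appears. With a space delimiter, each line splits into seven columns: column 0 is the "|predictors" tag, columns 1 through 4 are the predictor values, column 5 is the "|passengers" tag, and column 6 is the value to predict. That is why the two calls use columns (1,2,3,4) and [6]. Notice that when reading the passengers field, I had to use ndmin (minimum dimension) to get a matrix as wanted by CNTK rather than a flat vector. I also passed usecols (which columns to use) as a list even though there is only one column, because the version of NumPy I was using wanted a sequence there.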
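The effect of ndmin is easy to check in isolation. The sketch below reads two sample lines from an in-memory string (my own stand-in for the data file, via io.StringIO) and compares the shapes with and without ndmin=2.

```python
# ndmin_check.py
# Compare the result shape of loadtxt() with and without ndmin=2.

import io
import numpy as np

# Two lines in the same layout as the data file (in-memory stand-in).
sample = ("|predictors 1.12 1.18 1.32 1.29 |passengers 1.21\n"
          "|predictors 1.18 1.32 1.29 1.21 |passengers 1.35\n")

# Without ndmin, a single column is squeezed to a 1-D vector.
flat = np.loadtxt(io.StringIO(sample), dtype=np.float32,
  delimiter=" ", usecols=[6])

# With ndmin=2, the single column stays a matrix with one column.
matrix = np.loadtxt(io.StringIO(sample), dtype=np.float32,
  delimiter=" ", ndmin=2, usecols=[6])

print(flat.shape)    # (2,)
print(matrix.shape)  # (2, 1)
```

With the matrix form, passengers[i] is a one-item array and passengers[i, 0] is a plain scalar value.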

Once I have the input data, I can feed it to the trained neural network and call the eval() function to get the output. In my demo the call to eval() is commented out and I simulate the predicted value with a dummy calculation instead.

The moral of the story is that CNTK is a complex library and a strong knowledge of Python is at least useful and perhaps necessary in some scenarios.
