Iterating Through a CNTK-Format Data File

CNTK is Microsoft’s open source library for deep neural networks. A key component in CNTK code is the mini-batch object. A mini-batch object holds a group of training items (input values and their known correct output values), and that group is sent to a CNTK training function.

I decided to see if I could iterate through a data file using CNTK functions. I didn’t have a concrete idea of why this might be useful, but I suspect CNTK could be used for general numeric processing, not just for creating deep neural networks.

Anyway, after some experimentation, I succeeded. I created a small dummy text file in CNTK format:

|id 001 |data 11
|id 002 |data 12
|id 003 |data 13
|id 004 |data 14
|id 005 |data 15
|id 006 |data 16
|id 007 |data 17
|id 008 |data 18
|id 009 |data 19
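To make the format concrete: in CNTK’s text format (CTF), each line above is one sample, and each |name tag introduces a named field followed by its values. Here is a minimal plain-Python sketch, not CNTK code, that splits one such line into its named fields. The function name parse_ctf_line() is hypothetical, and the sketch assumes dense values only (no sparse index:value pairs and no comment fields).

```python
def parse_ctf_line(line):
    """Parse one dense CTF-style line such as '|id 001 |data 11'.

    Returns a dict mapping each field name to its list of float values.
    Hypothetical helper: handles dense values only, no sparse entries.
    """
    fields = {}
    # Split on '|'; any text before the first '|' (in real CTF files,
    # an optional sequence id) is ignored by this sketch.
    for chunk in line.strip().split("|")[1:]:
        tokens = chunk.split()
        if not tokens:
            continue  # skip empty chunks
        name, values = tokens[0], tokens[1:]
        fields[name] = [float(v) for v in values]
    return fields


print(parse_ctf_line("|id 001 |data 11"))
# {'id': [1.0], 'data': [11.0]}
```

Note that the id values are read as numbers, so 001 becomes 1.0.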

Then I wrote a demo program that uses CNTK stream functions to read four items at a time into a mini-batch, and then walk through each of the four items in the mini-batch.
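The read-four-then-walk loop can be imitated in plain Python to show the control flow. This sketch does not use CNTK. In the CNTK 2.x Python API the reading is usually done with a MinibatchSource that wraps a CTFDeserializer, with repeated calls to next_minibatch(4), but the exact demo code isn't reproduced here. The helper read_minibatches() and the in-memory DUMMY_FILE string are hypothetical stand-ins for the real file.

```python
import io

# Hypothetical in-memory copy of the nine-line dummy file shown above.
DUMMY_FILE = "\n".join(f"|id {i:03d} |data {10 + i}" for i in range(1, 10))


def read_minibatches(lines, batch_size=4):
    """Yield lists of (id, data) pairs, batch_size items at a time."""
    batch = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # tokens look like ['|id', '001', '|data', '11']
        batch.append((float(tokens[1]), float(tokens[3])))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partial batch


for mb in read_minibatches(io.StringIO(DUMMY_FILE), batch_size=4):
    # Walk through each item in the current mini-batch.
    for item_id, value in mb:
        print(item_id, value)
    print("end of mini-batch")
```

With the nine-item dummy file, this sketch produces mini-batches of 4, 4, and 1 items; the last one is partial because 9 isn’t a multiple of 4.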

The good news is that I can now iterate through a CNTK file using CNTK stream functions. The bad news (for now at least) is that data in a mini-batch isn’t particularly useful if it isn’t going to be sent to a training function. In my demo, I converted each item to an array using the asarray() function. But I could have read the data directly, without using CNTK at all, with the NumPy loadtxt() function.
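For comparison, here is roughly what the NumPy route looks like. Because loadtxt() splits on whitespace by default, each line becomes four tokens (|id, 001, |data, 11), and the usecols parameter keeps only the two numeric columns. The file contents are rebuilt in memory with io.StringIO so the sketch is self-contained; with a real file you would pass its path instead.

```python
import io

import numpy as np

# In-memory copy of the nine-line dummy file shown above.
text = "\n".join(f"|id {i:03d} |data {10 + i}" for i in range(1, 10))

# Whitespace-delimited tokens: ['|id', '001', '|data', '11'].
# usecols=(1, 3) keeps just the id value and the data value.
arr = np.loadtxt(io.StringIO(text), usecols=(1, 3))

print(arr.shape)  # (9, 2)
ids, data = arr[:, 0], arr[:, 1]
```

Every row comes back at once as a 9 x 2 matrix of floats, with no mini-batch machinery involved.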

Hmmmm. I’m not entirely convinced that I fully understand the underlying mechanism here, so I’ll keep probing. I still think there might be some clever, out-of-the-box ways to use the CNTK library.

This entry was posted in CNTK, Machine Learning.

2 Responses to Iterating Through a CNTK-Format Data File

  1. You can do it in C# with GetDenseData

    • Good point. I’m not very familiar with the CNTK C# API. I prefer C# to Python for most systems-level programming, but among my research colleagues, Python is much more common, so I feel obliged to use Python.
