CNTK is Microsoft’s open source library for deep neural networks. A key component in CNTK code is the mini-batch object. A mini-batch holds training data (input values and known correct output values), and a series of mini-batches is sent to a CNTK training function.
I decided to see if I could iterate through a data file using CNTK functions. I didn’t have a concrete idea of why this might be useful, but I do suspect that CNTK could be used for numeric processing in general, in addition to creating deep neural networks.
Anyway, after some experimentation, I succeeded. I created a small dummy text file in CNTK format:
|id 001 |data 11
|id 002 |data 12
|id 003 |data 13
|id 004 |data 14
|id 005 |data 15
|id 006 |data 16
|id 007 |data 17
|id 008 |data 18
|id 009 |data 19
Then I wrote a demo program that uses CNTK stream functions to read four items at a time into a mini-batch, and then walks through each of the four items in the mini-batch.
The good news is that I can now iterate through a CNTK file using CNTK stream functions. The bad news (for now at least) is that data in a mini-batch isn’t particularly useful if it isn’t going to be sent to a training function. In my demo, I converted each item to an array using the asarray() function. But I could have just read the data directly without using CNTK at all, with the numpy loadtxt() function.
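For comparison, here is a minimal sketch of the loadtxt() alternative. It is not the demo program; it assumes the dummy file has one item per line, and the names write_dummy_file, iter_batches, and dummy_cntk.txt are hypothetical, made up for illustration. The |id and |data tags are skipped by asking loadtxt() for only the second and fourth whitespace-separated columns, and the batch size of four mirrors the demo.

```python
import os
import tempfile

import numpy as np


def write_dummy_file(path):
    # Recreate the nine-item dummy file, one "|id NNN |data NN" item per line.
    with open(path, "w") as f:
        for i in range(1, 10):
            f.write("|id %03d |data %d\n" % (i, 10 + i))


def iter_batches(path, batch_size=4):
    # Columns 0 and 2 hold the "|id" and "|data" tags, so read only 1 and 3.
    rows = np.loadtxt(path, usecols=(1, 3), dtype=np.float32, ndmin=2)
    # Yield consecutive slices of up to batch_size rows (the last may be short).
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


path = os.path.join(tempfile.mkdtemp(), "dummy_cntk.txt")
write_dummy_file(path)

batches = list(iter_batches(path, batch_size=4))
for b, batch in enumerate(batches):
    print("batch", b)
    for id_val, data_val in batch:
        print("  id = %d  data = %d" % (int(id_val), int(data_val)))
```

With nine items and a batch size of four, the last batch holds just one item. Whether a CNTK mini-batch source handles that final partial batch the same way is something I have not verified here.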
Hmmmm. I’m not convinced that I fully understand the underlying mechanism here, so I’ll keep probing. I still think there might be some clever, out-of-the-box ways to use the CNTK library.