I’ve been using Python quite a bit recently, mostly because I’ve been looking at the TensorFlow and CNTK machine learning libraries which both have a Python interface.
Some tools, such as Weka, require a specific data file format (ARFF). But TensorFlow and CNTK operate at a lower level. Reading raw data into a suitable data structure isn't exciting, but it's a key part of using either library.
It's possible to use the built-in "reader" functions, but sooner or later I know I'll need to create a custom reader. So I figured I'd refresh my Python knowledge by reading a text file that simulates the Iris data set into two Python numeric lists.
I created a dummy text file:
0.1,0.2,0.3,0.4,1,0,0
0.5,0.6,0.7,0.8,0,1,0
0.9,1.0,1.1,1.2,0,0,1
The first four items in each line are the "features" (predictor variables) and the last three items are the "labels". Then, after a surprisingly long time (my Python was quite rusty), I wrote a demo script that reads the file into a list of the features and a list of the labels.
# foo.py
# parse a text file of numeric values into two lists

ftrs = []  # list of feature lists
lbls = []  # list of label lists

f = open('C:\\Data\\CNTK_Scripts\\iris.txt', 'r')
for line in f:
  ff = []
  ll = []
  line = line.rstrip('\n')
  xx = line.split(',')
  for i in range(0,4):      # first four values are features
    ff.append(float(xx[i]))
  ftrs.append(ff)
  for i in range(4,7):      # last three values are labels
    ll.append(float(xx[i]))
  lbls.append(ll)
f.close()

print("\nBegin demo \n")
print(ftrs)
print("")
print(lbls)
print("\nEnd script \n")
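For comparison, the same parsing idea can be written more compactly using a list comprehension and list slicing. This is just a sketch of the technique; it parses in-memory strings rather than opening a file, so the snippet runs on its own without the data file:

```python
# compact_parse.py
# same idea: split each comma-delimited line, then slice into features and labels
# (in-memory lines stand in for the file so the snippet is self-contained)
lines = ['0.1,0.2,0.3,0.4,1,0,0',
         '0.5,0.6,0.7,0.8,0,1,0',
         '0.9,1.0,1.1,1.2,0,0,1']

ftrs = []
lbls = []
for line in lines:
    xx = [float(tok) for tok in line.split(',')]  # all seven values as floats
    ftrs.append(xx[0:4])  # first four values are the features
    lbls.append(xx[4:7])  # last three values are the labels

print(ftrs[0])  # [0.1, 0.2, 0.3, 0.4]
print(lbls[2])  # [0.0, 0.0, 1.0]
```

When reading from an actual file, replacing the loop over `lines` with a loop over an open file object works the same way, since iterating a file yields one line at a time.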
I don’t think there’s a bottom line to this blog post, except maybe that using Python, like all programming languages, requires practice.