Parsing a Text File of Numeric Values using Python

I’ve been using Python quite a bit recently, mostly because I’ve been looking at the TensorFlow and CNTK machine learning libraries, which both have a Python interface.

Some tools, such as Weka, require a specific data file format (ARFF). TensorFlow and CNTK operate at a lower level. Reading raw data into a suitable data structure is not exciting, but it’s a key part of using TensorFlow or CNTK.

It’s possible to use the built-in “reader” functions, but sooner or later I know I’ll need to create a custom reader. So I figured I’d refresh my Python knowledge by reading a text file that simulates the Iris data set into two Python lists of numeric values.

I created a dummy text file:

0.1,0.2,0.3,0.4,1,0,0
0.5,0.6,0.7,0.8,0,1,0
0.9,1.0,1.1,1.2,0,0,1
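
Each line is plain comma-separated text, so parsing it comes down to splitting on commas, converting each piece to a float, and slicing. Here is a minimal sketch for the first line of the file; the split point at index 4 assumes the four-features-then-three-labels layout, and the variable names are just illustrative:

```python
# Parse one line of the dummy file into features and labels.
# Assumption: 4 feature values come first, followed by 3 label values.
line = "0.1,0.2,0.3,0.4,1,0,0\n"

values = [float(tok) for tok in line.strip().split(',')]
features = values[:4]   # first four numbers
labels = values[4:]     # remaining three numbers (one-hot class)

print(features)  # [0.1, 0.2, 0.3, 0.4]
print(labels)    # [1.0, 0.0, 0.0]
```

Note that float() turns the integer-looking label values such as "1" into 1.0, so both lists hold floats.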

The first four items in each line are the “features” (predictor variables) and the last three items are the “labels”. Then, after a somewhat surprisingly long time (my Python was quite rusty), I wrote a demo script that reads the file into a list of the features and a list of the labels.

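# foo.py
# parse a text file of numeric values into two lists:
# features (first four values) and labels (last three values)

ftrs = []  # one list of 4 floats per line
lbls = []  # one list of 3 floats per line

# 'with' closes the file automatically, even if an error occurs
with open('C:\\Data\\CNTK_Scripts\\iris.txt', 'r') as f:
  for line in f:
    line = line.strip()   # remove newline and surrounding whitespace
    if line == "":        # skip blank lines, e.g. a trailing empty line
      continue
    xx = line.split(',')

    ff = []
    for i in range(0,4):  # columns 0-3 are features
      ff.append(float(xx[i]))
    ftrs.append(ff)

    ll = []
    for i in range(4,7):  # columns 4-6 are labels
      ll.append(float(xx[i]))
    lbls.append(ll)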

print("\nBegin demo \n")
print(ftrs)
print("")
print(lbls)

print("\nEnd script \n")
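
For comparison, here is a more compact sketch of the same idea using list comprehensions and slicing, wrapped in a function. The function name read_features_labels and its n_features parameter are my own hypothetical choices, and the usage code writes the dummy data to a temporary file rather than assuming the C:\Data path exists:

```python
import os
import tempfile

def read_features_labels(path, n_features=4):
  """Hypothetical helper: read a comma-delimited numeric file into
  two lists of lists. The first n_features values on each line are
  treated as features, the rest as labels. Blank lines are skipped."""
  ftrs = []
  lbls = []
  with open(path, 'r') as f:
    for line in f:
      line = line.strip()
      if not line:
        continue
      vals = [float(tok) for tok in line.split(',')]
      ftrs.append(vals[:n_features])  # leading feature values
      lbls.append(vals[n_features:])  # trailing label values
  return ftrs, lbls

# Usage: write the dummy data to a temporary file, then read it back.
data = "0.1,0.2,0.3,0.4,1,0,0\n0.5,0.6,0.7,0.8,0,1,0\n0.9,1.0,1.1,1.2,0,0,1\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
  tmp.write(data)
  tmp_path = tmp.name

ftrs, lbls = read_features_labels(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(ftrs)
print(lbls)
```

Splitting at n_features rather than hard-coding range(0,4) and range(4,7) means the same function should work for a file with a different number of label columns, as long as the features come first on each line.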

I don’t think there’s a bottom line to this blog post, except maybe that using Python, like using any programming language, requires practice.

This entry was posted in Machine Learning.