Loading a Text File into a Python Matrix

My Python programming had gotten a bit rusty lately so I’ve been brushing up by doing short demo programs. In machine learning, a common task is to load a text file containing numbers into a matrix. So I wrote a demo that does just that, using some data from the well-known Iris Dataset.

I had a text file named testData.txt that looks like this:

. . .

There are 30 rows. The key function is defined as:

import numpy as np

def loadFile(df):
  # load a comma-delimited text file into an np matrix
  resultList = []
  with open(df, 'r') as f:  # the with-block closes the file when done
    for line in f:
      line = line.rstrip('\n')  # "1.0,2.0,3.0"
      sVals = line.split(',')   # ["1.0", "2.0", "3.0"]
      fVals = list(map(np.float32, sVals))  # [1.0, 2.0, 3.0]
      resultList.append(fVals)  # [[1.0, 2.0, 3.0] , [4.0, 5.0, 6.0]]
  return np.asarray(resultList, dtype=np.float32)  # already float32
# end loadFile

Parameter df (“data file”) is the path to the file. I walk through each line and (a) strip away the trailing newline using rstrip(), (b) separate the comma-delimited values into a list of strings using split(), (c) convert the list of strings to a list of float32 values using map(), and (d) append the current row-list to the overall result list. Because each row-list already holds my desired type, float32, I didn’t have to specify the dtype in asarray(), but I did so for clarity.
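To make the four steps concrete, here is a small sketch that traces a single line through them. The line content is a made-up row of seven values (not taken from testData.txt), and the variable names are just for illustration.

```python
import numpy as np

# Hypothetical input line, invented for this sketch
line = "5.1,3.5,1.4,0.2,0,0,1\n"

stripped = line.rstrip('\n')             # (a) drop the trailing newline
s_vals = stripped.split(',')             # (b) list of 7 strings
f_vals = list(map(np.float32, s_vals))   # (c) list of 7 np.float32 values
rows = [f_vals, f_vals]                  # (d) pretend two rows were appended

# No dtype argument here: NumPy infers float32 from the np.float32 scalars
m = np.asarray(rows)
print(m.shape, m.dtype)
```

The last two lines show why the explicit dtype in loadFile() is optional: the list elements are already np.float32, so asarray() keeps that type.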

After all the rows have been processed, the list-of-lists is converted to a 30×7 NumPy matrix using the asarray() function, and that matrix is returned.

There are more efficient ways to load a text file into a matrix, but this technique is fine for simple scenarios. In particular, you can use the loadtxt() function in the NumPy library.
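For comparison, here is a minimal sketch of the loadtxt() approach. To keep the example self-contained, I read from an in-memory io.StringIO object holding two made-up rows instead of the real testData.txt; with an actual file you would pass the path as the first argument.

```python
import io
import numpy as np

# Two invented comma-delimited rows standing in for the data file
fake_file = io.StringIO("5.1,3.5,1.4,0.2,0,0,1\n4.9,3.0,1.4,0.2,0,0,1\n")

# loadtxt() handles the line splitting and the type conversion in one call
m = np.loadtxt(fake_file, delimiter=',', dtype=np.float32)
print(m.shape, m.dtype)
```

One thing to watch for: if the input has only a single row, loadtxt() returns a 1-D array unless you also pass ndmin=2.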
