Loading a Text File into a Python Matrix

My Python programming has gotten a bit rusty lately, so I’ve been brushing up by writing short demo programs. In machine learning, a common task is to load a text file containing numbers into a matrix. So I wrote a demo that does just that, using some data from the well-known Iris Dataset.

I have a text file named testData.txt that looks like this:

5.0,3.5,1.3,0.3,1,0,0
4.5,2.3,1.3,0.3,1,0,0
. . .
5.9,3.0,5.1,1.8,0,0,1

There are 30 rows of data. The key function is:

import numpy as np

def loadFile(df):
  # load a comma-delimited text file into an np matrix
  resultList = []
  f = open(df, 'r')
  for line in f:
    line = line.rstrip('\n')  # "1.0,2.0,3.0"
    sVals = line.split(',')   # ["1.0", "2.0", "3.0"]
    fVals = list(map(np.float32, sVals))  # [1.0, 2.0, 3.0]
    resultList.append(fVals)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
  f.close()
  return np.asarray(resultList, dtype=np.float32)  # already float32
# end loadFile

Parameter df (“data file”) is the path to the file. I walk through each line and a.) strip away the trailing newline using rstrip(), b.) separate the comma-delimited values into a list of strings using split(), c.) convert the list of strings to a list of float32 values using map(), and d.) append the current row-list to the overall result list. Because each row-list is already my desired type, float32, I didn’t have to specify the dtype in asarray(), but I did so for clarity.
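To make those steps concrete, here’s what the intermediate values look like for the first data row, run as a standalone sketch outside the function:

import numpy as np

line = '5.0,3.5,1.3,0.3,1,0,0\n'         # raw line as read from the file
line = line.rstrip('\n')                 # '5.0,3.5,1.3,0.3,1,0,0'
sVals = line.split(',')                  # ['5.0', '3.5', '1.3', '0.3', '1', '0', '0']
fVals = list(map(np.float32, sVals))     # [5.0, 3.5, 1.3, 0.3, 1.0, 0.0, 0.0]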

After all the rows have been processed, the asarray() function converts the list-of-lists into a 30×7 NumPy matrix, which is returned.
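A minimal call looks like this, assuming testData.txt is in the current directory:

m = loadFile('testData.txt')
print(m.shape)    # (30, 7)
print(m[0])       # first row of the matrix as float32 values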

There are more efficient ways to load a text file into a matrix, but this technique is fine for simple scenarios.
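For example, NumPy’s built-in loadtxt() function can do the whole job in one call — a minimal sketch, assuming the same comma-delimited file:

import numpy as np

m = np.loadtxt('testData.txt', delimiter=',', dtype=np.float32)
print(m.shape)    # (30, 7)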
