My Python programming had gotten a bit rusty lately, so I’ve been brushing up by writing short demo programs. In machine learning, a common task is to load a text file containing numbers into a matrix. So I wrote a demo that does just that, using some data from the well-known Iris Dataset.
I had a text file named testData.txt that looked like this:
5.0,3.5,1.3,0.3,1,0,0
4.5,2.3,1.3,0.3,1,0,0
. . .
5.9,3.0,5.1,1.8,0,0,1
There are 30 rows, each with seven comma-delimited values. The key function is defined as:
import numpy as np

def loadFile(df):
  # load a comma-delimited text file into an np matrix
  resultList = []
  with open(df, 'r') as f:  # file is closed automatically
    for line in f:
      line = line.rstrip('\n')  # "1.0,2.0,3.0"
      sVals = line.split(',')  # ["1.0", "2.0", "3.0"]
      fVals = list(map(np.float32, sVals))  # [1.0, 2.0, 3.0]
      resultList.append(fVals)  # [[1.0, 2.0, 3.0] , [4.0, 5.0, 6.0]]
  return np.asarray(resultList, dtype=np.float32)  # already float32
# end loadFile
Parameter df (“data file”) is the path to the file. I walk through each line and a) strip away the trailing newline using rstrip(), b) separate the comma-delimited values into a list of strings using split(), c) convert the list of strings to a list of float32 values using map(), and d) append the current row-list to the overall result list. Because the values in each row-list are already my desired type, float32, I didn’t have to specify the dtype in asarray(), but I did so for clarity.
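To make the four steps concrete, here is a minimal sketch that traces them on a single line of text. The hard-coded sample line stands in for one line read from the file and is only for illustration; the variable names mirror the ones in loadFile().

```python
import numpy as np

# One raw line, as it would come out of the file (sample value for illustration)
line = "5.0,3.5,1.3,0.3,1,0,0\n"

line = line.rstrip('\n')               # a) strip the trailing newline
sVals = line.split(',')                # b) split into a list of strings
fVals = list(map(np.float32, sVals))   # c) convert each string to float32

resultList = []
resultList.append(fVals)               # d) append the row-list to the result

print(sVals)           # the seven values, still strings
print(type(fVals[0]))  # each converted value is a NumPy float32
```

Note that np.float32 accepts a numeric string directly, which is why it can be handed to map() without an intermediate call to float().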
After all the rows have been processed, the list-of-lists is converted to a 30×7 NumPy matrix using the asarray() function, and that matrix is returned.
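The conversion step can be checked in isolation. The sketch below uses just the three rows shown earlier rather than all 30, so the resulting shape is 3×7 instead of 30×7. It also illustrates the point about dtype: when the row-lists already hold float32 values, asarray() ends up with float32 whether or not the dtype is given.

```python
import numpy as np

# Three sample rows, already converted to float32 as loadFile() does
rawRows = [[5.0, 3.5, 1.3, 0.3, 1, 0, 0],
           [4.5, 2.3, 1.3, 0.3, 1, 0, 0],
           [5.9, 3.0, 5.1, 1.8, 0, 0, 1]]
resultList = [list(map(np.float32, row)) for row in rawRows]

explicit = np.asarray(resultList, dtype=np.float32)  # dtype stated for clarity
inferred = np.asarray(resultList)                    # dtype taken from the values

print(explicit.shape)  # (3, 7)
print(explicit.dtype)  # float32
print(inferred.dtype)  # float32
```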
There are more efficient ways to load a text file into a matrix, but this technique is fine for simple scenarios.
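One alternative, offered here as a sketch rather than as part of the original demo, is NumPy's built-in loadtxt() function, which reads a delimited text file in a single call. To keep the example self-contained it writes three sample rows to a temporary file instead of assuming testData.txt exists.

```python
import os
import tempfile
import numpy as np

# Three sample rows written to a temporary file (a stand-in for testData.txt)
sample = ("5.0,3.5,1.3,0.3,1,0,0\n"
          "4.5,2.3,1.3,0.3,1,0,0\n"
          "5.9,3.0,5.1,1.8,0,0,1\n")
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
  tmp.write(sample)
  path = tmp.name

# Read the whole comma-delimited file into a float32 matrix
mat = np.loadtxt(path, delimiter=',', dtype=np.float32)
os.remove(path)  # clean up the temporary file

print(mat.shape)  # (3, 7)
print(mat.dtype)  # float32
```

Writing the loop by hand is still a useful exercise, and it makes it easy to add custom per-line handling later.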