Working with Multi-Dimensional Arrays is Difficult

I often teach introductory machine learning to software engineers who have a lot of programming experience but have little experience with ML. One of the biggest challenges for engineers who are new to ML is something nobody ever talks about — working with multi-dimensional arrays.

The NumPy library has dozens of ndarray (“n-dimensional array”) functions. I was working on a problem where I needed to concatenate/merge two 2D arrays. The np.concatenate() function felt like overkill for this simple case because, in addition to the arrays to join, it accepts several optional parameters (axis, out, dtype, casting), most of which I didn't need.
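For reference, here is a minimal sketch of the kind of call I mean, along with the axis=1 variant. The small arrays are made up for illustration and are not the ones from my actual problem.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2x3
b = np.array([[7.0, 8.0, 9.0]])  # 1x3

# axis=0 stacks rows, so the column counts must match
rows = np.concatenate((a, b), axis=0)
print(rows.shape)  # (3, 3)

# axis=1 joins columns, so the row counts must match
cols = np.concatenate((a, a), axis=1)
print(cols.shape)  # (2, 6)

# mismatched shapes raise a ValueError
try:
  np.concatenate((a, b), axis=1)  # 2 rows vs. 1 row
except ValueError as e:
  print("error:", e)
```

In practice the only optional parameter I usually touch is axis.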

Sometimes I prefer to implement helper functions from scratch, probably because I worked for many years in C/C++/C# environments where implementing from scratch was the standard approach for multi-dimensional array functions. Unfortunately, when working with Python, a from-scratch implementation of an array function is usually much slower than a library implementation, because explicit Python loops run in the interpreter while NumPy functions do their work in compiled code.
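One rough way to see the difference is to time both approaches with the standard timeit module. This is only a sketch: loop_concat_rows() is a hypothetical helper written for this comparison, the array sizes are arbitrary, and the actual timings will vary by machine.

```python
import timeit
import numpy as np

def loop_concat_rows(a, b):
  # hypothetical helper: element-by-element copy using Python loops
  a_rows, a_cols = a.shape
  b_rows, b_cols = b.shape
  result = np.zeros((a_rows + b_rows, a_cols))
  for i in range(a_rows):
    for j in range(a_cols):
      result[i, j] = a[i, j]
  for i in range(b_rows):
    for j in range(b_cols):
      result[i + a_rows, j] = b[i, j]
  return result

rng = np.random.default_rng(0)
a = rng.random((200, 50))  # arbitrary sizes, same column count
b = rng.random((100, 50))

from_loops = loop_concat_rows(a, b)
from_lib = np.concatenate((a, b), axis=0)
print("same result:", np.array_equal(from_loops, from_lib))

# total time for 5 calls of each version
t_loop = timeit.timeit(lambda: loop_concat_rows(a, b), number=5)
t_lib = timeit.timeit(lambda: np.concatenate((a, b), axis=0), number=5)
print("loops:   %0.5f sec" % t_loop)
print("library: %0.5f sec" % t_lib)
```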

The moral of the story is that implementing ML systems has many layers — it’s difficult just to get a system to work, but there are also many subtle issues such as implementing-from-scratch vs. using library functions.

I was in Manhattan, New York City, for several days not too long ago. I spent one full day walking around from the Brooklyn Bridge up to Central Park. I was struck by the way buildings are concatenated together, producing incredible urban density. People who live in such conditions should read about the Rat Utopia experiment by John Calhoun.

Demo code:


import numpy as np

def my_concat_rows(a, b):
  # append the rows of b below the rows of a
  (a_rows, a_cols) = a.shape  # mxn
  (b_rows, b_cols) = b.shape  # rxn
  if a_cols != b_cols:
    raise ValueError("a and b must have the same number of columns")
  result = np.zeros((a_rows + b_rows, a_cols))
  for i in range(a_rows):
    for j in range(a_cols):
      result[i][j] = a[i][j]  # or result[i,j] = a[i,j]
  for i in range(b_rows):  # 0, 1, ..
    for j in range(b_cols):
      result[i+a_rows][j] = b[i][j]
  return result

a = np.array([[1.1, 1.2, 1.3],
              [2.1, 2.2, 2.3],
              [3.1, 3.2, 3.3],
              [4.1, 4.2, 4.3]])  # 4x3

b = np.array([[5.1, 5.2, 5.3],
              [6.1, 6.2, 6.3]])  # 2x3

print("\na = ")
print(a)
print("\nb = ")
print(b)

c = np.concatenate((a,b), axis=0)  # by rows
print("\nc = ")
print(c)

d = my_concat_rows(a, b)  # should match c
print("\nd = ")
print(d)
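
A possible middle ground, sketched here as an assumption rather than something from the demo above, is to keep a from-scratch function but copy whole blocks with slice assignment instead of looping over individual cells. The name slice_concat_rows() is hypothetical.

```python
import numpy as np

def slice_concat_rows(a, b):
  # hypothetical helper: copies a and b into the result block by block
  if a.shape[1] != b.shape[1]:
    raise ValueError("a and b must have the same number of columns")
  a_rows = a.shape[0]
  result = np.zeros((a_rows + b.shape[0], a.shape[1]))
  result[:a_rows, :] = a  # top block
  result[a_rows:, :] = b  # bottom block
  return result

a = np.array([[1.1, 1.2],
              [2.1, 2.2]])  # 2x2
b = np.array([[3.1, 3.2]])  # 1x2

e = slice_concat_rows(a, b)
print(e.shape)  # (3, 2)
print(np.array_equal(e, np.concatenate((a, b), axis=0)))  # True
```

The per-cell Python loops are gone, but the function is still simple enough to read at a glance.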