Why PyTorch Layer Weight Matrix Shape Seems Backward

The weight matrix of a PyTorch Linear layer has shape [num_out, num_in] rather than the seemingly more logical [num_in, num_out]. This seems a bit strange. Furthermore, when computing a set of output nodes, the weight matrix must be transposed before matrix multiplication is applied. This looks very inefficient, especially because output nodes are computed many times (often millions) during training.

Surprisingly, the apparently backward PyTorch weight matrix shape is the better choice, for two reasons: 1.) behind the scenes the matrix transpose operation is “free” (no data is actually moved, so there’s no real transposition involved), and 2.) behind the scenes the backward pass that computes gradients is usually (but not always) faster with a [num_out, num_in] shape than with a [num_in, num_out] shape. See the discussion at discuss.pytorch.org/t/why-does-the-linear-module-seems-to-do-unnecessary-transposing/6277/7.
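
Here is a minimal sketch of the first point, assuming a standalone Linear(4, 7) layer and a made-up batch of five random inputs (neither is from the network shown later). It only illustrates that the transposed weight is a view onto the same memory and that "transpose, then multiply" matches what the layer computes; it does not measure speed, and it says nothing about the backward pass.

```python
import torch as T

T.manual_seed(0)
lin = T.nn.Linear(4, 7)      # weight shape is [7, 4], i.e. [num_out, num_in]
x = T.randn(5, 4)            # illustrative batch: 5 items, 4 inputs each

w = lin.weight
wt = w.t()                   # a view with swapped strides, not a copy

# Both tensors start at the same memory address, so no data was moved.
same_memory = (wt.data_ptr() == w.data_ptr())

manual = x @ wt + lin.bias   # explicit "transpose, then multiply"
builtin = lin(x)             # what the Linear layer computes

print(w.shape, wt.shape)     # [7, 4] and [4, 7]
print(w.stride(), wt.stride())
print(same_memory)
print(T.allclose(manual, builtin, atol=1e-6))
```

The strides are the telling part: the original weight has strides (4, 1) and the transposed view has strides (1, 4), so only the indexing metadata changes.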

Note: The Keras neural network library stores weight matrices in [num_in, num_out] shape.

Here’s a concrete example of a 4-7-3 neural network for the Iris dataset. Iris data has four inputs (sepal length, sepal width, petal length, petal width) and three output classes (“setosa”, “versicolor”, “virginica”).

import torch as T  # the code below uses T as the alias for torch

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # explicit form; super().__init__() also works in Python 3
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)
    
    lo = -0.10; hi = +0.10
    T.nn.init.uniform_(self.hid1.weight, lo, hi)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.uniform_(self.oupt.weight, lo, hi)
    T.nn.init.zeros_(self.oupt.bias)
    
  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.log_softmax(self.oupt(z), dim=1)  # log-probabilities, for use with NLLLoss()
    return z

The hid1 layer weight matrix has shape [7,4] and the oupt layer weight matrix has shape [3,7].
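
The shapes are easy to confirm. The sketch below assumes two standalone Linear layers with the same sizes as hid1 and oupt, plus a made-up batch of two input items, rather than the full Net class.

```python
import torch as T

hid1 = T.nn.Linear(4, 7)   # 4 inputs, 7 hidden nodes
oupt = T.nn.Linear(7, 3)   # 7 hidden nodes, 3 outputs

print(hid1.weight.shape)   # torch.Size([7, 4])
print(oupt.weight.shape)   # torch.Size([3, 7])
print(hid1.bias.shape)     # torch.Size([7])

# Data still flows 4 -> 7 -> 3 even though the weights are stored
# as [num_out, num_in].
x = T.randn(2, 4)          # illustrative batch of 2 items
out = oupt(T.tanh(hid1(x)))
print(out.shape)           # torch.Size([2, 3])
```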

One scenario where the shape of weight matrices is relevant is when writing custom weight initialization code.
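
As a rough illustration, here is a sketch of a custom initializer whose range depends on the number of inputs to the layer. The helper name scaled_uniform_ and the 1/sqrt(num_in) limit are assumptions made up for this example, not anything from the PyTorch library. The point is only that num_in is the second dimension of the weight matrix, not the first.

```python
import math
import torch as T

def scaled_uniform_(weight):
  # Hypothetical helper: fill weight in place with uniform values
  # in [-1/sqrt(num_in), +1/sqrt(num_in)).
  num_out, num_in = weight.shape   # PyTorch order is [num_out, num_in]
  limit = 1.0 / math.sqrt(num_in)
  with T.no_grad():                # init should not be tracked by autograd
    weight.uniform_(-limit, limit)
  return weight

T.manual_seed(1)
layer = T.nn.Linear(4, 7)          # weight shape [7, 4], so num_in = 4
scaled_uniform_(layer.weight)
T.nn.init.zeros_(layer.bias)

print(layer.weight.shape)
print(layer.weight.abs().max().item() <= 0.5)  # limit is 1/sqrt(4) = 0.5
```

Reading the dimensions in the wrong order would give a limit of 1/sqrt(7) instead of 1/sqrt(4), a bug that runs without error and is easy to miss.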



Many of the movie posters for early James Bond films featured the backs of women. I have no idea why this was done or what it means. Left: “Dr. No” (1962), the first movie in the series. Center: “Thunderball” (1965), the fourth movie in the series. Right: “You Only Live Twice” (1967), the fifth movie in the series.

