Peeking into PyTorch Source Code

One of the things I like about PyTorch is that it’s possible to go into the source code and easily modify it. A few days ago I ran into an issue where I didn’t understand what was happening in a PyTorch program. I figured out the issue by going into the init.py file in the PyTorch nn module and inserting a few print() statements.

I had written a PyTorch autoencoder that used a straight-through 9-4-2-4-9 architecture. No problem. Then I refactored the autoencoder to use an explicit 9-4-2 encoder plus a 2-4-9 decoder. No problem, except that the two different approaches gave slightly different answers, which was puzzling to me.

I determined that the two approaches were initializing the network weights differently, even though I was using explicit xavier_uniform_() initialization in both. But then I got stuck.

I created two slimmed-down versions: a 5-2-5 straight-through autoencoder, and a 5-2 encoder plus 2-5 decoder. Then I located the PyTorch code on my machine, which was at C:\(user)\Anaconda3\Lib\site-packages\torch. I went into the nn directory, opened the init.py file using Notepad, and inserted print("inside uniform_") and print("inside xavier_uniform_") statements in the corresponding functions.
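Editing the installed source file works, but you can get the same trace without touching the file by monkey-patching the two functions at runtime. A minimal sketch, assuming PyTorch is installed (exactly which internal calls this captures depends on your PyTorch version):

```python
import torch.nn.init as init

# Keep references to the original functions, then replace them with
# wrappers that announce each call before delegating to the original.
_orig_uniform_ = init.uniform_
_orig_xavier_uniform_ = init.xavier_uniform_

def uniform_(*args, **kwargs):
    print("inside uniform_")
    return _orig_uniform_(*args, **kwargs)

def xavier_uniform_(*args, **kwargs):
    print("inside xavier_uniform_")
    return _orig_xavier_uniform_(*args, **kwargs)

init.uniform_ = uniform_
init.xavier_uniform_ = xavier_uniform_
```

Code that calls through the init module after this point (including nn.Linear’s own default initialization) will print the trace; code that imported the functions directly beforehand will not be affected.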

Briefly, when you create a Linear layer, PyTorch automatically initializes the layer’s weights using a default mechanism that calls into the PyTorch uniform_() function. Then, if you explicitly set the layer’s weights using xavier_uniform_(), that Xavier initialization calls into uniform_() again.
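You can see the default initialization being overwritten with a quick check. A sketch, assuming PyTorch is installed:

```python
import torch

torch.manual_seed(1)
layer = torch.nn.Linear(5, 2)   # default initialization runs here,
                                # drawing from the global generator
w_before = layer.weight.clone()

# Explicit Xavier init draws from the generator again,
# replacing the default weights.
torch.nn.init.xavier_uniform_(layer.weight)
print(torch.equal(w_before, layer.weight))  # False -- weights were re-drawn
```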

The straight-through architecture looks like:

create Linear1
create Linear2
xavier on Linear1
xavier on Linear2

which generates calls to:

uniform_()
uniform_()
xavier_()
uniform_()
xavier_()
uniform_()

The explicit encoder-decoder architecture looks like:

create Linear1
xavier on Linear1

create Linear2
xavier on Linear2

which generates calls to:

uniform_()
xavier_()
uniform_()
uniform_()
xavier_()
uniform_()

So the two architectures call uniform_() in a slightly different order. Because every call to uniform_() consumes values from the same global pseudo-random number generator, the order determines which random values each layer receives, which leads to different weights.
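The effect can be reproduced without PyTorch at all. A minimal sketch using Python’s random module, where the tags and the helper function are hypothetical and just mark positions in the shared random stream:

```python
import random

def draw_for_w1(order):
    # Consume one value from a freshly seeded stream for each tag,
    # in order, and return the value that the draw tagged "w1" received.
    random.seed(123)
    values = {tag: random.random() for tag in order}
    return values["w1"]

# Straight-through order: both layers created first, then both re-initialized.
a = draw_for_w1(["create1", "create2", "w1", "w2"])
# Encoder-decoder order: each layer created and re-initialized in turn.
b = draw_for_w1(["create1", "w1", "create2", "w2"])
print(a == b)  # False -- "w1" lands at a different position in the stream
```

The same four draws happen in both cases; only their order differs, so the value assigned to "w1" differs.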

The moral of the story is that when needed, you can go directly into PyTorch code and modify it in order to understand it, or to change its behavior.



Left: X-Ray Specs. They were first patented in 1906 by George W. Macdonald. An improved version was patented in 1969 by Harold von Braunhut, who also invented Amazing Sea-Monkeys.

Center: I suspect these steampunk style glasses are mostly artistic rather than functional.

Right: Sherlock Holmes is often associated with a magnifying glass. The novel, “A Study in Scarlet” (1887) was the first appearance of Sherlock Holmes and Dr. Watson and the first story to incorporate the magnifying glass as an investigative tool. “As he spoke, he whipped a tape measure and a large round magnifying glass from his pocket. With these two implements he trotted noiselessly about the room. He examined with his glass the word upon the wall, going over every letter of it with the most minute exactness.”
