A PyTorch MaxPool2D Worked Example

One of my job responsibilities is to teach engineers and data scientists how to use the PyTorch neural network code library. There are many examples of how max pooling in a CNN works, but they tend to be either too generic (not specific to PyTorch) or too specific (a very low-level explanation of the library functions).

Here’s an example that I use. The demo sets up an input that represents a simple 4×4 grayscale (1 channel) image with dummy pixel values 0 through 15, then applies a MaxPool2d layer with a 2×2 kernel and stride = 1 to the 4×4 input, followed by a second pool with a 2×2 kernel and stride = 2.

The diagram shows how applying the max pooling layer with stride = 1 results in a 3×3 array of numbers: the output size is (input size − kernel size) / stride + 1 = (4 − 2) / 1 + 1 = 3 in each dimension. Using max pooling has three benefits. First, it acts as a mild form of regularization that helps prevent model over-fitting. Second, it improves training speed by shrinking the spatial size of the data, which reduces the number of parameters that downstream layers must learn. Third, it provides a small amount of translation invariance, because a feature that shifts by a pixel or so usually still maps to the same pooled maximum.
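To make the computation concrete, here is a minimal NumPy sketch (the naive_maxpool2d helper is mine, for illustration only, and is not part of the demo) that slides a 2×2 window over the same 4×4 input and takes the max of each window:

# naive_maxpool2d.py
# plain-NumPy illustration of what MaxPool2d computes

import numpy as np

def naive_maxpool2d(img, k, stride):
  # img: 2-D array, k: kernel size
  out_h = (img.shape[0] - k) // stride + 1
  out_w = (img.shape[1] - k) // stride + 1
  out = np.zeros((out_h, out_w), dtype=img.dtype)
  for i in range(out_h):
    for j in range(out_w):
      r, c = i * stride, j * stride
      out[i, j] = img[r:r+k, c:c+k].max()  # max over one k x k window
  return out

img = np.arange(16, dtype=np.float32).reshape(4, 4)
print(naive_maxpool2d(img, 2, 1))  # [[ 5  6  7] [ 9 10 11] [13 14 15]]
print(naive_maxpool2d(img, 2, 2))  # [[ 5  7] [13 15]]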

The demo leaves out a ton of optional details (padding, dilation, ceil_mode, and so on) but the point of my demo is to explain how PyTorch max pooling works, not to catalog every option.



Other kinds of pooling. Left: A pickup truck in a pool. Right: A pool in a pickup truck.


Demo code:

# maxpool_demo.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')

print("\nBegin PyTorch max pooling demo ")

x = np.arange(16, dtype=np.float32)
x = x.reshape(1, 1, 4, 4)  # bs, channels, height, width
X = T.tensor(x, dtype=T.float32).to(device)
print("\nSource input: ")
print(X)

pool1 = T.nn.MaxPool2d(2, stride=1)  # 2x2 kernel
z1 = pool1(X)  # shape [1, 1, 3, 3]
print("\nMaxPool with kernel=2, stride=1: ")
print(z1)

pool2 = T.nn.MaxPool2d(2, stride=2)  # 2x2 kernel
z2 = pool2(X)  # shape [1, 1, 2, 2]
print("\nMaxPool with kernel=2, stride=2: ")
print(z2)

print("\nEnd max pooling demo ")
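For this input, the stride-1 pool produces the 3×3 values [[5, 6, 7], [9, 10, 11], [13, 14, 15]] and the stride-2 pool produces the 2×2 values [[5, 7], [13, 15]]; with the batch and channel dimensions included, the output shapes are [1, 1, 3, 3] and [1, 1, 2, 2].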

1 Response to A PyTorch MaxPool2D Worked Example

  1. Thorsten Kleppe says:

    Nice explanation of how max pooling works.

    I am currently trying to build a more advanced version of my CNN implementations. The biggest problem I see is the extreme computational load, which increases with each additional pooling step. The idea is that using a stride of 2 in the convolution step can replace the separate pooling step more cheaply.

    Would you give that idea a chance?
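For readers curious about the idea in this comment, a minimal sketch of a stride-2 convolution used as a downsampling alternative to a separate pooling step might look like the following. The channel counts and kernel size here are illustrative assumptions, not anything from the post.

import torch as T

X = T.randn(1, 1, 4, 4)  # dummy bs, channels, height, width

# conventional: stride-1 convolution followed by a separate pooling step
conv = T.nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1)
pool = T.nn.MaxPool2d(2, stride=2)
z_pooled = pool(conv(X))    # shape [1, 8, 2, 2]

# alternative: fold the downsampling into the convolution stride
conv_s2 = T.nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1)
z_strided = conv_s2(X)      # shape [1, 8, 2, 2]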
