PyTorch torch.nn Reference Manual
torch.nn.MaxPool2d is a module in PyTorch used for 2D max pooling.
The pooling layer can reduce the spatial dimensions of feature maps, reducing computational load while providing a certain degree of translation invariance.
Function Definition
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Parameter Description:
- kernel_size (int or tuple): The size of the pooling window.
- stride (int or tuple): The step size of the pooling window movement. Defaults to kernel_size.
- padding (int or tuple): The amount of padding added to the input edges. Defaults to 0.
- dilation (int or tuple): The spacing between elements in the window. Defaults to 1.
- return_indices (bool): Whether to return the indices of the maximum values. Used for MaxUnpool2d. Defaults to False.
- ceil_mode (bool): Whether to use ceil instead of floor when calculating output dimensions. Defaults to False.
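Together, these parameters determine the output size along each spatial dimension. The helper below is a minimal sketch of that calculation. `pooled_size` is a hypothetical name introduced here for illustration, not part of PyTorch, and it handles a single dimension with integer arguments only.

```python
import math

import torch
import torch.nn as nn


def pooled_size(size, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Expected output length of one spatial dimension (hypothetical helper)."""
    stride = kernel_size if stride is None else stride
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = math.ceil(span / stride) + 1
        # A last window that would start beyond the input and its left padding is dropped.
        if (out - 1) * stride >= size + padding:
            out -= 1
        return out
    return span // stride + 1


# Compare the helper against the real layer on a 5x5 input,
# where floor and ceil rounding give different sizes.
x = torch.randn(1, 1, 5, 5)
floor_out = nn.MaxPool2d(kernel_size=2)(x)
ceil_out = nn.MaxPool2d(kernel_size=2, ceil_mode=True)(x)
print("floor:", floor_out.shape[-1], "predicted:", pooled_size(5, 2))
print("ceil: ", ceil_out.shape[-1], "predicted:", pooled_size(5, 2, ceil_mode=True))
```

With ceil_mode=True, the extra output position comes from a window that hangs partly over the edge of the input.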
Usage Examples
Example 1: Basic Usage
Create a max pooling layer:
import torch
import torch.nn as nn
# Create max pooling layer: 2x2 window, stride 2
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
# Create input tensor
input_tensor = torch.randn(1,1,4,4)
print("Input:\n", input_tensor.squeeze().tolist())
# Forward pass
output = max_pool(input_tensor)
print("\nOutput:\n", output.squeeze().tolist())
print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)
Sample output (the input is random, so the exact values differ on every run; values are shown rounded to four decimals):
Input:
 [[-0.4128, 0.2341, -0.9876, 0.4567], [0.1234, 0.8765, -0.2345, 0.6789], [-0.5678, 0.3456, 0.7890, -0.1234], [0.9012, -0.4567, 0.2345, 0.5678]]

Output:
 [[0.8765, 0.6789], [0.9012, 0.7890]]
Input shape: torch.Size([1, 1, 4, 4])
Output shape: torch.Size([1, 1, 2, 2])
You can see that the maximum value in each 2x2 window is retained.
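To check this by hand, the sketch below feeds the sample values from the output above into the layer and compares the result with a manual maximum over each 2x2 block. The reshape trick only works because the windows do not overlap (stride equals kernel_size) and the input size is divisible by 2.

```python
import torch
import torch.nn as nn

# Sample values copied from the output shown above
x = torch.tensor([
    [-0.4128, 0.2341, -0.9876, 0.4567],
    [0.1234, 0.8765, -0.2345, 0.6789],
    [-0.5678, 0.3456, 0.7890, -0.1234],
    [0.9012, -0.4567, 0.2345, 0.5678],
]).reshape(1, 1, 4, 4)

pooled = nn.MaxPool2d(kernel_size=2, stride=2)(x)

# Split H and W into (blocks, block_size) and take the max inside each block.
# Dimensions after reshape: (N, C, out_h, kernel_h, out_w, kernel_w)
blocks = x.reshape(1, 1, 2, 2, 2, 2)
manual = blocks.amax(dim=(3, 5))

print(pooled.squeeze().tolist())
print(torch.equal(pooled, manual))
```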
Example 2: Different kernel_size and stride
Adjust pooling parameters:
import torch
import torch.nn as nn
# 3x3 pooling, stride 1 (overlapping windows)
pool3x3 = nn.MaxPool2d(kernel_size=3, stride=1)
# Non-square pooling
pool_rect = nn.MaxPool2d(kernel_size=(2,3), stride=(2,3))
input_tensor = torch.randn(1,1,6,9)
print("Input shape:", input_tensor.shape)
print("3x3 pooling output:", pool3x3(input_tensor).shape)  # torch.Size([1, 1, 4, 7])
print("Rectangular pooling output:", pool_rect(input_tensor).shape)  # torch.Size([1, 1, 3, 3])
Example 3: Using padding
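Padding the edges lets the pooling window cover border pixels, which preserves edge information to some extent. With kernel_size=3, stride=1, and padding=1, the output keeps the input's spatial size: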
import torch
import torch.nn as nn
# Pooling with padding
pool_padding = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1,1,4,4)
output = pool_padding(input_tensor)
print("Input shape:", input_tensor.shape)
print("Output shape (with padding):", output.shape)
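One detail worth knowing: MaxPool2d pads implicitly with negative infinity rather than zero, so a padded position can never be selected as the maximum. The sketch below illustrates this with an all-negative input and contrasts it with explicit zero padding via torch.nn.functional.pad. The all-negative input is just a convenient test case chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = -torch.ones(1, 1, 4, 4)  # every value is -1

# Built-in padding: padded cells act as -inf, so the maximum stays -1 everywhere
builtin = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)(x)

# Explicit zero padding followed by unpadded pooling: zeros win at the borders
zero_padded = nn.MaxPool2d(kernel_size=3, stride=1)(F.pad(x, (1, 1, 1, 1)))

print("Built-in padding max:", builtin.max().item())
print("Zero padding max:", zero_padded.max().item())
```

This matters mainly when the feature map can contain negative values, for example when pooling is applied before an activation such as ReLU.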
Example 4: Returning indices
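With return_indices=True, the layer also returns the position of each maximum value. nn.MaxUnpool2d can use these indices to place values back at their original positions, for example in a decoder: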
import torch
import torch.nn as nn
# Create pooling layer that returns indices
pool_indices = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
input_tensor = torch.randn(1,1,4,4)
output, indices = pool_indices(input_tensor)
print("Output shape:", output.shape)
print("Indices shape:", indices.shape)
print("Indices values:", indices.squeeze().tolist())
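As a sketch of how the indices are consumed, the example below passes them to nn.MaxUnpool2d. A deterministic torch.arange input is used (an assumption made for readability) so that each index coincides with the value stored at that position.

```python
import torch
import torch.nn as nn

# Values 0..15 laid out row by row, so the value at each position equals its flat index
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

pooled, indices = pool(x)
restored = unpool(pooled, indices)

print("Indices:", indices.squeeze().tolist())
print("Restored shape:", restored.shape)
print("Restored:\n", restored.squeeze())
```

The indices are flat positions within each H x W plane. The restored tensor has the original shape, with the maxima at their original positions and zeros everywhere else.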
Example 5: Using in CNN
A typical position for a pooling layer in a CNN structure:
import torch
import torch.nn as nn
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # Pooling layer: reduces size by half after each pass
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.relu(self.conv1(x))  # 32x32
        x = self.pool(x)              # 16x16
        x = self.relu(self.conv2(x))  # 16x16
        x = self.pool(x)              # 8x8
        return x

model = SimpleCNN()
input_image = torch.randn(1, 3, 32, 32)
output = model(input_image)
print("Input shape:", input_image.shape)
print("Output shape:", output.shape)
Max Pooling vs Average Pooling
| Type | Formula | Features | Applicable Scenarios |
|---|---|---|---|
| MaxPool2d | max(Region) | Retains significant features, more robust to noise | Image classification, object detection (commonly used) |
| AvgPool2d | mean(Region) | Smooths features, retains background information | Global average pooling, feature extraction |
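The difference between the two formulas is easy to see on a small deterministic input. The following sketch applies both layers with the same 2x2 window to the values 0 through 15. The input is chosen purely for illustration.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)
avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(x)

print("Max pooling:", max_out.squeeze().tolist())  # [[5.0, 7.0], [13.0, 15.0]]
print("Avg pooling:", avg_out.squeeze().tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```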
Common Questions
Q1: Can the pooling layer be removed?
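Often, yes. Many modern networks downsample with strided convolutions (stride 2) instead of pooling. ResNet, for example, uses strided convolutions between its stages but still keeps a max pooling layer near the input. Pooling remains common for fast, parameter-free downsampling.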
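As a rough sketch of the trade-off, the example below downsamples the same feature map once with max pooling and once with a stride-2 convolution. The channel count and kernel size are arbitrary choices for illustration, not taken from any particular network.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 16)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
strided_conv = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)

# Both halve the spatial size
pool_out = pool(x)
conv_out = strided_conv(x)
print("Pooling output:", pool_out.shape)
print("Strided conv output:", conv_out.shape)

# Pooling has no learnable parameters; the convolution does
n_pool = sum(p.numel() for p in pool.parameters())
n_conv = sum(p.numel() for p in strided_conv.parameters())
print("Pooling parameters:", n_pool)
print("Strided conv parameters:", n_conv)
```

The strided convolution learns how to downsample, at the cost of extra parameters and computation, while pooling applies a fixed rule.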
Q2: What is the relationship between stride and kernel_size?
When stride = kernel_size, the pooling windows do not overlap; when stride < kernel_size, the windows overlap; when stride > kernel_size, some input positions fall between windows and are skipped.
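A short sketch of the first two cases, using the values 0 through 15 as an illustrative input so the windows are easy to follow:

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# stride == kernel_size: four disjoint 2x2 windows
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)(x)

# stride < kernel_size: neighbouring windows share a row or column
overlapping = nn.MaxPool2d(kernel_size=2, stride=1)(x)

print(non_overlapping.squeeze().tolist())  # [[5.0, 7.0], [13.0, 15.0]]
print(overlapping.squeeze().tolist())      # [[5.0, 6.0, 7.0], [9.0, 10.0, 11.0], [13.0, 14.0, 15.0]]
```

Overlapping windows produce a larger output (3x3 instead of 2x2 here) because the window advances by one position at a time.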
Usage Scenarios
nn.MaxPool2d is primarily used in the following scenarios:
- Image classification networks: Gradually reduce resolution to extract high-level features
- Object detection: Retain the positions of significant features
- Reducing computational load: Reduce the size of feature maps
- Increasing receptive field: Allow subsequent layers to see a larger range of features