PyTorch torch.nn.MaxPool2d

```html PyTorch torch.nn.MaxPool2d Function | Online Tutorial

torch.nn.MaxPool2d is a module in PyTorch used for 2D max pooling.

Pooling layers shrink the spatial dimensions (height and width) of feature maps, which lowers the computational cost of later layers and gives the network a degree of translation invariance.

Function Definition

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Parameter Description:

kernel_size (int or tuple): The size of the pooling window.
stride (int or tuple): The step size of the pooling window movement. Defaults to kernel_size.
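padding (int or tuple): Implicit negative-infinity padding added to both sides of each spatial dimension, so a padded position is never selected as the maximum. Must be at most half of the kernel size. Defaults to 0.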
dilation (int or tuple): The spacing between elements in the window. Defaults to 1.
return_indices (bool): Whether to return the indices of the maximum values. Used for MaxUnpool2d. Defaults to False.
ceil_mode (bool): Whether to use ceil instead of floor when calculating output dimensions. Defaults to False.

Usage Examples

Example 1: Basic Usage

Create a max pooling layer:

import torch
import torch.nn as nn

# Create max pooling layer: 2x2 window, stride 2
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Create input tensor
input_tensor = torch.randn(1,1,4,4)
print("Input:\n", input_tensor.squeeze().tolist())

# Forward pass
output = max_pool(input_tensor)
print("\nOutput:\n", output.squeeze().tolist())

print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)

Because the input is random, the values change on every run. A sample result (values rounded for readability) is:

Input:[[-0.4128, 0.2341, -0.9876, 0.4567], [ 0.1234, 0.8765, -0.2345, 0.6789], [-0.5678, 0.3456, 0.7890, -0.1234], [ 0.9012, -0.4567, 0.2345, 0.5678]]
Output:[[0.8765, 0.6789], [0.9012, 0.7890]]
Input shape: torch.Size([1, 1, 4, 4])
Output shape: torch.Size([1, 1, 2, 2])

You can see that the maximum value in each 2x2 window is retained.

Example 2: Different kernel_size and stride

Adjust pooling parameters:

import torch
import torch.nn as nn

# 3x3 pooling, stride 1 (windows overlap because stride < kernel_size)
pool3x3 = nn.MaxPool2d(kernel_size=3, stride=1)

# Non-square pooling
pool_rect = nn.MaxPool2d(kernel_size=(2,3), stride=(2,3))

input_tensor = torch.randn(1,1,6,9)
print("Input shape:", input_tensor.shape)
print("3x3 pooling output:", pool3x3(input_tensor).shape)  # torch.Size([1, 1, 4, 7])
print("Rectangular pooling output:", pool_rect(input_tensor).shape)  # torch.Size([1, 1, 3, 3])

Example 3: Using padding

Padding lets pooling windows be centered on border pixels. With kernel_size=3, stride=1, and padding=1, the output keeps the same spatial size as the input:

import torch
import torch.nn as nn

# Pooling with padding
pool_padding = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

input_tensor = torch.randn(1,1,4,4)
output = pool_padding(input_tensor)

print("Input shape:", input_tensor.shape)
print("Output shape (with padding):", output.shape)  # torch.Size([1, 1, 4, 4])

Example 4: Returning indices

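With return_indices=True, the layer also returns the flattened index (within each input plane) of every maximum. nn.MaxUnpool2d can use these indices to place values back at their original positions, for example in a decoder: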

import torch
import torch.nn as nn

# Create pooling layer that returns indices
pool_indices = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

input_tensor = torch.randn(1,1,4,4)
output, indices = pool_indices(input_tensor)

print("Output shape:", output.shape)
print("Indices shape:", indices.shape)
print("Indices values:", indices.squeeze().tolist())

Example 5: Using in CNN

A typical position for a pooling layer in a CNN structure:

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # Pooling layer: reduces size by half after each pass
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.relu(self.conv1(x))  # 32x32
        x = self.pool(x)              # 16x16
        x = self.relu(self.conv2(x))  # 16x16
        x = self.pool(x)              # 8x8
        return x

model = SimpleCNN()
input_image = torch.randn(1, 3, 32, 32)
output = model(input_image)
print("Input shape:", input_image.shape)
print("Output shape:", output.shape)  # torch.Size([1, 64, 8, 8])

Max Pooling vs Average Pooling

| Type | Formula | Features | Applicable Scenarios |
| --- | --- | --- | --- |
| `MaxPool2d` | max(Region) | Retains the strongest activation in each window; tolerant of small translations | Image classification, object detection (commonly used) |
| `AvgPool2d` | mean(Region) | Smooths features, retains background information | Global average pooling, feature extraction |

Common Questions

Q1: Can the pooling layer be removed?

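Often, yes. Some modern architectures downsample with strided convolutions (stride 2) instead of pooling; ResNet, for example, does this between stages while still keeping a max pooling layer after its first convolution. Pooling remains a common, parameter-free way to downsample quickly.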

Q2: What is the relationship between stride and kernel_size?

When stride = kernel_size, the pooling windows do not overlap; when stride < kernel_size, the pooling windows have overlap.

Usage Scenarios

nn.MaxPool2d is primarily used in the following scenarios:

Image classification networks: Gradually reduce resolution to extract high-level features
Object detection: Retain the positions of significant features
Reducing computational load: Reduce the size of feature maps
Increasing receptive field: Allow subsequent layers to see a larger range of features

```
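The parameters described above (kernel_size, stride, padding, dilation, ceil_mode) together determine the output size. As a minimal sketch, the formula from the PyTorch documentation can be written for one spatial dimension as follows. The helper name `pool_output_size` is hypothetical and is not part of PyTorch; it assumes integer arguments and handles height or width separately.

```python
def pool_output_size(size, kernel_size, stride=None, padding=0,
                     dilation=1, ceil_mode=False):
    """Output length along one spatial dimension.

    Hypothetical helper for illustration only, not a PyTorch API.
    """
    if stride is None:
        stride = kernel_size  # same default as nn.MaxPool2d

    # Padded length minus the extent of the (dilated) window
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1

    if ceil_mode:
        out = -(-span // stride) + 1  # ceiling division
        # A window that would start in the right padding is dropped
        if (out - 1) * stride >= size + padding:
            out -= 1
    else:
        out = span // stride + 1  # floor division
    return out


print(pool_output_size(4, 2, 2))                  # Example 1: 2
print(pool_output_size(6, 3, 1))                  # Example 2, height: 4
print(pool_output_size(9, 3, 1))                  # Example 2, width: 7
print(pool_output_size(4, 3, 1, padding=1))       # Example 3: 4
print(pool_output_size(5, 2, 2))                  # floor mode: 2
print(pool_output_size(5, 2, 2, ceil_mode=True))  # ceil mode: 3
```

The last two calls show the effect of ceil_mode: with an input of length 5 and a 2-wide window moving by 2, floor mode discards the leftover element, while ceil mode adds a final partial window.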

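Example 4 returns the indices but does not show them being used. The sketch below pairs nn.MaxPool2d with nn.MaxUnpool2d on a small, deterministic input so the result is easy to check by hand. The input values are chosen only for illustration.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# Values 0..15 laid out row by row, so each value equals its flat index
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pooled, indices = pool(x)
restored = unpool(pooled, indices)

print(pooled.squeeze().tolist())    # [[5.0, 7.0], [13.0, 15.0]]
print(indices.squeeze().tolist())   # [[5, 7], [13, 15]]
print(restored.squeeze().tolist())
# [[0.0, 0.0, 0.0, 0.0],
#  [0.0, 5.0, 0.0, 7.0],
#  [0.0, 0.0, 0.0, 0.0],
#  [0.0, 13.0, 0.0, 15.0]]
```

Unpooling restores only the positions of the maxima; every other position is filled with zero, so the operation is not a true inverse of pooling.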
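To make the comparison between max pooling and average pooling concrete, the following sketch applies both layers to the same hand-picked 4x4 input. The numbers are illustrative only; the point is that a single large activation survives max pooling unchanged but is diluted by averaging.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0, 0.0, 0.0],
                  [3.0, 4.0, 0.0, 8.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]]).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)
avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(x)

print(max_out.squeeze().tolist())  # [[4.0, 8.0], [0.0, 1.0]]
print(avg_out.squeeze().tolist())  # [[2.5, 2.0], [0.0, 1.0]]
```

In the top-right window, the lone value 8.0 is kept by max pooling but becomes 2.0 under average pooling.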