## PyTorch torch.nn.AvgPool2d Module
`torch.nn.AvgPool2d` is a built-in PyTorch module that applies two-dimensional average pooling over an input signal composed of several input planes (channels).
It slides a window over each plane and outputs the mean of the values inside each window. It is widely used for downsampling feature maps and aggregating spatial features in convolutional neural networks (CNNs).
---
### Class Definition
```python
class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)
```
### Parameter Descriptions
* **`kernel_size`** *(int or tuple)*: The size of the pooling window. Can be a single number (for a square window of `kernel_size` $\times$ `kernel_size`) or a tuple `(kH, kW)`.
* **`stride`** *(int or tuple, optional)*: The stride of the pooling window. Can be a single number or a tuple `(sH, sW)`. Default value is `kernel_size`.
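* **`padding`** *(int or tuple, optional)*: Implicit zero-padding added on both sides of each spatial dimension. Can be a single number or a tuple `(padH, padW)`, and must not exceed half of the kernel size. Default is `0`.
* **`ceil_mode`** *(bool, optional)*: When set to `True`, the output shape is computed with *ceil* instead of *floor*, so a final partial window at the bottom or right edge is kept rather than dropped. Default is `False`.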
* **`count_include_pad`** *(bool, optional)*: When set to `True`, zero-padding will be included in the averaging calculation. Default is `True`.
* **`divisor_override`** *(int, optional)*: If specified, it will be used as the divisor instead of the total number of pooling elements in the window. Default is `None`.
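
The effect of `count_include_pad` and `divisor_override` is easiest to see on an input of ones, where any output other than `1.0` comes from the padding or from the divisor. The following is a minimal sketch; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

# A 2x2 input of ones, so deviations from 1.0 are caused by padding or the divisor
x = torch.ones(1, 1, 2, 2)

# kernel 2, stride 2, padding 1: every window covers one real pixel and three padded zeros
with_pad = nn.AvgPool2d(kernel_size=2, stride=2, padding=1, count_include_pad=True)
without_pad = nn.AvgPool2d(kernel_size=2, stride=2, padding=1, count_include_pad=False)

# divisor_override=1 divides the window sum by 1, turning the layer into sum pooling
sum_pool = nn.AvgPool2d(kernel_size=2, stride=2, divisor_override=1)

out_with_pad = with_pad(x)        # padded zeros count toward the divisor: 1 / 4
out_without_pad = without_pad(x)  # only real pixels count toward the divisor: 1 / 1
out_sum = sum_pool(x)             # sum of the single 2x2 window: 4

print(out_with_pad[0, 0])
print(out_without_pad[0, 0])
print(out_sum[0, 0])
```

With padding, `count_include_pad=True` pulls border averages toward zero, while `count_include_pad=False` averages only over the real input values.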
---
### Input and Output Shapes
* **Input**: $(N, C, H_{in}, W_{in})$ or $(C, H_{in}, W_{in})$
* **Output**: $(N, C, H_{out}, W_{out})$ or $(C, H_{out}, W_{out})$
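The output dimensions are calculated as follows, where index $[0]$ denotes the height component and $[1]$ the width component of each parameter (an `int` argument applies the same value to both):
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} \right\rfloor + 1$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} \right\rfloor + 1$$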
*(If `ceil_mode` is `True`, the floor function $\lfloor \dots \rfloor$ is replaced by the ceiling function $\lceil \dots \rceil$.)*
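
The shape formula can be checked against the module itself. The sketch below uses a hypothetical helper, `pooled_size`, that evaluates the formula for one spatial dimension. It also applies the rule PyTorch uses in ceil mode: a window that would start entirely inside the right (or bottom) padding is ignored.

```python
import torch
import torch.nn as nn

def pooled_size(size_in, kernel_size, stride, padding=0, ceil_mode=False):
    """Hypothetical helper: output length along one spatial dimension."""
    numerator = size_in + 2 * padding - kernel_size
    if ceil_mode:
        # Integer ceiling division, then + 1
        size_out = -(-numerator // stride) + 1
        # Drop the last window if it would start in the right/bottom padding
        if (size_out - 1) * stride >= size_in + padding:
            size_out -= 1
    else:
        size_out = numerator // stride + 1
    return size_out

x = torch.randn(1, 3, 7, 7)
for ceil_mode in (False, True):
    pool = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=ceil_mode)
    predicted = pooled_size(7, kernel_size=2, stride=2, ceil_mode=ceil_mode)
    print(f"ceil_mode={ceil_mode}: module {tuple(pool(x).shape)}, formula {predicted}")
```

For this $7\times7$ input with a $2\times2$ window and stride $2$, floor mode yields a $3\times3$ output, and ceil mode yields $4\times4$ because the last partial window is kept.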
---
## Code Examples
### Example 1: Basic Usage
This example demonstrates how to apply a basic $2\times2$ average pooling with a stride of $2$ on a randomly generated tensor.
```python
import torch
import torch.nn as nn
# Define a 2D average pooling layer with a 2x2 window and stride of 2
pool = nn.AvgPool2d(kernel_size=2, stride=2)
# Create a random input tensor of shape (Batch, Channels, Height, Width)
x = torch.randn(1, 1, 4, 4)
output = pool(x)
print("Input:\n", x.squeeze().tolist())
print("\nOutput:\n", output.squeeze().tolist())
print("\nShape transformation:", x.shape, "->", output.shape)
```
---
### Example 2: Global Average Pooling (GAP)
Global Average Pooling is a common technique used before the final classification layer in modern CNN architectures (like ResNet) to reduce the spatial dimensions to $1\times1$. While `AvgPool2d` can do this if you match the kernel size to the input size, `AdaptiveAvgPool2d` is typically preferred for this specific task because you only need to specify the target output size.
```python
import torch
import torch.nn as nn
# Use AdaptiveAvgPool2d to dynamically pool any input size down to 1x1
gap = nn.AdaptiveAvgPool2d(1)
# Input shape: (Batch=4, Channels=64, Height=16, Width=16)
x = torch.randn(4, 64, 16, 16)
out = gap(x)
print("Input shape:", x.shape)
print("Output shape:", out.shape)
print("Average values per sample and channel:", out[0, 0].numel())
```
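
To illustrate the claim that `AvgPool2d` can also perform global average pooling when the kernel matches the input size, the sketch below compares three routes to the same result. It assumes a fixed $16\times16$ input; the variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 64, 16, 16)

# Fixed-size pooling: the kernel is hard-coded to this input's spatial size
fixed_gap = nn.AvgPool2d(kernel_size=(16, 16))
# Adaptive pooling: only the target output size is specified
adaptive_gap = nn.AdaptiveAvgPool2d(1)

out_fixed = fixed_gap(x)
out_adaptive = adaptive_gap(x)
# Plain tensor mean over the spatial dimensions, kept as 1x1 for comparison
out_mean = x.mean(dim=(2, 3), keepdim=True)

print("Output shape:", out_fixed.shape)
print("AvgPool2d matches AdaptiveAvgPool2d:", torch.allclose(out_fixed, out_adaptive, atol=1e-5))
print("AvgPool2d matches spatial mean:", torch.allclose(out_fixed, out_mean, atol=1e-5))
```

The fixed-kernel version only reduces the input to $1\times1$ when the input is exactly $16\times16$, which is why the adaptive variant is usually the more convenient choice.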
---
### Example 3: Comparing MaxPool2d vs. AvgPool2d
This example highlights the difference between Max Pooling (which extracts the most prominent feature/activation) and Average Pooling (which smooths features by calculating the mean).
```python
import torch
import torch.nn as nn
# Create a 4x4 matrix
x = torch.tensor([[[
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]
]]], dtype=torch.float32)
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)
print("Input Matrix:\n", x[0, 0])
print("\nMaxPool2d Output (Extracts maximums):\n", maxpool(x)[0, 0])
print("\nAvgPool2d Output (Computes averages):\n", avgpool(x)[0, 0])
```
#### Output Explanation:
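* **MaxPool2d** selects the maximum value in each $2\times2$ quadrant:
  * Top-Left `[1, 2, 5, 6]` $\rightarrow$ **`6`**
  * Top-Right `[3, 4, 7, 8]` $\rightarrow$ **`8`**
  * Bottom-Left `[9, 10, 13, 14]` $\rightarrow$ **`14`**
  * Bottom-Right `[11, 12, 15, 16]` $\rightarrow$ **`16`**
* **AvgPool2d** calculates the average value in each $2\times2$ quadrant:
  * Top-Left `(1 + 2 + 5 + 6) / 4` $\rightarrow$ **`3.5`**
  * Top-Right `(3 + 4 + 7 + 8) / 4` $\rightarrow$ **`5.5`**
  * Bottom-Left `(9 + 10 + 13 + 14) / 4` $\rightarrow$ **`11.5`**
  * Bottom-Right `(11 + 12 + 15 + 16) / 4` $\rightarrow$ **`13.5`**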
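
The quadrant arithmetic above can be reproduced without any pooling layer by reshaping the input into non-overlapping $2\times2$ blocks. This is a sketch for illustration only; it relies on the kernel size equalling the stride and dividing the input size evenly.

```python
import torch
import torch.nn as nn

# Same 4x4 matrix as in the example: values 1..16 in row-major order
x = torch.arange(1, 17, dtype=torch.float32).reshape(1, 1, 4, 4)

# Split each spatial axis into (block index, offset inside the block):
# (N, C, block_row, row_in_block, block_col, col_in_block)
blocks = x.reshape(1, 1, 2, 2, 2, 2)

# Reduce over the two "inside the block" axes
manual_avg = blocks.mean(dim=(3, 5))
manual_max = blocks.amax(dim=(3, 5))

avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(x)
max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)

print("Manual average:\n", manual_avg[0, 0])
print("Manual maximum:\n", manual_max[0, 0])
print("Matches AvgPool2d:", torch.allclose(manual_avg, avg_out))
print("Matches MaxPool2d:", torch.equal(manual_max, max_out))
```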
---
## Common Use Cases
* **Feature Aggregation & Downsampling**: Reduces the spatial dimensions (width and height) of feature maps, which decreases computational complexity and memory footprint in deeper layers.
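* **Global Average Pooling (GAP)**: Replaces the flatten-plus-large-fully-connected stage at the end of a network by averaging each channel down to a single value, so the classifier that follows needs only one weight per channel and class. This significantly reduces the number of model parameters and helps prevent overfitting.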
* **Feature Smoothing**: Average pooling acts as a low-pass filter that smooths out localized noise and retains background information rather than just the sharpest activations.
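
The parameter savings from global average pooling can be made concrete with a small comparison. The sketch below is not taken from any specific architecture: `num_classes`, `fc_head`, `gap_head`, and `count_params` are hypothetical names, and the $64\times8\times8$ feature map is an assumed size.

```python
import torch
import torch.nn as nn

num_classes = 10                      # assumed number of classes
features = torch.randn(2, 64, 8, 8)   # assumed feature map: (N, C, H, W)

# Flatten every spatial position, then classify: one weight per (channel, row, column, class)
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, num_classes),
)

# Average each channel to a single value first: one weight per (channel, class)
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, num_classes),
)

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

print("Flatten + Linear parameters:", count_params(fc_head))
print("GAP + Linear parameters:", count_params(gap_head))
print("Output shapes:", fc_head(features).shape, gap_head(features).shape)
```

Both heads produce one score per class, but the GAP head's parameter count no longer depends on the spatial size of the feature map.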