# PyTorch torch.nn.Conv2d
[ PyTorch torch.nn Reference Manual](https://example.com/pytorch/pytorch-torch-nn-ref.html)
* * *
`torch.nn.Conv2d` is a module in PyTorch for two-dimensional convolution and serves as a core component of Convolutional Neural Networks (CNNs).
It extracts spatial features by applying learnable convolutional kernels to input tensors and is widely used in image processing and computer vision tasks.
### Function Definition
```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
```
**Parameter Description:**
* `in_channels` (int): Number of input channels. For example, RGB images have 3 channels.
* `out_channels` (int): Number of output channels, i.e., the number of convolutional kernels.
* `kernel_size` (int or tuple): Size of the convolutional kernel. Can be an integer (square kernel) or a tuple `(height, width)`.
* `stride` (int or tuple): Stride of the convolutional kernel. Default is 1.
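* `padding` (int, tuple, or str): Padding applied to all four sides of the input. Default is 0. The strings `'valid'` (no padding) and `'same'` (output size equals input size; requires `stride=1`) are also accepted.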
* `dilation` (int or tuple): Spacing between kernel elements. Default is 1 (standard convolution).
* `groups` (int): Number of groups for grouped convolution. Default is 1 (standard convolution). Both `in_channels` and `out_channels` must be divisible by `groups`.
* `bias` (bool): Whether to add a bias term. Default is `True`.
* `padding_mode` (str): Padding mode. Options are `'zeros'`, `'reflect'`, `'replicate'`, `'circular'`. Default is `'zeros'`.
**Attributes:**
* `weight` (Tensor): Learnable weights with shape (out_channels, in_channels / groups, kernel_height, kernel_width).
* `bias` (Tensor): Learnable bias with shape (out_channels,). It is `None` when `bias=False`.
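As a quick check of the shapes listed above, the following sketch builds a layer with a non-square kernel and inspects its attributes. The layer sizes (3 input channels, 16 output channels, a 3x5 kernel) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# Non-square kernel: 3 rows high, 5 columns wide (illustrative values)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 5), padding=(1, 2))

# weight is (out_channels, in_channels / groups, kernel_height, kernel_width)
print(conv.weight.shape)  # torch.Size([16, 3, 3, 5])
print(conv.bias.shape)    # torch.Size([16])

# With bias=False, no bias parameter is created and the attribute is None
conv_no_bias = nn.Conv2d(3, 16, kernel_size=3, bias=False)
print(conv_no_bias.bias)  # None

# padding=(1, 2) matches the 3x5 kernel, so the spatial size is preserved
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)      # torch.Size([1, 16, 32, 32])
```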
* * *
## Usage Examples
### Example 1: Basic Usage
Create a simple 2D convolutional layer:
```python
import torch
import torch.nn as nn

# Create convolutional layer: 3 input channels, 32 output channels, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Print shapes of weight and bias
print("Weight shape:", conv.weight.shape)  # torch.Size([32, 3, 3, 3])
print("Bias shape:", conv.bias.shape)      # torch.Size([32])

# Create input tensor: batch=1, channels=3, height=32, width=32
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv(input_tensor)
print("Input shape:", input_tensor.shape)  # torch.Size([1, 3, 32, 32])
print("Output shape:", output.shape)       # torch.Size([1, 32, 30, 30])
```

Output result:

```
Weight shape: torch.Size([32, 3, 3, 3])
Bias shape: torch.Size([32])
Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 30, 30])
```
With the default `padding=0`, each spatial dimension shrinks by `kernel_size - 1` (here 32 becomes 30). To keep the size unchanged, add padding.
### Example 2: Using padding to Maintain Size
Add padding to keep input and output dimensions consistent:
```python
import torch
import torch.nn as nn

# Create convolutional layer with padding: padding=1 maintains size for a 3x3 kernel
conv_pad = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

# Input
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv_pad(input_tensor)
print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)  # Maintains 32x32
```

Output result:

```
Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 32, 32])
```
### Example 3: Different stride and dilation
Adjusting stride and dilation can change output size and receptive field:
```python
import torch
import torch.nn as nn

# Strided convolution: stride=2 reduces size
conv_stride = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output_stride = conv_stride(input_tensor)
print("Stride=2 -> Output shape:", output_stride.shape)

# Dilated convolution: dilation=2 increases receptive field
conv_dilation = nn.Conv2d(3, 32, kernel_size=3, dilation=2)
output_dilation = conv_dilation(input_tensor)
print("Dilation=2 -> Output shape:", output_dilation.shape)
```

Output result:

```
Stride=2 -> Output shape: torch.Size([1, 32, 16, 16])
Dilation=2 -> Output shape: torch.Size([1, 32, 28, 28])
```
### Example 4: Grouped Convolution
The groups parameter enables grouped convolution, commonly used in lightweight networks:
```python
import torch
import torch.nn as nn

# Grouped convolution: groups=2 splits the input channels into 2 groups
conv_group = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)

# Input with 4 channels
input_tensor = torch.randn(1, 4, 16, 16)

# Forward pass
output = conv_group(input_tensor)
print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)
print("Weight shape:", conv_group.weight.shape)  # Weight shape differs after grouping
```

Output result:

```
Input shape: torch.Size([1, 4, 16, 16])
Output shape: torch.Size([1, 8, 14, 14])
Weight shape: torch.Size([8, 2, 3, 3])
```
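To illustrate why grouped convolution suits lightweight networks, this sketch counts the learnable parameters for the same channel sizes as above with different `groups` values. The helper `count_params` is a hypothetical name introduced here, not part of PyTorch.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(4, 8, kernel_size=3)             # groups=1
grouped = nn.Conv2d(4, 8, kernel_size=3, groups=2)    # 2 input channels per group
depthwise = nn.Conv2d(4, 8, kernel_size=3, groups=4)  # 1 input channel per group

# Each filter sees in_channels / groups input channels, so weights shrink with groups
print(count_params(standard))   # 8*4*3*3 + 8 = 296
print(count_params(grouped))    # 8*2*3*3 + 8 = 152
print(count_params(depthwise))  # 8*1*3*3 + 8 = 80
```

Setting `groups` equal to `in_channels`, as in the last layer, gives a depthwise convolution in which every filter reads a single input channel.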
### Example 5: Using in a Neural Network
Build a simple handwritten digit recognition network:
```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # First convolution block
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()
        # Second convolution block
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer: two 2x2 poolings reduce 28x28 to 7x7
        self.fc = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        # First convolution block
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.pool(x)
        # Second convolution block
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.pool(x)
        # Flatten and classify
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


# Create model
model = SimpleCNN(num_classes=10)

# Test input: batch=4, grayscale image 28x28
input_image = torch.randn(4, 1, 28, 28)
output = model(input_image)
print("Input shape:", input_image.shape)
print("Output shape:", output.shape)  # torch.Size([4, 10])
```
* * *
## Output Size Calculation
Formula for calculating output size of a convolutional layer:
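$$
H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} \right\rfloor + 1
$$

$$
W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} \right\rfloor + 1
$$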
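The formula can be checked with a few lines of plain Python. This is a minimal sketch; `conv2d_output_size` is a hypothetical helper name, and it handles one spatial dimension at a time with integer arguments.

```python
def conv2d_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    # Floor division implements the floor() in the formula
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Settings taken from the usage examples in this article
print(conv2d_output_size(32, 3))                       # 30
print(conv2d_output_size(32, 3, padding=1))            # 32
print(conv2d_output_size(32, 3, stride=2, padding=1))  # 16
print(conv2d_output_size(32, 3, dilation=2))           # 28
print(conv2d_output_size(16, 3))                       # 14
```

For non-square kernels or per-dimension strides, call the helper once for the height and once for the width with the corresponding tuple elements.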
* * *
## Common Questions
### Q1: How to choose kernel size?
Common kernel sizes:
* `1x1`: Changes the channel count and mixes information across channels; the non-linearity comes from the activation that follows it
* `3x3`: Most common, balances parameters and receptive field
* `5x5`, `7x7`: Larger receptive field but more parameters
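To make the parameter trade-off concrete, the sketch below counts the weights for each kernel size. The channel count of 64 is an illustrative assumption, and `conv_weight_count` is a hypothetical helper rather than a PyTorch function.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Weights in one square-kernel convolution, bias excluded (hypothetical helper)."""
    return out_channels * in_channels * kernel_size * kernel_size

channels = 64  # illustrative choice: 64 input and 64 output channels

for k in (1, 3, 5, 7):
    print(f"{k}x{k}: {conv_weight_count(channels, channels, k)} weights")
# 1x1: 4096 weights
# 3x3: 36864 weights
# 5x5: 102400 weights
# 7x7: 200704 weights

# Two stacked 3x3 layers cover a 5x5 receptive field with fewer weights than one 5x5 layer
stacked_3x3 = 2 * conv_weight_count(channels, channels, 3)
single_5x5 = conv_weight_count(channels, channels, 5)
print(stacked_3x3, single_5x5)  # 73728 102400
```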
### Q2: How to choose padding and stride?
* For odd kernel sizes with `stride=1` and `dilation=1`, use `padding = (kernel_size - 1) // 2` to maintain feature map size
* Use `stride > 1` for downsampling
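Both rules can be verified on a dummy input. This is a minimal sketch with arbitrary channel counts; it assumes a PyTorch version that accepts the string `padding='same'` (1.9 or later).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# Odd kernel, stride=1, dilation=1: padding = kernel_size // 2 keeps H and W
for k in (3, 5, 7):
    conv = nn.Conv2d(3, 8, kernel_size=k, padding=k // 2)
    print(k, conv(x).shape)  # torch.Size([1, 8, 32, 32]) for each k

# padding='same' lets PyTorch pick the padding (stride must be 1)
same = nn.Conv2d(3, 8, kernel_size=5, padding="same")
print(same(x).shape)  # torch.Size([1, 8, 32, 32])

# stride=2 with padding=1 and a 3x3 kernel halves the spatial size
down = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
print(down(x).shape)  # torch.Size([1, 8, 16, 16])
```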
* * *
## Application Scenarios
`nn.Conv2d` is one of the most important layers in computer vision, with main applications including:
* **Image Classification**: Extract image features, e.g., VGG, ResNet
* **Object Detection**: YOLO, Faster R-CNN, etc.
* **Semantic Segmentation**: U-Net, FCN, etc.
* **Style Transfer**: Generate artistic images
> Tip: In modern CNNs, 3x3 convolutions are most commonly used, as they cover sufficient spatial information while keeping parameter count low.
* * *