PyTorch torch.nn Reference
PyTorch's `torch.nn` module is the core component for building and training neural networks, providing a rich set of classes and functions to define and manipulate neural network architectures.
Here are some key components and their functionalities within the `torch.nn` module:
**1. nn.Module Class:**
- The `nn.Module` class serves as the base class for all custom neural network models. Users typically derive their own model classes from this class and define the network layer structure along with the forward pass function.
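
As a minimal sketch of the subclassing pattern described above, the following defines a small two-layer network. The class name `TinyMLP` and the layer sizes are hypothetical, chosen only for illustration.

```python
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical two-layer perceptron, used only for illustration."""

    def __init__(self, in_features: int, hidden: int, out_features: int):
        super().__init__()  # must run before sub-modules are assigned
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The forward pass: linear -> ReLU -> linear
        return self.fc2(self.act(self.fc1(x)))


model = TinyMLP(in_features=4, hidden=8, out_features=2)
x = torch.randn(3, 4)  # batch of 3 samples with 4 features each
y = model(x)           # calling the module runs forward()

# Sub-modules assigned as attributes are registered automatically,
# so their weights and biases show up in model.parameters().
n_params = sum(p.numel() for p in model.parameters())
```

Calling `model(x)` rather than `model.forward(x)` is the usual convention, since the call also runs any registered hooks.
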
**2. Predefined Layers (Modules):**
- Includes various types of layer components such as convolutional layers (`nn.Conv1d`, `nn.Conv2d`, `nn.Conv3d`), fully connected layers (`nn.Linear`), activation functions (`nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, etc.).
**3. Container Classes:**
- `nn.Sequential`: Allows combining multiple layers in sequence to form simple linear stacked networks.
- `nn.ModuleList` and `nn.ModuleDict`: Enable dynamic storage and access of sub-modules, supporting variable-length or named collections of modules.
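
The three containers can be compared side by side. This is a sketch under assumed layer sizes; the class name `Branches` and the head names `"cls"` and `"reg"` are hypothetical.

```python
import torch
from torch import nn

# nn.Sequential: layers are applied in the order given.
stack = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


class Branches(nn.Module):
    """Hypothetical model with a variable-length trunk and named heads."""

    def __init__(self, depth: int = 3):
        super().__init__()
        # nn.ModuleList: a list whose entries are registered as sub-modules.
        self.blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(depth)])
        # nn.ModuleDict: sub-modules looked up by string key.
        self.heads = nn.ModuleDict({"cls": nn.Linear(4, 2), "reg": nn.Linear(4, 1)})

    def forward(self, x: torch.Tensor, head: str) -> torch.Tensor:
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.heads[head](x)


net = Branches()
x = torch.randn(5, 4)
out_seq = stack(x)
out_cls = net(x, "cls")
out_reg = net(x, "reg")
n_tensors = len(list(net.parameters()))  # 3 blocks + 2 heads, weight and bias each
```

A plain Python list or dict would not register the layers, so their parameters would be invisible to `net.parameters()` and to an optimizer built from it.
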
**4. Loss Functions:**
- `torch.nn` contains a range of loss functions used to measure the discrepancy between model predictions and true labels, such as Mean Squared Error Loss (`nn.MSELoss`) and Cross-Entropy Loss (`nn.CrossEntropyLoss`).
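
A small worked example of the two losses named above, with toy values picked so the results can be checked by hand:

```python
import math

import torch
from torch import nn

# Mean squared error: mean of (pred - target)^2 = (0 + 0 + 4) / 3
mse = nn.MSELoss()
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
mse_value = mse(pred, target)

# Cross-entropy expects raw logits and integer class indices.
# With all-zero logits over 3 classes, every class has probability 1/3,
# so the loss is -log(1/3) = log(3) for any label.
ce = nn.CrossEntropyLoss()
logits = torch.zeros(2, 3)
labels = torch.tensor([0, 2])
ce_value = ce(logits, labels)

expected_ce = math.log(3)
```

Note that `nn.CrossEntropyLoss` applies log-softmax internally, so the model should output unnormalized scores rather than probabilities.
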
**5. Functional Interface:**
- `nn.functional` (often imported as `F`) provides functions that are applied directly to tensors. They implement the same operations as the layer objects but are stateless: they hold no parameters of their own, so any weights must be passed in explicitly. For example, you can use `F.relu()` to apply ReLU or `F.conv2d()` to perform a convolution.
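
To illustrate the difference between the two styles, the sketch below runs the same operations through a layer object and through the functional interface. The tensor values and layer sizes are arbitrary.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])

# Parameter-free operations: the module and the function agree.
relu_matches = torch.equal(nn.ReLU()(x), F.relu(x))

# Operations with parameters: the module owns weight and bias,
# while the functional form needs them passed in explicitly.
layer = nn.Linear(3, 2)
out_module = layer(x)
out_functional = F.linear(x, layer.weight, layer.bias)
```
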
**6. Initialization Methods:**
- `torch.nn.init` provides several common weight initialization strategies, such as Xavier initialization (`nn.init.xavier_uniform_()`) and Kaiming initialization (`nn.init.kaiming_uniform_()`), which are crucial for successful neural network training.
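
A short sketch of how these initializers are typically applied. The helper name `init_weights` and the layer sizes are hypothetical; the trailing underscore in each `nn.init` function signals that it modifies the tensor in place.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)

# Initialize one layer directly.
layer = nn.Linear(64, 32)
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# With gain 1, Xavier uniform samples from [-bound, bound],
# where bound = sqrt(6 / (fan_in + fan_out)).
bound = math.sqrt(6.0 / (64 + 32))


def init_weights(module: nn.Module) -> None:
    """Hypothetical helper: Kaiming init for every Linear layer."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)


# Module.apply() calls the function on every sub-module recursively.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.apply(init_weights)
```
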
**7. Transformer Layers:**
- PyTorch offers complete Transformer architecture components, including `nn.Transformer`, `nn.TransformerEncoder`, `nn.TransformerDecoder`, and attention mechanisms like `nn.MultiheadAttention`.
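
The following sketch builds a small encoder stack and a standalone attention layer. The dimensions are arbitrary, and `nn.TransformerEncoderLayer` (not listed above) is the building block that `nn.TransformerEncoder` repeats.

```python
import torch
from torch import nn

# One encoder block: self-attention followed by a feed-forward network.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=16, nhead=4, dim_feedforward=32, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
encoder.eval()  # disable dropout for a deterministic forward pass

x = torch.randn(2, 5, 16)  # (batch, sequence length, embedding dim)
with torch.no_grad():
    encoded = encoder(x)

# Standalone multi-head self-attention: query, key and value are all x.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn_out, attn_weights = attn(x, x, x)
```

With the default settings, `attn_weights` is averaged over heads and has shape `(batch, target length, source length)`.
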
**8. Normalization Layers:**
- Includes Batch Normalization (`BatchNorm`), Layer Normalization (`LayerNorm`), Group Normalization (`GroupNorm`), Instance Normalization (`InstanceNorm`), and RMSNorm, among others.
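
The main difference between these layers is the axis over which statistics are computed. A minimal sketch with arbitrary sizes, comparing `nn.LayerNorm` and `nn.BatchNorm1d`:

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 16) * 3 + 5  # 8 samples, 16 features, shifted and scaled

# LayerNorm normalizes each sample across its features.
layer_norm = nn.LayerNorm(16)
y = layer_norm(x)
per_sample_mean = y.mean(dim=-1)

# BatchNorm1d (in training mode) normalizes each feature across the batch.
batch_norm = nn.BatchNorm1d(16)
z = batch_norm(x)
per_feature_mean = z.mean(dim=0)
```

Both layers start with a learnable scale of 1 and shift of 0, so right after construction the normalized means are close to zero.
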
* * *
## PyTorch torch.nn Module Reference Manual
### **Neural Network Containers**
| **Class/Function** | **Description** |
| --- | --- |
| [`torch.nn.Module`](#) | Base class for all neural network modules. |
| [`torch.nn.Sequential(*args)`](#) | Combines multiple modules in sequence. |
| [`torch.nn.ModuleList(modules)`](#) | Stores sub-modules in a list. |
| `torch.nn.ModuleDict(modules)` | Stores sub-modules in a dictionary. |
| `torch.nn.ParameterList(parameters)` | Stores parameters in a list. |
| `torch.nn.ParameterDict(parameters)` | Stores parameters in a dictionary. |
| [`torch.nn.Parameter(data)`](#) | Creates a learnable parameter tensor. |
| `torch.nn.Buffer(data)` | Marks a tensor as a module buffer: state that is saved with the module but is not a learnable parameter. |
| `torch.nn.Identity(*args, **kwargs)` | Identity transformation layer, outputs input unchanged. |
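
As a sketch of how parameters and buffers differ in practice, the example below uses `nn.Parameter` together with `Module.register_buffer`, the long-standing method for registering a buffer. The class name `Scale` and the buffer name `running_max` are hypothetical.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical module with one learnable parameter and one buffer."""

    def __init__(self):
        super().__init__()
        # Learnable: returned by parameters() and updated by optimizers.
        self.weight = nn.Parameter(torch.ones(3))
        # Not learnable: saved in state_dict() but skipped by optimizers.
        self.register_buffer("running_max", torch.zeros(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.weight


module = Scale()
param_names = [name for name, _ in module.named_parameters()]
buffer_names = [name for name, _ in module.named_buffers()]
state_keys = sorted(module.state_dict().keys())

# nn.Identity returns its input unchanged, which makes it a handy placeholder.
x = torch.randn(2, 3)
identity_out = nn.Identity()(x)
```
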
### **Global Hooks**
| **Function** | **Description** |
| --- | --- |
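| `register_module_forward_pre_hook(hook)` | Registers a forward pre-hook common to all modules, called before each `forward()`. |
| `register_module_forward_hook(hook)` | Registers a forward hook common to all modules, called after each `forward()`. |
| `register_module_backward_hook(hook)` | Registers a backward hook common to all modules (deprecated in favor of `register_module_full_backward_hook`). |
| `register_module_full_backward_pre_hook(hook)` | Registers a backward pre-hook common to all modules. |
| `register_module_full_backward_hook(hook)` | Registers a full backward hook common to all modules. |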
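
Global hooks apply to every module in the process, not to one instance. The sketch below logs output shapes with a global forward hook; the hook name `log_output_shape` is hypothetical, and the import path assumes the functions live in `torch.nn.modules.module`.

```python
import torch
from torch import nn
from torch.nn.modules.module import register_module_forward_hook

seen = []


def log_output_shape(module, inputs, output):
    # Called after forward() of every module while the hook is registered.
    seen.append((type(module).__name__, tuple(output.shape)))


handle = register_module_forward_hook(log_output_shape)
try:
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
    model(torch.randn(2, 4))
finally:
    handle.remove()  # global hooks affect all modules, so always clean up

# After removal the hook no longer fires.
model(torch.randn(2, 4))
```

The inner layers report first and the enclosing `nn.Sequential` last, because a forward hook runs once a module's own `forward()` has returned.
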
* * *
### **Linear Layers**
| **Class/Function** | **Description** |
| --- | --- |
| [`torch.nn.Linear(in_features, out_features, bias)`](#) | Fully connected layer (linear transformation). |
| [`torch.nn.Bilinear(in1_features, in2_features, out_features, bias)`](#) | Bilinear layer. |
| `torch.nn.LazyLinear(out_features, bias)` | Lazy-initialized linear layer, infers input dimensions automatically during the first forward pass. |
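
A brief sketch of the three linear layers with arbitrary sizes. Note how `nn.LazyLinear` only learns its input width when it first sees data.

```python
import torch
from torch import nn

x = torch.randn(2, 4)

# Linear: y = x @ W.T + b, with W of shape (out_features, in_features).
linear = nn.Linear(4, 3)
out_linear = linear(x)

# LazyLinear: in_features is inferred from the first input.
lazy = nn.LazyLinear(3)
out_lazy = lazy(x)
lazy_weight_shape = tuple(lazy.weight.shape)

# Bilinear: combines two inputs, y = x1^T A x2 + b for each output feature.
bilinear = nn.Bilinear(4, 5, 3)
out_bilinear = bilinear(x, torch.randn(2, 5))
```
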
* * *
### **Convolutional Layers**
| **Class/Function** | **Description** |
| --- | --- |
| [`torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)`](#) | 1D convolutional layer, commonly used for text and audio. |
| [`torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)`](#) | 2D convolutional layer, commonly used for images. |
| [`torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)`](#) | 3D convolutional layer, commonly used for video and volumetric data. |
| `torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride, padding, output_padding, groups, bias, dilation, padding_mode)` | 1D transposed convolution (deconvolution), used for upsampling. |
| [`torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding, output_padding, groups, bias, dilation, padding_mode)`](#) | 2D transposed convolution (deconvolution), used for upsampling. |
| `torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride, padding, output_padding, groups, bias, dilation, padding_mode)` | 3D transposed convolution (deconvolution), used for upsampling. |
| `torch.nn.Unfold(kernel_size, dilation, padding, stride)` | Unfolds an input tensor into sliding window blocks. |
| `torch.nn.Fold(output_size, kernel_size, dilation, padding, stride)` | Reassembles unfolded blocks back into a tensor. |
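
The effect of these layers on tensor shapes is easiest to see in a worked example. The sizes below are arbitrary; the shape arithmetic in the comments follows the standard convolution output formula.

```python
import torch
from torch import nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

# Downsampling conv: floor((32 + 2*1 - 3) / 2) + 1 = 16
conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
y = conv(x)

# Transposed conv upsamples: (16 - 1) * 2 + 2 = 32
up = nn.ConvTranspose2d(8, 3, kernel_size=2, stride=2)
z = up(y)

# Unfold extracts 2x2 patches as columns: 3*2*2 = 12 values per patch,
# and (32/2) * (32/2) = 256 non-overlapping patches.
unfold = nn.Unfold(kernel_size=2, stride=2)
patches = unfold(x)

# Fold sums patches back into place; with no overlap this restores x.
fold = nn.Fold(output_size=(32, 32), kernel_size=2, stride=2)
restored = fold(patches)
```

When patches overlap, `nn.Fold` sums the overlapping values, so it is not an exact inverse of `nn.Unfold` in that case.
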
* * *
### **Pooling Layers**
| **Class/Function** | **Description** |
| --- | --- |
| `torch.nn.MaxPool1d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)` | 1D max pooling layer. |
| [`torch.nn.MaxPool2d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)`](#) | 2D max pooling layer. |
| `torch.nn.MaxPool3d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)` | 3D max pooling layer. |
| `torch.nn.MaxUnpool1d(kernel_size, stride, padding)` | 1D max unpooling layer. |
| `torch.nn.MaxUnpool2d(kernel_size, stride, padding)` | 2D max unpooling layer. |
| `torch.nn.MaxUnpool3d(kernel_size, stride, padding)` | 3D max unpooling layer. |
| `torch.nn.AvgPool1d(kernel_size, stride, padding)` | 1D average pooling layer. |
| [`torch.nn.AvgPool2d(kernel_size, stride, padding, ceil_mode, count_include_pad)`](#) | 2D average pooling layer. |
| `torch.nn.AvgPool3d(kernel_size, stride, padding, ceil_mode, count_include_pad)` | 3D average pooling layer. |
| `torch.nn.AdaptiveMaxPool1d(output_size, return_indices)` | 1D adaptive max pooling, fixed output size. |
| [`torch.nn.AdaptiveMaxPool2d(output_size, return_indices)`](#) | 2D adaptive max pooling, fixed output size. |
| `torch.nn.AdaptiveMaxPool3d(output_size, return_indices)` | 3D adaptive max pooling, fixed output size. |
| `torch.nn.AdaptiveAvgPool1d(output_size)` | 1D adaptive average pooling, fixed output size. |
| `torch.nn.AdaptiveAvgPool2d(output_size)` | 2D adaptive average pooling, fixed output size. |
| `torch.nn.AdaptiveAvgPool3d(output_size)` | 3D adaptive average pooling, fixed output size. |
| `torch.nn.LPPool1d(norm_type, kernel_size, stride, ceil_mode)` | 1D Lp (power-average) pooling layer. |
| `torch.nn.LPPool2d(norm_type, kernel_size, stride, ceil_mode)` | 2D Lp (power-average) pooling layer. |
| `torch.nn.FractionalMaxPool2d(kernel_size, output_size, output_ratio, return_indices)` | 2D fractional max pooling, using randomly chosen step sizes. |
| `torch.nn.FractionalMaxPool3d(kernel_size, output_size, output_ratio, return_indices)` | 3D fractional max pooling, using randomly chosen step sizes. |
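
A small worked example on a 4x4 input makes the pooling variants concrete. The input values are chosen so each result can be verified by hand.

```python
import torch
from torch import nn

# 4x4 input holding 0..15 row by row, shape (batch, channels, H, W).
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# 2x2 max pooling keeps the largest value in each window.
max_out = nn.MaxPool2d(kernel_size=2)(x)

# 2x2 average pooling keeps the mean of each window.
avg_out = nn.AvgPool2d(kernel_size=2)(x)

# Adaptive pooling takes the desired output size instead of a kernel size;
# output_size=1 gives global average pooling.
global_avg = nn.AdaptiveAvgPool2d(output_size=1)(x)

# return_indices=True also returns where each maximum came from,
# which MaxUnpool2d uses to place values back (other positions are zero).
pooled, indices = nn.MaxPool2d(kernel_size=2, return_indices=True)(x)
unpooled = nn.MaxUnpool2d(kernel_size=2)(pooled, indices)
```
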
* * *
### **Padding Layers**
| **Class/Function** | **Description** |
| --- | --- |
| `torch.nn.ReflectionPad1d(padding)` | 1D reflection padding, mirrors the input across its boundary (the edge value itself is not repeated). |
| `torch.nn.ReflectionPad2d(padding)` | 2D reflection padding, mirrors the input across its boundary (the edge value itself is not repeated). |
| `torch.nn.ReflectionPad3d(padding)` | 3D reflection padding, mirrors the input across its boundary (the edge value itself is not repeated). |
| `torch.nn.ReplicationPad1d(padding)` | 1D replication padding, copies edge values along boundaries. |
| `torch.nn.ReplicationPad2d(padding)` | 2D replication padding, copies edge values along boundaries. |
| `torch.nn.ReplicationPad3d(padding)` | 3D replication padding, copies edge values along boundaries. |
| `torch.nn.ZeroPad1d(padding)` | 1D zero padding. |
| `torch.nn.ZeroPad2d(padding)` | 2D zero padding. |
| `torch.nn.ZeroPad3d(padding)` | 3D zero padding. |
| `torch.nn.ConstantPad1d(padding, value)` | 1D constant padding, fills with specified value. |
| `torch.nn.ConstantPad2d(padding, value)` | 2D constant padding, fills with specified value. |
| `torch.nn.ConstantPad3d(padding, value)` | 3D constant padding, fills with specified value. |
| `torch.nn.CircularPad1d(padding)` | 1D circular padding. |
| `torch.nn.CircularPad2d(padding)` | 2D circular padding. |
| `torch.nn.CircularPad3d(padding)` | 3D circular padding. |
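
The padding modes differ only in which values fill the new border. A sketch on a three-element sequence, with circular padding shown through `torch.nn.functional.pad` (an assumption made here so the example also runs on versions without the `CircularPad` classes):

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([[[1.0, 2.0, 3.0]]])  # shape (batch, channels, width)

# Reflection mirrors around the edge without repeating the edge value.
reflect = nn.ReflectionPad1d(2)(x)

# Replication repeats the edge value.
replicate = nn.ReplicationPad1d(2)(x)

# Constant padding fills with a given value (ZeroPad is the value=0 case).
constant = nn.ConstantPad1d(2, 9.0)(x)

# Circular padding wraps around to the other end of the input.
circular = F.pad(x, (2, 2), mode="circular")
```
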
* * *
### **Activation Functions (Non-linear Activations: Weighted Sum, Nonlinearity)**
| **Class/Function** | **Description** |
| --- | --- |
| [`torch.nn.ReLU(inplace)`](#) | ReLU activation function, f(x) = max(0, x). |
| `torch.nn.ReLU6(inplace)` | ReLU6 activation function, f(x) = min(max(0, x), 6). |
| [`torch.nn.Sigmoid()`](#) | Sigmoid activation function, f(x) = 1 / (1 + exp(-x)). |
| [`torch.nn.Tanh()`](#) | Tanh activation function, f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). |
| [`torch.nn.LeakyReLU(negative_slope, inplace)`](#) | LeakyReLU, allows small gradients for negative values. |
| `torch.nn.PReLU(num_parameters, init)` | Parametric ReLU, with learnable negative slope parameter. |
| [`torch.nn.ELU(alpha, inplace)`](#) | Exponential Linear Unit, uses exponential function for negative values. |
| `torch.nn.CELU(alpha, inplace)` | Continuously Differentiable Exponential Linear Unit. |
| `torch.nn.SELU(inplace)` | Scaled Exponential Linear Unit, with self-normalizing properties. |
| [`torch.nn.GELU(approximate)`](#) | Gaussian Error Linear Unit. |
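
The formulas in the table can be checked on a few sample values. This sketch uses an arbitrary input tensor and a `negative_slope` of 0.1 for `nn.LeakyReLU`.

```python
import math

import torch
from torch import nn

x = torch.tensor([-2.0, 0.0, 3.0, 8.0])

relu_out = nn.ReLU()(x)             # max(0, x)
relu6_out = nn.ReLU6()(x)           # min(max(0, x), 6)
leaky_out = nn.LeakyReLU(0.1)(x)    # x if x > 0 else 0.1 * x
sigmoid_out = nn.Sigmoid()(x)       # 1 / (1 + exp(-x))
tanh_out = nn.Tanh()(x)
elu_out = nn.ELU(alpha=1.0)(x)      # x if x > 0 else alpha * (exp(x) - 1)

# Reference value for ELU at x = -2, computed from the formula.
elu_at_minus_two = math.exp(-2.0) - 1.0
```

Activation modules have no parameters (apart from `nn.PReLU`), so a single instance can be reused at several points in a network.
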