Pytorch Torch Nn Silu

## PyTorch torch.nn.SiLU

`torch.nn.SiLU` is the Sigmoid Linear Unit activation function in PyTorch, also commonly referred to as **Swish** (with $\beta = 1$). It is self-gated (the input is multiplied by its own sigmoid), is smooth everywhere, unlike ReLU with its kink at zero, and has been reported to match or outperform ReLU on many deep learning tasks, particularly in modern computer vision architectures.

---

### Mathematical Definition

The SiLU function is defined as:

$$\text{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the standard sigmoid function.

---

### Syntax

```python
torch.nn.SiLU(inplace=False)
```

#### Parameters:

* **`inplace`** *(bool, optional)*: If set to `True`, the operation is performed in-place, overwriting the input tensor instead of allocating a new output tensor. Default is `False`.

---

## Code Examples

### Example 1: Basic Usage

This example demonstrates how to initialize the `nn.SiLU` module and apply it to a 1D tensor.

```python
import torch
import torch.nn as nn

# Initialize the SiLU activation function
silu = nn.SiLU()

# Create a sample tensor
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
output = silu(x)

print("Input: ", x.tolist())
print("Output:", output.tolist())
```

**Output:**

```text
Input:  [-2.0, -1.0, 0.0, 1.0, 2.0]
Output: [-0.23840582370758057, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615940570831299]
```

---

### Example 2: Comparing SiLU vs. ReLU

The following example compares the outputs of `SiLU` and `ReLU` side-by-side across a range of values. Notice how `SiLU` produces small, smoothly varying negative outputs for negative inputs, whereas `ReLU` clamps them to exactly zero.
```python
import torch
import torch.nn as nn

# Instantiate each activation once and reuse it
silu = nn.SiLU()
relu = nn.ReLU()

# Generate 7 evenly spaced points between -3 and 3
x = torch.linspace(-3, 3, 7)

# Apply both activations to the whole tensor at once
silu_out = silu(x)
relu_out = relu(x)

print(f"{'x':>5} | {'SiLU':>9} | {'ReLU':>9}")
print("-" * 32)
for xi, s, r in zip(x.tolist(), silu_out.tolist(), relu_out.tolist()):
    print(f"{xi:5.1f} | {s:9.4f} | {r:9.4f}")
```

**Output:**

```text
    x |      SiLU |      ReLU
--------------------------------
 -3.0 |   -0.1423 |    0.0000
 -2.0 |   -0.2384 |    0.0000
 -1.0 |   -0.2689 |    0.0000
  0.0 |    0.0000 |    0.0000
  1.0 |    0.7311 |    1.0000
  2.0 |    1.7616 |    2.0000
  3.0 |    2.8577 |    3.0000
```

---

### Example 3: Using SiLU in a Convolutional Neural Network

SiLU is widely used in modern convolutional networks. Below is an example of integrating `nn.SiLU` into a sequential model block (a Conv, BatchNorm, SiLU pattern similar to the blocks found in EfficientNet).

```python
import torch
import torch.nn as nn

# A feature extraction block using SiLU
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.SiLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.SiLU(),
)

# Dummy input representing (batch_size, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = model(x)

print("Input shape: ", x.shape)
print("Output shape:", output.shape)
```

**Output:**

```text
Input shape:  torch.Size([1, 3, 224, 224])
Output shape: torch.Size([1, 64, 112, 112])
```

---

## Common Use Cases & Considerations

### Key Applications

* **EfficientNet & MobileNetV3**: SiLU (Swish) is the default activation function in the EfficientNet family. MobileNetV3 uses the closely related hard-swish (`nn.Hardswish`), a piecewise-linear approximation that avoids the exponential.
* **Deep Residual Networks**: Used in YOLO models (e.g., YOLOv5, YOLOv8) and in some modern ResNet-style variants. Because SiLU does not zero out the gradient for all negative inputs, it can help gradient flow during backpropagation.
* **Smooth Gating**: Ideal for tasks where a smooth, non-monotonic activation function is preferred over the sharp threshold of ReLU.

### Considerations

> ⚠️ **Computational Cost**: SiLU requires computing an exponential ($e^{-x}$) for the sigmoid term. This makes it more expensive per element than ReLU, which needs only a comparison (`max(0, x)`). On modern GPUs, however, this overhead is usually small and is often considered worth the accuracy gains.
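
To get a rough sense of this overhead on your own hardware, a minimal micro-benchmark sketch follows. The helper `time_activation`, the tensor size, and the repeat count are illustrative assumptions rather than part of PyTorch. It measures CPU wall-clock time only, and the numbers will vary from machine to machine.

```python
import time

import torch
import torch.nn as nn


def time_activation(act: nn.Module, x: torch.Tensor, repeats: int = 50) -> float:
    """Return the mean wall-clock seconds per forward call of `act` on `x`.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    with torch.no_grad():
        act(x)  # warm-up call so one-time setup cost is not measured
        start = time.perf_counter()
        for _ in range(repeats):
            act(x)
        elapsed = time.perf_counter() - start
    return elapsed / repeats


# Tensor size chosen arbitrarily for the sketch
x = torch.randn(1_000_000)

timings = {
    "ReLU": time_activation(nn.ReLU(), x),
    "SiLU": time_activation(nn.SiLU(), x),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

On a GPU, kernels launch asynchronously, so a fair measurement would also call `torch.cuda.synchronize()` before reading the clock. In a full model, the activation cost is typically measured alongside the convolution or linear layers around it rather than in isolation.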
