## PyTorch torch.nn.SiLU
`torch.nn.SiLU` applies the Sigmoid Linear Unit (SiLU) activation function element-wise. It is also commonly referred to as **Swish** (more precisely, Swish with $\beta = 1$).
It is self-gated (each input is multiplied by its own sigmoid), is smooth everywhere unlike the standard ReLU activation function, and has been reported to outperform ReLU on many deep learning tasks, particularly in computer vision architectures.
---
### Mathematical Definition
The SiLU function is mathematically defined as:
$$\text{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$
where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the standard (logistic) sigmoid function.
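As a quick sanity check of the definition, the sketch below compares the module form against a manual `x * torch.sigmoid(x)` and the functional form `torch.nn.functional.silu`. The variable names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.linspace(-5.0, 5.0, steps=11)

module_out = nn.SiLU()(x)            # module form
functional_out = F.silu(x)           # functional form
manual_out = x * torch.sigmoid(x)    # direct use of the definition

# All three should agree up to floating-point tolerance
print(torch.allclose(module_out, manual_out, atol=1e-6))
print(torch.allclose(module_out, functional_out))
```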
---
### Syntax
```python
torch.nn.SiLU(inplace=False)
```
#### Parameters:
* **`inplace`** *(bool, optional)*: If `True`, the operation is performed in-place, overwriting the input tensor instead of allocating a new one for the result. Default: `False`.
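The effect of `inplace=True` can be observed directly. The following is a minimal sketch, assuming a plain tensor that does not require gradients; `original`, `same_storage`, and `input_changed` are names chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 1.0])
original = x.clone()                 # keep a copy for comparison

out = nn.SiLU(inplace=True)(x)

# With inplace=True the result is written into x's own storage
same_storage = out.data_ptr() == x.data_ptr()
input_changed = not torch.equal(x, original)
print(same_storage, input_changed)
```

In-place activations can trigger autograd errors when the overwritten tensor is a leaf that requires gradients or is still needed by another operation's backward pass, so the default `inplace=False` is usually the safer choice.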
---
## Code Examples
### Example 1: Basic Usage
This example demonstrates how to initialize the `nn.SiLU` module and apply it to a 1D tensor.
```python
import torch
import torch.nn as nn
# Initialize the SiLU activation function
silu = nn.SiLU()
# Create a sample tensor
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
output = silu(x)
print("Input: ", x.tolist())
print("Output:", output.tolist())
```
**Output:**
```text
Input: [-2.0, -1.0, 0.0, 1.0, 2.0]
Output: [-0.23840582370758057, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615940570831299]
```
---
### Example 2: Comparing SiLU vs. ReLU
The following example compares the outputs of `SiLU` and `ReLU` side by side across a range of values. Notice how `SiLU` produces small negative outputs (and therefore a nonzero gradient) for negative inputs, unlike `ReLU`, which clamps them to zero.
```python
import torch
import torch.nn as nn
import numpy as np

# Generate 7 evenly spaced points between -3 and 3
x = np.linspace(-3, 3, 7)
x_tensor = torch.tensor(x, dtype=torch.float32)

# Instantiate each activation once and reuse it inside the loop
silu = nn.SiLU()
relu = nn.ReLU()

print(f"{'x':>5} | {'SiLU':>9} | {'ReLU':>9}")
print("-" * 32)
for xi in x_tensor:
    silu_val = silu(xi).item()
    relu_val = relu(xi).item()
    print(f"{xi.item():5.1f} | {silu_val:9.4f} | {relu_val:9.4f}")
```
**Output:**
```text
    x |      SiLU |      ReLU
--------------------------------
 -3.0 |   -0.1423 |    0.0000
 -2.0 |   -0.2384 |    0.0000
 -1.0 |   -0.2689 |    0.0000
  0.0 |    0.0000 |    0.0000
  1.0 |    0.7311 |    1.0000
  2.0 |    1.7616 |    2.0000
  3.0 |    2.8577 |    3.0000
```
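To make the difference for negative inputs concrete, here is a minimal sketch that uses autograd to compare the derivatives of both activations at the same points. It also checks the SiLU gradient against the closed-form derivative $\sigma(x)\,(1 + x\,(1 - \sigma(x)))$, which follows from applying the product rule to $x \cdot \sigma(x)$. The variable names are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 1.0, 3.0], requires_grad=True)

# The gradient of sum(f(x)) with respect to x gives f'(x) element-wise
silu_grad, = torch.autograd.grad(F.silu(x).sum(), x)
relu_grad, = torch.autograd.grad(F.relu(x).sum(), x)

# Closed-form SiLU derivative from the product rule
xd = x.detach()
s = torch.sigmoid(xd)
analytic = s * (1 + xd * (1 - s))

print("SiLU grad:", silu_grad)
print("ReLU grad:", relu_grad)   # zero for the negative inputs
print(torch.allclose(silu_grad, analytic, atol=1e-6))
```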
---
### Example 3: Using SiLU in a Convolutional Neural Network
SiLU is widely used in modern convolutional networks. Below is an example of integrating `nn.SiLU` into a sequential Conv-BatchNorm-activation block (similar in pattern to those found in EfficientNet).
```python
import torch
import torch.nn as nn
# A feature extraction block using SiLU
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32),
    nn.SiLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.SiLU()
)
# Dummy input representing (batch_size, channels, height, width)
x = torch.randn(1, 3, 224, 224)
output = model(x)
print("Input shape: ", x.shape)
print("Output shape:", output.shape)
```
**Output:**
```text
Input shape: torch.Size([1, 3, 224, 224])
Output shape: torch.Size([1, 64, 112, 112])
```
---
## Common Use Cases & Considerations
### Key Applications
* **EfficientNet (and MobileNetV3 via hard-swish)**: SiLU (Swish) is the default activation function in the EfficientNet family. MobileNetV3 uses the closely related hard-swish (`nn.Hardswish`), a cheaper piecewise-linear approximation of the same curve.
* **Object Detection and Residual Networks**: Used in YOLO models (e.g., YOLOv5, YOLOv8) and in some modern ResNet variants, where its nonzero gradient for negative inputs can help gradient flow during backpropagation.
* **Smooth Gating**: Ideal for tasks where a smooth, non-monotonic activation function is preferred over the sharp threshold of ReLU.
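The non-monotonic shape mentioned above can be seen numerically: SiLU dips to a minimum of approximately -0.2785 near x = -1.278 before rising back toward zero for very negative inputs. The following sketch locates that dip with a simple grid search; the grid range and resolution are arbitrary choices made for illustration.

```python
import torch
import torch.nn.functional as F

# Dense grid over the negative side, where the dip occurs
x = torch.linspace(-5.0, 0.0, steps=50001)
y = F.silu(x)

idx = torch.argmin(y)
x_min, y_min = x[idx].item(), y[idx].item()
print(f"minimum of SiLU ~ {y_min:.4f} at x ~ {x_min:.3f}")
```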
### Considerations
> ⚠️ **Computational Cost**: SiLU requires computing an exponential function ($e^{-x}$) for the sigmoid term. This makes it computationally more expensive than ReLU, which only requires a simple maximum comparison (`max(0, x)`). However, on modern GPUs this overhead is usually negligible and is often outweighed by the accuracy gains.
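
The relative cost can be measured on your own hardware. Below is a rough CPU timing sketch, not a rigorous benchmark; `time_activation` is a hypothetical helper written for this example, and the tensor size and repeat count are arbitrary.

```python
import time
import torch
import torch.nn.functional as F

def time_activation(fn, x, repeats=20):
    """Return the average wall-clock seconds per call of fn(x)."""
    fn(x)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    return (time.perf_counter() - start) / repeats

x = torch.randn(1_000_000)
relu_time = time_activation(F.relu, x)
silu_time = time_activation(F.silu, x)
print(f"ReLU: {relu_time * 1e6:.1f} us/call")
print(f"SiLU: {silu_time * 1e6:.1f} us/call")
```

Results vary with hardware, tensor size, and PyTorch version. When timing on a GPU, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously.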