## PyTorch torch.nn.Sigmoid Function
`torch.nn.Sigmoid` is a built-in module in PyTorch that implements the Sigmoid activation function.
It is applied element-wise and maps any real-valued input into the open interval **(0, 1)**. Because of this property, it is widely used in binary classification (to turn a raw score into a probability) and as a gating mechanism in recurrent neural networks (such as LSTMs and GRUs) to control information flow.
---
### Mathematical Definition
The Sigmoid function is mathematically defined as:
$$\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$
* **Input Range:** $(-\infty, +\infty)$
* **Output Range:** $(0, 1)$
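
As a quick sanity check of the definition above, the following minimal sketch evaluates the formula by hand and compares it with PyTorch's built-in implementation. The variable names (`manual`, `builtin`) are illustrative only.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

# Direct translation of the formula: 1 / (1 + e^(-x))
manual = 1.0 / (1.0 + torch.exp(-x))

# PyTorch's built-in implementation
builtin = torch.sigmoid(x)

print("Formula matches built-in:", torch.allclose(manual, builtin))

# The function is symmetric around (0, 0.5): sigma(-x) = 1 - sigma(x)
print("Symmetry holds:", torch.allclose(torch.sigmoid(-x), 1.0 - builtin))
```

An input of `0.0` maps to exactly `0.5`, which is why 0.5 is the natural decision threshold in binary classification.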
---
### Syntax and Usage
In PyTorch, you can instantiate the Sigmoid function as a neural network layer:
```python
torch.nn.Sigmoid()
```
* **Input Shape:** $(*)$, where $*$ means any number of dimensions.
* **Output Shape:** $(*)$, the same shape as the input.
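
The sketch below illustrates these shape rules and shows one common way to use the module as a layer inside `nn.Sequential`. The layer sizes and tensor shapes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

sigmoid = nn.Sigmoid()

# The module takes no constructor arguments and has no learnable parameters
num_params = len(list(sigmoid.parameters()))
print("Learnable parameters:", num_params)

# Sigmoid is element-wise, so the output shape always matches the input shape
for shape in [(5,), (2, 3), (4, 3, 8, 8)]:
    x = torch.randn(*shape)
    print(tuple(x.shape), "->", tuple(sigmoid(x).shape))

# Used as the last layer of a small model (sizes chosen for illustration)
model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
out = model(torch.randn(4, 10))
print("Model output shape:", tuple(out.shape))
```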
---
## Code Examples
### Example 1: Basic Usage
This example demonstrates how to apply the `nn.Sigmoid` module to a 1D tensor.
```python
import torch
import torch.nn as nn
# Instantiate the Sigmoid module
sigmoid = nn.Sigmoid()
# Create a sample tensor
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
output = sigmoid(x)
print("Input: ", x.tolist())
print("Output:", output.tolist())
```
**Output:**
```text
Input:  [-2.0, -1.0, 0.0, 1.0, 2.0]
Output: [0.11920292228460312, 0.2689414322376251, 0.5, 0.7310585975646973, 0.8807970881462097]
```
---
### Example 2: Binary Classification Output
In binary classification, the raw output of a linear layer (called **logits**) is passed through a Sigmoid function to obtain a probability score. If the probability is greater than 0.5, the model predicts class `1` (or `True`); otherwise, it predicts class `0` (or `False`).
```python
import torch
import torch.nn as nn
# Define a simple model with a Linear layer
model = nn.Linear(10, 1)
sigmoid = nn.Sigmoid()
# Generate random input data (batch size of 4, 10 features)
inputs = torch.randn(4, 10)
# Forward pass
logits = model(inputs)
probabilities = sigmoid(logits)
# Convert probabilities to binary predictions (threshold = 0.5)
predictions = probabilities > 0.5
print("Logits: ", logits.squeeze().tolist())
print("Probabilities:", probabilities.squeeze().tolist())
print("Predictions: ", predictions.squeeze().tolist())
```
---
### Example 3: Functional Alternatives
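PyTorch provides functional alternatives if you do not want to instantiate `nn.Sigmoid` as an object. You can use `torch.sigmoid()` (also available as the tensor method `x.sigmoid()`) or `torch.nn.functional.sigmoid()`. All of them compute the same result as the module. Note that some older PyTorch releases emitted a deprecation warning for `torch.nn.functional.sigmoid()`, so `torch.sigmoid()` is the safer choice for code that must run across versions.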
```python
import torch
import torch.nn.functional as F
x = torch.randn(4, 10)
# Method 1: torch namespace (x.sigmoid() is the equivalent tensor method)
output1 = torch.sigmoid(x)
# Method 2: Functional API
output2 = F.sigmoid(x)
# Verify both methods yield identical results
print("Output Shape: ", output1.shape)
print("Are both outputs equal?:", torch.allclose(output1, output2))
```
---
## Common Use Cases
1. **Binary Classification:** It maps the final layer's output to a single probability value representing the likelihood of the positive class.
2. **Gating Mechanisms:** Used in architectures like LSTMs, GRUs, and Highway Networks to decide how much information should pass through (0 means "let nothing pass", 1 means "let everything pass").
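3. **Probability Normalization:** Any scenario where each output must be bounded between 0 and 1 independently of the others, such as multi-label classification. Unlike softmax, Sigmoid does not make the outputs sum to 1.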
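
To make the gating use case concrete, here is a minimal sketch of a sigmoid gate. `SimpleGate` and its attribute names are hypothetical and are not part of PyTorch; real LSTM and GRU cells use several such gates with their own weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class SimpleGate(nn.Module):
    """Hypothetical gate: output = sigmoid(W x + b) * candidate."""

    def __init__(self, features):
        super().__init__()
        self.gate_proj = nn.Linear(features, features)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, candidate):
        # Each gate value lies between 0 (block) and 1 (pass through)
        gate = self.sigmoid(self.gate_proj(x))
        return gate * candidate, gate


layer = SimpleGate(8)
x = torch.randn(4, 8)          # input that controls the gate
candidate = torch.randn(4, 8)  # information to be filtered

gated, gate = layer(x, candidate)
print("Gate range:", gate.min().item(), "to", gate.max().item())
print("Gated shape:", tuple(gated.shape))
```

Because every gate value is at most 1, the gated tensor never exceeds the candidate in magnitude; the gate can only attenuate information, not amplify it.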
---
## Important Considerations & Best Practices
* **Vanishing Gradient Problem:** The derivative of the Sigmoid function is $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, which peaks at 0.25 when $x = 0$ and approaches 0 for large positive or negative inputs. In deep neural networks, multiplying many such small factors during backpropagation can cause gradients to vanish, which slows or stalls learning. For hidden layers, activation functions like **ReLU** (`torch.nn.ReLU`) or its variants are generally preferred.
* **Numerical Stability:** When training a binary classifier, it is highly recommended to use `torch.nn.BCEWithLogitsLoss` instead of combining `nn.Sigmoid` with `nn.BCELoss`. `BCEWithLogitsLoss` combines the Sigmoid and the Binary Cross Entropy loss into a single class and operates directly on the logits, using the log-sum-exp trick for better numerical stability. With this loss, the model should output raw logits; apply Sigmoid only at inference time to obtain probabilities.
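
Both considerations above can be observed with a short experiment. This is a sketch under assumed inputs: the logits, targets, and the extreme value `30.0` are arbitrary illustrative choices, and all tensors use the default `float32` dtype.

```python
import torch
import torch.nn as nn

# Saturation: the gradient of sigmoid is largest at 0 (0.25) and shrinks
# toward 0 as |x| grows.
x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print("Gradients:", x.grad.tolist())

# Moderate logits: the fused loss and the two-step version agree.
logits = torch.tensor([[2.0], [-1.0], [0.5]])
targets = torch.tensor([[1.0], [0.0], [1.0]])
fused = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print("Agree on moderate logits:", torch.allclose(fused, two_step, atol=1e-6))

# Confidently wrong prediction: in float32, sigmoid(30.0) rounds to 1.0,
# so the two-step version loses the information needed to compute the
# true loss (about 30), while the fused loss works from the logit itself.
big_logit = torch.tensor([[30.0]])
zero_target = torch.tensor([[0.0]])
fused_big = nn.BCEWithLogitsLoss()(big_logit, zero_target)
two_step_big = nn.BCELoss()(torch.sigmoid(big_logit), zero_target)
print("Fused loss:   ", fused_big.item())
print("Two-step loss:", two_step_big.item())
```

The PyTorch documentation describes `BCELoss` as clamping its log outputs so the loss stays finite, which means the two-step value is expected to saturate rather than become infinite. Either way, it no longer reflects the true loss, and the exact number may depend on the PyTorch version.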