# Understanding PyTorch's Vector and Matrix Norms: `torch.norm` and `torch.linalg.norm`
In deep learning and scientific computing, calculating the norm of a vector or matrix is a fundamental operation. Norms appear in regularization (L1/L2 penalties on weights), gradient clipping (rescaling gradients whose norm exceeds a threshold), loss functions (Mean Absolute Error is a scaled L1 norm of the error vector, and Mean Squared Error is a scaled squared L2 norm), and distance or similarity measures between vectors such as embeddings.
Norm computation in PyTorch has evolved. The legacy `torch.norm` function still appears in many older codebases, but PyTorch now recommends the NumPy-compatible `torch.linalg.norm` function from the `torch.linalg` module.
This comprehensive guide covers the syntax, usage, and practical examples of calculating norms in PyTorch, bridging the gap between legacy implementations and modern best practices.
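To make the link to loss functions concrete: mean absolute error is the L1 norm of the error vector divided by the number of elements, and mean squared error is the squared L2 norm divided by the number of elements. The sketch below checks both identities against `torch.nn.functional.l1_loss` and `torch.nn.functional.mse_loss` with their default `reduction='mean'`; the `pred` and `target` values are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical predictions and targets, used only for illustration
pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
target = torch.tensor([3.0, -0.5, 2.0, 7.0])
error = pred - target
n = error.numel()

# MAE = ||error||_1 / n
mae_from_norm = torch.linalg.norm(error, ord=1) / n
# MSE = ||error||_2^2 / n
mse_from_norm = torch.linalg.norm(error, ord=2) ** 2 / n

print(mae_from_norm.item(), F.l1_loss(pred, target).item())
print(mse_from_norm.item(), F.mse_loss(pred, target).item())
```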
---
## 1. The Evolution of Norms in PyTorch
Before diving into the syntax, it is important to understand the current state of PyTorch's API:
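* **`torch.norm` (Legacy):** This function is deprecated and may be removed in a future PyTorch release; the PyTorch documentation points users to `torch.linalg.norm`, `torch.linalg.vector_norm`, and `torch.linalg.matrix_norm` instead. It still works for backward compatibility, but whether it computes a vector or a matrix norm depends on `p` and `dim` in ways that are easy to misread.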
* **`torch.linalg.norm` (Recommended):** Introduced in PyTorch 1.7.0, this function is consistent with NumPy's `np.linalg.norm` and clearly separates vector and matrix operations.
*Note: This tutorial focuses on the modern `torch.linalg.norm` while providing context for `torch.norm` where necessary.*
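Recent PyTorch releases also provide two more explicit functions, `torch.linalg.vector_norm` (always treats the input as a vector, flattening it when `dim=None`) and `torch.linalg.matrix_norm` (always treats the last two dimensions as a matrix). The sketch below, assuming a PyTorch version that ships both, shows how they line up with `torch.linalg.norm` on a small matrix.

```python
import torch

M = torch.tensor([[1.0, 2.0, 3.0],
                  [-4.0, 5.0, -6.0]])

# vector_norm flattens M when dim=None: sum of absolute entries = 21
v_l1 = torch.linalg.vector_norm(M, ord=1)
# matrix_norm works on the last two dims: maximum absolute column sum = 9
m_l1 = torch.linalg.matrix_norm(M, ord=1)

# torch.linalg.norm picks vector or matrix behaviour from the input and ord/dim
print(v_l1.item(), torch.linalg.norm(M.flatten(), ord=1).item())
print(m_l1.item(), torch.linalg.norm(M, ord=1).item())
```

When the intent is known in advance, calling the explicit function avoids relying on the dispatch rules of `torch.linalg.norm`.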
---
## 2. Syntax and Parameters
### `torch.linalg.norm`
```python
torch.linalg.norm(A, ord=None, dim=None, keepdim=False, *, out=None, dtype=None)
```
#### Parameter Breakdown:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `A` | `Tensor` | The input tensor. |
| `ord` | `int`, `float`, `inf`, `-inf`, `'fro'`, `'nuc'` | The order of the norm. Controls the type of norm calculated (see table below). Default is `None`. |
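| `dim` | `int`, `Tuple[int, int]` | The dimension(s) to reduce over. An `int` selects a vector norm; a 2-tuple selects a matrix norm. If `None` and `ord` is also `None`, `A` is flattened to 1D and the L2 norm is computed; if `None` and `ord` is given, `A` must be 1D or 2D. Default is `None`. |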
| `keepdim` | `bool` | If `True`, the reduced dimensions are retained with length 1. Default is `False`. |
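| `out` | `Tensor` | Optional output tensor to write the result into (keyword-only). |
| `dtype` | `torch.dtype` | If given, `A` is cast to this dtype before the computation and the result has this dtype (keyword-only). Default is `None`. |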
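The keyword-only `dtype` and `out` arguments are easy to overlook. In this sketch, `dtype` casts the float32 input to float64 before the reduction, so the result comes back in double precision, and `out` writes the result into a preallocated 0-dimensional tensor; the variable names are illustrative.

```python
import torch

vector = torch.tensor([3.0, -4.0, 12.0])  # float32 by default

# dtype: cast the input to float64 before computing the norm
norm64 = torch.linalg.norm(vector, dtype=torch.float64)
print(norm64, norm64.dtype)

# out: write the result into an existing tensor (shape () and float32, matching the result)
buffer = torch.empty(())
torch.linalg.norm(vector, out=buffer)
print(buffer)
```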
---
### Supported Norm Types (`ord` values)
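The behavior of `ord` depends on whether you are calculating a **Vector Norm** (when `dim` is an integer, or `dim` is `None` and the input is 1D) or a **Matrix Norm** (when `dim` is a 2-tuple of integers, or `dim` is `None` and the input is 2D). The one exception is `ord=None` with `dim=None`, which flattens an input of any shape and returns its L2 norm.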
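The dispatch rule can be seen by passing the same matrix with different `dim` arguments. With `dim=None` and an explicit `ord`, a 2D input is treated as a matrix; an integer `dim` gives one vector norm per slice; and an explicit `ord` with `dim=None` on an input with more than two dimensions is rejected. This is a small illustration, not an exhaustive list of cases.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [-4.0, 5.0, -6.0]])

# dim=None, ord given, 2D input -> matrix norm (maximum absolute column sum, 9)
print(torch.linalg.norm(matrix, ord=1))
# dim as a 2-tuple -> matrix norm over those two dimensions (also 9)
print(torch.linalg.norm(matrix, ord=1, dim=(0, 1)))
# dim as an int -> vector L1 norm of each row (6 and 15)
print(torch.linalg.norm(matrix, ord=1, dim=1))

# ord given, dim=None, and a 3D input -> PyTorch raises an error
try:
    torch.linalg.norm(torch.ones(2, 2, 2), ord=1)
except RuntimeError as err:
    print("RuntimeError:", err)
```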
#### For Vectors (1D Tensors or flattened tensors):
| `ord` | Norm Type | Mathematical Formula | Description |
| :--- | :--- | :--- | :--- |
| `None` (Default) | L2 Norm | $\sqrt{\sum_i \lvert x_i \rvert^2}$ | Euclidean distance / standard L2 norm. |
| `2` | L2 Norm | $\sqrt{\sum_i \lvert x_i \rvert^2}$ | Same as `None`. |
| `1` | L1 Norm | $\sum_i \lvert x_i \rvert$ | Manhattan distance / sum of absolute values. |
| `float('inf')` | Infinity Norm | $\max_i \lvert x_i \rvert$ | Maximum absolute value. |
| `float('-inf')` | -Infinity Norm | $\min_i \lvert x_i \rvert$ | Minimum absolute value. |
| `0` | L0 "Norm" | $\sum_i (x_i \neq 0)$ | Number of non-zero elements. |
| Any other `int` or `float` $p$ | $p$-Norm | $\left(\sum_i \lvert x_i \rvert^p\right)^{1/p}$ | General $p$-norm (the Minkowski distance when applied to a difference of vectors). |
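The general $p$-norm row can be checked directly against its formula. The sketch below uses $p = 3$ (an arbitrary choice) and compares `torch.linalg.norm` with a hand-written $(\sum_i |x_i|^p)^{1/p}$, then shows the `-inf` case, which returns the smallest absolute value.

```python
import torch

vector = torch.tensor([3.0, -4.0, 12.0])

# General p-norm with p = 3, compared with the formula (sum |x_i|^p)^(1/p)
p = 3
p_norm = torch.linalg.norm(vector, ord=p)
manual = vector.abs().pow(p).sum().pow(1.0 / p)
print(p_norm.item(), manual.item())

# -inf norm: smallest absolute value, here |3| = 3
neg_inf = torch.linalg.norm(vector, ord=float('-inf'))
print(neg_inf.item())
```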
#### For Matrices (2D Tensors or 2D slices of multi-dimensional tensors):
| `ord` | Norm Type | Description |
| :--- | :--- | :--- |
| `None` (Default) | Frobenius Norm | Equivalent to flattening the matrix and taking the L2 norm. |
| `'fro'` | Frobenius Norm | Same as `None`. |
| `'nuc'` | Nuclear Norm | Sum of the singular values (used in low-rank matrix approximation). |
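| `1` | Max Column Sum | $\max_j \sum_i \lvert A_{ij} \rvert$ |
| `2` | Spectral Norm | Largest singular value. |
| `float('inf')` | Max Row Sum | $\max_i \sum_j \lvert A_{ij} \rvert$ |
| `-1` | Min Column Sum | $\min_j \sum_i \lvert A_{ij} \rvert$ |
| `-2` | Smallest Singular Value | Smallest singular value. |
| `float('-inf')` | Min Row Sum | $\min_i \sum_j \lvert A_{ij} \rvert$ |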
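The spectral (`ord=2`), smallest-singular-value (`ord=-2`), nuclear (`'nuc'`), and Frobenius (`'fro'`) norms are all functions of the singular values, so they can be cross-checked against `torch.linalg.svdvals`, which returns the singular values in descending order. This is a verification sketch, not a recommended way to compute the norms.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [-4.0, 5.0, -6.0]])

# Singular values of the 2x3 matrix, sorted in descending order
sigma = torch.linalg.svdvals(matrix)

spectral = torch.linalg.norm(matrix, ord=2)        # largest singular value
smallest = torch.linalg.norm(matrix, ord=-2)       # smallest singular value
nuclear = torch.linalg.norm(matrix, ord='nuc')     # sum of singular values
frobenius = torch.linalg.norm(matrix, ord='fro')   # sqrt of sum of squared singular values

print(spectral.item(), sigma[0].item())
print(smallest.item(), sigma[-1].item())
print(nuclear.item(), sigma.sum().item())
print(frobenius.item(), sigma.pow(2).sum().sqrt().item())
```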
---
## 3. Code Examples
Let's look at practical implementations of these norms in PyTorch.
### Setup
```python
import torch
# Create a sample 1D vector and a 2D matrix
vector = torch.tensor([3.0, -4.0, 12.0])
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [-4.0, 5.0, -6.0]])
print("Vector:", vector)
print("Matrix:\n", matrix)
```
---
### Example 1: Vector Norms (L2, L1, Infinity, and L0)
```python
# 1. L2 Norm (Default)
l2_norm = torch.linalg.norm(vector)
print("L2 Norm (Default):", l2_norm.item()) # Expected: sqrt(3^2 + (-4)^2 + 12^2) = 13.0
# 2. L1 Norm (Manhattan Norm)
l1_norm = torch.linalg.norm(vector, ord=1)
print("L1 Norm:", l1_norm.item()) # Expected: 3 + 4 + 12 = 19.0
# 3. Infinity Norm (Max Absolute Value)
inf_norm = torch.linalg.norm(vector, ord=float('inf'))
print("Infinity Norm:", inf_norm.item()) # Expected: max(3, 4, 12) = 12.0
# 4. L0 Norm (Count of non-zero elements)
l0_norm = torch.linalg.norm(vector, ord=0)
print("L0 Norm:", l0_norm.item()) # Expected: 3.0
```
---
### Example 2: Matrix Norms (Frobenius, Row/Column Sums)
```python
# 1. Frobenius Norm (Default for matrices)
fro_norm = torch.linalg.norm(matrix)
print("Frobenius Norm:", fro_norm.item()) # Expected: sqrt(1^2 + 2^2 + 3^2 + (-4)^2 + 5^2 + (-6)^2) = sqrt(91) ≈ 9.539
# 2. L1 Matrix Norm (Maximum absolute column sum)
# Column sums: |1| + |-4| = 5; |2| + |5| = 7; |3| + |-6| = 9. Max is 9.
matrix_l1 = torch.linalg.norm(matrix, ord=1)
print("Matrix L1 Norm (Max Column Sum):", matrix_l1.item()) # Expected: 9.0
# 3. Infinity Matrix Norm (Maximum absolute row sum)
# Row sums: |1| + |2| + |3| = 6; |-4| + |5| + |-6| = 15. Max is 15.
matrix_inf = torch.linalg.norm(matrix, ord=float('inf'))
print("Matrix Inf Norm (Max Row Sum):", matrix_inf.item()) # Expected: 15.0
```
---
### Example 3: Norms Along Specific Dimensions
In deep learning, you often need to normalize features or calculate norms across specific batch dimensions.
```python
# Create a batch of 2 samples, each with 3 features
batch_data = torch.tensor([[3.0, 4.0, 0.0],
                           [1.0, 2.0, 2.0]])
# Calculate L2 norm along the feature dimension (dim=1)
features_l2 = torch.linalg.norm(batch_data, ord=2, dim=1)
print("L2 Norm per sample:", features_l2)
# Expected: [sqrt(3^2 + 4^2), sqrt(1^2 + 2^2 + 2^2)] -> [5.0, 3.0]
# Keep dimensions intact (useful for broadcasting operations like division)
norms_keepdim = torch.linalg.norm(batch_data, ord=2, dim=1, keepdim=True)
print("L2 Norm with keepdim=True:\n", norms_keepdim)
# Output shape will be (2, 1) instead of (2,)
```
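The `keepdim=True` result is what makes row-wise normalization a one-liner, because a `(2, 1)` tensor broadcasts against the `(2, 3)` batch. The sketch below divides each row by its L2 norm and compares the result with `torch.nn.functional.normalize`, which follows the same idea but clamps the denominator to at least `eps` (default `1e-12`) so that zero rows do not cause a division by zero.

```python
import torch
import torch.nn.functional as F

batch_data = torch.tensor([[3.0, 4.0, 0.0],
                           [1.0, 2.0, 2.0]])

# Shape (2, 1), so it broadcasts across the feature dimension
norms = torch.linalg.norm(batch_data, ord=2, dim=1, keepdim=True)
unit_rows = batch_data / norms

# Built-in equivalent: divides by the norm clamped from below by eps
reference = F.normalize(batch_data, p=2.0, dim=1)

print(unit_rows)
print(torch.linalg.norm(unit_rows, dim=1))  # each row now has L2 norm 1
print(torch.allclose(unit_rows, reference))
```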
---
## 4. Key Considerations and Best Practices
### 1. Transitioning from `torch.norm` to `torch.linalg.norm`
If you are maintaining legacy code, you might encounter `torch.norm(x)`. Be aware of the following differences:
* `torch.norm` is deprecated and may be removed in future PyTorch releases.
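* The two functions treat an explicit order differently when `dim` is not given. Legacy `torch.norm(x, p=1)` computes a vector norm over all elements of `x` regardless of its shape, whereas `torch.linalg.norm(x, ord=1)` computes a matrix norm (the maximum absolute column sum) when `x` is 2D and raises an error when `x` has more than two dimensions. Likewise, `torch.norm(x, p=2)` on a matrix returns the Frobenius norm, while `torch.linalg.norm(x, ord=2)` returns the spectral norm. Pass `dim` explicitly, or use `torch.linalg.vector_norm` / `torch.linalg.matrix_norm`, to make the intent unambiguous.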
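The difference matters most when porting code that passes an explicit order without `dim`: legacy `torch.norm` computes a vector norm over all elements, while `torch.linalg.norm` computes a matrix norm for a 2D input. The sketch below assumes a PyTorch version in which `torch.norm` is still available, and shows that `torch.linalg.vector_norm` reproduces the legacy result.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [-4.0, 5.0, -6.0]])

# Legacy: p=1 with dim=None -> sum of absolute values over all entries (21)
legacy_l1 = torch.norm(matrix, p=1)
# Modern: ord=1 on a 2D input -> maximum absolute column sum (9)
modern_l1 = torch.linalg.norm(matrix, ord=1)
# Faithful port of the legacy call: an explicit vector norm (21)
ported_l1 = torch.linalg.vector_norm(matrix, ord=1)

print(legacy_l1.item(), modern_l1.item(), ported_l1.item())
```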
### 2. Numerical Stability and Gradients
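The derivative of $\sqrt{s}$ is $\frac{1}{2\sqrt{s}}$, which is infinite at $s = 0$. If you compute an L2 norm by hand as `torch.sqrt((x ** 2).sum())` and `x` is all zeros, backpropagation multiplies this infinity by zero and produces `NaN` gradients. PyTorch's built-in `torch.linalg.norm` handles this point in its backward pass and returns a zero gradient for the L2 norm at the origin, so prefer it over a manual formula. A separate problem is dividing by a norm that is zero (for example, when normalizing a zero vector), which yields `NaN` already in the forward pass.
* **Solution:** If you must compute the norm manually, add a small epsilon inside the square root. When dividing by a norm, clamp the norm from below (e.g. `norm.clamp_min(eps)`) before dividing; this is what `torch.nn.functional.normalize` does internally.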
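A quick way to see where the `NaN` comes from is to compare a hand-written L2 norm with the built-in one on an all-zero input. The manual `sqrt` of a sum of squares produces `inf * 0 = NaN` in the backward pass, the built-in norm returns a zero gradient, and adding an epsilon inside the square root keeps the manual gradient finite. The epsilon value `1e-12` is an arbitrary illustrative choice.

```python
import torch

# Manual L2 norm: d/ds sqrt(s) is infinite at s = 0, and inf * 0 gives NaN
x = torch.zeros(3, requires_grad=True)
torch.sqrt((x ** 2).sum()).backward()
print("manual:", x.grad)

# Built-in norm: the backward pass returns a zero gradient at the origin
y = torch.zeros(3, requires_grad=True)
torch.linalg.norm(y).backward()
print("built-in:", y.grad)

# Manual norm with an epsilon inside the square root (eps chosen for illustration)
eps = 1e-12
z = torch.zeros(3, requires_grad=True)
torch.sqrt((z ** 2).sum() + eps).backward()
print("manual + eps:", z.grad)
```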
### 3. Performance on GPU
Both `torch.norm` and `torch.linalg.norm` support CUDA tensors and run on whichever device the input tensor lives on. For large tensors, moving the data to the GPU (`.to('cuda')`) lets these reductions run in parallel; for small tensors, the cost of transferring data can outweigh the speedup.
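A common device-agnostic pattern is to pick the GPU when it is available and fall back to the CPU otherwise; the norm is then computed on the same device as its input. The batch shape below (1024 vectors of 512 features) is a hypothetical example.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical batch of 1024 feature vectors with 512 features each
features = torch.randn(1024, 512, device=device)

# One L2 norm per row, computed on the device where `features` lives
row_norms = torch.linalg.norm(features, dim=1)
print(row_norms.shape, row_norms.device)
```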