PyTorch Torch Norm

# Understanding PyTorch's Vector and Matrix Norms: `torch.norm` and `torch.linalg.norm`

In deep learning and scientific computing, calculating the norm of a vector or matrix is a fundamental operation. Norms are widely used in several places:

* regularization (L1/L2 regularization)
* gradient clipping
* loss functions (Mean Absolute Error and Mean Squared Error are scaled L1 and squared L2 distances)
* measuring distance or similarity between vectors such as embeddings

PyTorch's norm API has evolved. The legacy `torch.norm` function is still common in older codebases, but PyTorch now recommends `torch.linalg.norm`, a NumPy-compatible function in the `torch.linalg` module. This guide covers the syntax, usage, and practical examples of calculating norms in PyTorch. It also bridges the gap between legacy implementations and modern best practices.

---

## 1. The Evolution of Norms in PyTorch

Before diving into the syntax, it is important to understand the current state of PyTorch's API:

* **`torch.norm` (Legacy):** This function is deprecated. It still works for backward compatibility, but its handling of vector versus matrix norms is ambiguous. For example, a numeric `p` is treated as a vector norm even on a 2-D tensor.
* **`torch.linalg.norm` (Recommended):** Introduced in PyTorch 1.7.0, this function mirrors NumPy's `np.linalg.norm` and clearly separates vector and matrix norms.

*Note: This tutorial focuses on the modern `torch.linalg.norm` while providing context for `torch.norm` where necessary.*

---

## 2. Syntax and Parameters

### `torch.linalg.norm`

```python
torch.linalg.norm(A, ord=None, dim=None, keepdim=False, *, out=None, dtype=None)
```

#### Parameter Breakdown:

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `A` | `Tensor` | The input tensor. |
| `ord` | `int`, `float`, `inf`, `-inf`, `'fro'`, `'nuc'` | The order of the norm. It controls which norm is calculated (see the tables below). Default is `None`. |
| `dim` | `int`, 2-tuple of `int`, or `None` | The dimension(s) to reduce. An `int` computes a vector norm, and a 2-tuple computes a matrix norm. If `dim=None` and `ord=None`, `A` is flattened to 1-D and its L2 norm is returned. If `dim=None` and `ord` is given, `A` must be 1-D or 2-D. Default is `None`. |
| `keepdim` | `bool` | If `True`, the reduced dimensions are retained with size 1. Default is `False`. |
| `out` | `Tensor` | Optional output tensor to write the result into. |
| `dtype` | `torch.dtype` | Optional. If given, `A` is cast to this dtype before the computation, and the result has this dtype. |

---

### Supported Norm Types (`ord` values)

The meaning of `ord` depends on whether a **Vector Norm** or a **Matrix Norm** is being computed:

* **Vector Norm:** computed when `dim` is an `int`, or when `dim=None` and `A` is 1-D. With `dim=None` and `ord=None`, any input is flattened and its L2 norm is returned.
* **Matrix Norm:** computed when `dim` is a 2-tuple of `int`s, or when `dim=None`, `ord` is given, and `A` is 2-D.

#### For Vectors (1D Tensors or flattened tensors):

| `ord` | Norm Type | Mathematical Formula | Description |
| :--- | :--- | :--- | :--- |
| `None` (Default) | L2 Norm | $\sqrt{\sum_i \lvert x_i \rvert^2}$ | Euclidean length / standard L2 norm. |
| `2` | L2 Norm | $\sqrt{\sum_i \lvert x_i \rvert^2}$ | Same as `None`. |
| `1` | L1 Norm | $\sum_i \lvert x_i \rvert$ | Manhattan norm / sum of absolute values. |
| `float('inf')` | Infinity Norm | $\max_i \lvert x_i \rvert$ | Maximum absolute value. |
| `float('-inf')` | -Infinity Norm | $\min_i \lvert x_i \rvert$ | Minimum absolute value. |
| `0` | L0 "Norm" | $\sum_i \mathbf{1}[x_i \neq 0]$ | Number of non-zero elements. |
| Any other `int` or `float` | $p$-Norm | $\left(\sum_i \lvert x_i \rvert^p\right)^{1/p}$ | General $p$-norm (Minkowski norm), with $p$ = `ord`. |

#### For Matrices (2D Tensors or 2D slices of multi-dimensional tensors):

| `ord` | Norm Type | Description |
| :--- | :--- | :--- |
| `None` (Default) | Frobenius Norm | Equivalent to flattening the matrix and taking the L2 norm. |
| `'fro'` | Frobenius Norm | Same as `None`. |
| `'nuc'` | Nuclear Norm | Sum of the singular values (used in low-rank matrix approximation). |
| `1` | Max Column Sum | $\max_j \sum_i \lvert A_{ij} \rvert$ |
| `-1` | Min Column Sum | $\min_j \sum_i \lvert A_{ij} \rvert$ |
| `2` | Spectral Norm | Largest singular value. |
| `-2` | Smallest Singular Value | Smallest singular value. |
| `float('inf')` | Max Row Sum | $\max_i \sum_j \lvert A_{ij} \rvert$ |
| `float('-inf')` | Min Row Sum | $\min_i \sum_j \lvert A_{ij} \rvert$ |

Other numeric values, such as `0` or `3`, are valid only for vector norms.

---

## 3. Code Examples

Let's look at practical implementations of these norms in PyTorch.
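The dispatch rules from Section 2 are easiest to see on a single tensor. The following minimal sketch applies `ord=1` to one 2×3 tensor in several ways and compares the results with legacy `torch.norm`. The tensor `A` and the variable names are illustrative choices. `torch.linalg.vector_norm` is included as the explicit, vector-only alternative.

```python
import torch

# Illustrative 2x3 tensor
A = torch.tensor([[1.0, 2.0, 3.0],
                  [-4.0, 5.0, -6.0]])

# dim is an int -> vector 1-norm of each row: [|1|+|2|+|3|, |-4|+|5|+|-6|]
row_l1 = torch.linalg.norm(A, ord=1, dim=1)

# dim is a 2-tuple -> matrix 1-norm (maximum absolute column sum)
matrix_l1 = torch.linalg.norm(A, ord=1, dim=(0, 1))

# dim=None with ord given on a 2-D input -> also the matrix 1-norm
matrix_l1_implicit = torch.linalg.norm(A, ord=1)

# Vector 1-norm over all entries: vector_norm flattens the input when dim=None
all_l1 = torch.linalg.vector_norm(A, ord=1)

# Legacy torch.norm treats a numeric p as a vector norm over every element
legacy_l1 = torch.norm(A, p=1)

print("Row L1 norms:     ", row_l1.tolist())            # [6.0, 15.0]
print("Matrix 1-norm:    ", matrix_l1.item())           # 9.0
print("Implicit matrix:  ", matrix_l1_implicit.item())  # 9.0
print("Vector L1 (all):  ", all_l1.item())              # 21.0
print("Legacy torch.norm:", legacy_l1.item())           # 21.0
```

Both `9.0` results come from the matrix branch. `torch.norm(A, p=1)` instead matches the flattened vector norm, which is the migration pitfall discussed in Section 4.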
### Setup

```python
import torch

# Create a sample 1D vector and a 2D matrix
vector = torch.tensor([3.0, -4.0, 12.0])
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [-4.0, 5.0, -6.0]])

print("Vector:", vector)
print("Matrix:\n", matrix)
```

---

### Example 1: Vector Norms (L2, L1, Infinity, and L0)

```python
# 1. L2 Norm (Default)
l2_norm = torch.linalg.norm(vector)
print("L2 Norm (Default):", l2_norm.item())
# Expected: sqrt(3^2 + (-4)^2 + 12^2) = sqrt(169) = 13.0

# 2. L1 Norm (Manhattan Norm)
l1_norm = torch.linalg.norm(vector, ord=1)
print("L1 Norm:", l1_norm.item())
# Expected: 3 + 4 + 12 = 19.0

# 3. Infinity Norm (Max Absolute Value)
inf_norm = torch.linalg.norm(vector, ord=float('inf'))
print("Infinity Norm:", inf_norm.item())
# Expected: max(3, 4, 12) = 12.0

# 4. L0 Norm (Count of non-zero elements)
l0_norm = torch.linalg.norm(vector, ord=0)
print("L0 Norm:", l0_norm.item())
# Expected: 3.0
```

---

### Example 2: Matrix Norms (Frobenius, Row/Column Sums)

```python
# 1. Frobenius Norm (Default for matrices)
fro_norm = torch.linalg.norm(matrix)
print("Frobenius Norm:", fro_norm.item())
# Expected: sqrt(1^2 + 2^2 + 3^2 + (-4)^2 + 5^2 + (-6)^2) = sqrt(91) ≈ 9.539

# 2. L1 Matrix Norm (Maximum absolute column sum)
# Column sums: |1| + |-4| = 5; |2| + |5| = 7; |3| + |-6| = 9. Max is 9.
matrix_l1 = torch.linalg.norm(matrix, ord=1)
print("Matrix L1 Norm (Max Column Sum):", matrix_l1.item())
# Expected: 9.0

# 3. Infinity Matrix Norm (Maximum absolute row sum)
# Row sums: |1| + |2| + |3| = 6; |-4| + |5| + |-6| = 15. Max is 15.
matrix_inf = torch.linalg.norm(matrix, ord=float('inf'))
print("Matrix Inf Norm (Max Row Sum):", matrix_inf.item())
# Expected: 15.0
```

---

### Example 3: Norms Along Specific Dimensions

In deep learning, you often need to normalize features or calculate norms along specific dimensions of a batch.
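```python
# Create a batch of 2 samples, each with 3 features
batch_data = torch.tensor([[3.0, 4.0, 0.0],
                           [1.0, 2.0, 2.0]])

# L2 norm of each sample, computed along the feature dimension (dim=1)
features_l2 = torch.linalg.norm(batch_data, ord=2, dim=1)
print("L2 Norm per sample:", features_l2)
# Expected: [sqrt(3^2 + 4^2), sqrt(1^2 + 2^2 + 2^2)] -> [5.0, 3.0]

# keepdim=True keeps the reduced dimension (useful for broadcasting, e.g. division)
norms_keepdim = torch.linalg.norm(batch_data, ord=2, dim=1, keepdim=True)
print("L2 Norm with keepdim=True:\n", norms_keepdim)
# Output shape is (2, 1) instead of (2,)

# The (2, 1) norms broadcast across each row, scaling every sample to unit L2 length
normalized = batch_data / norms_keepdim
print("Row-normalized data:\n", normalized)
```

---

## 4. Key Considerations and Best Practices

### 1. Transitioning from `torch.norm` to `torch.linalg.norm`

If you are maintaining legacy code, you might encounter `torch.norm(x)`. Be aware of the following differences:

* `torch.norm` is deprecated and may be removed in a future PyTorch release.
* With default arguments the two agree: `torch.norm(x)` and `torch.linalg.norm(x)` both return the L2 norm of the flattened tensor.
* `torch.linalg.norm` chooses between vector and matrix norms based on `dim` and the input's shape. It computes a matrix norm when `dim` is a 2-tuple (e.g., `dim=(0, 1)`), or when `dim=None`, `ord` is given, and the input is 2-D. Legacy `torch.norm` treats any numeric `p` as a vector norm over all selected elements. For example, on the Setup `matrix`, `torch.norm(matrix, p=1)` returns the sum of all absolute values (21.0), whereas `torch.linalg.norm(matrix, ord=1)` returns the maximum column sum (9.0).
* If you want the intent to be explicit, `torch.linalg.vector_norm` and `torch.linalg.matrix_norm` each compute only one kind of norm.

### 2. Numerical Stability and Gradients

The derivative of $\sqrt{s}$ is $\frac{1}{2\sqrt{s}}$, which is infinite at $s = 0$. Suppose you compute an L2 norm by hand, for example `torch.sqrt((x ** 2).sum())`, and `x` is all zeros. Backpropagation then multiplies that infinity by the zero gradient of $x^2$, which produces `NaN` gradients.

PyTorch's built-in norm functions, including `torch.linalg.norm`, define the gradient of the L2 norm at the zero vector as zero, so they avoid this particular problem. A related issue appears in the forward pass: dividing by a norm that can be zero (as in normalization) yields `NaN` values.

* **Solution:** If you compute the norm manually, add a small epsilon inside the square root. When dividing by a norm, first clamp the norm to a small positive minimum (e.g., `norm.clamp(min=1e-12)`) rather than altering the input values.

### 3. Performance on GPU

Both `torch.norm` and `torch.linalg.norm` accept CUDA tensors and run on the device that holds their input. When working with large tensors, move them to the GPU (`.to('cuda')`) to take advantage of parallel hardware acceleration.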
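Here is a minimal device-agnostic sketch of that advice. It selects CUDA when `torch.cuda.is_available()` reports a GPU and otherwise falls back to the CPU, so it also runs on machines without one. The tensor sizes and the seed are arbitrary illustration values.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
# Illustrative batch: 4,096 vectors with 256 features each, created directly on `device`
x = torch.randn(4096, 256, device=device)

# The per-row L2 reduction runs on the device that holds the input
norms = torch.linalg.norm(x, dim=1)

# Move the (much smaller) result back to the CPU only when you need it there
norms_cpu = norms.cpu()
print(norms.device, tuple(norms.shape))
```

Creating the tensor directly on the target device avoids an extra host-to-device copy, compared with building it on the CPU and calling `.to('cuda')` afterwards.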

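The following sketch illustrates consideration 2 (Numerical Stability and Gradients). It computes gradients at the zero vector three ways: a hand-written square root, the built-in `torch.linalg.norm`, and the hand-written version with an epsilon. It then clamps a norm before dividing by it. The `eps` value of `1e-12` is an illustrative assumption, not a PyTorch default.

```python
import torch

x = torch.zeros(3, requires_grad=True)

# 1) Hand-written L2 norm: sqrt'(0) is infinite, and inf * 0 in the chain rule gives NaN
manual = torch.sqrt((x ** 2).sum())
(grad_manual,) = torch.autograd.grad(manual, x)

# 2) Built-in norm: PyTorch's backward treats the gradient at the zero vector as zero
builtin = torch.linalg.norm(x)
(grad_builtin,) = torch.autograd.grad(builtin, x)

# 3) Hand-written norm with a small epsilon inside the square root stays finite
eps = 1e-12  # illustrative value
stable = torch.sqrt((x ** 2).sum() + eps)
(grad_stable,) = torch.autograd.grad(stable, x)

# Dividing by a norm that can be zero: clamp the denominator, not the inputs
v = torch.tensor([[3.0, 4.0],
                  [0.0, 0.0]])
unit = v / torch.linalg.norm(v, dim=1, keepdim=True).clamp(min=eps)

print("manual: ", grad_manual)   # contains NaN
print("builtin:", grad_builtin)  # all zeros
print("eps:    ", grad_stable)   # all zeros
print("unit rows:\n", unit)      # the all-zero row stays zero instead of NaN
```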