
# PyTorch Loss Functions

Loss functions measure the gap between model predictions and ground-truth values. They are the core guide for neural network training: optimizers update model parameters by minimizing the loss. PyTorch provides more than a dozen common loss functions in the `torch.nn` module, covering the major task types such as classification, regression, and ranking.

* * *

## 1. Loss Function Basics

### Basic Usage

All PyTorch loss functions are subclasses of `nn.Module` and share a unified usage pattern:

**Example:**

```python
import torch
import torch.nn as nn

# Placeholder data so the snippet runs: 4 samples, 3 classes
predictions = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([0, 2, 1, 0])

# 1. Instantiate the loss function
criterion = nn.CrossEntropyLoss()

# 2. Compute the loss (predictions first, targets second)
loss = criterion(predictions, targets)

# 3. Backpropagation
loss.backward()
```

### Input Shape Conventions

Different loss functions have different input shape requirements, and this is where beginners most often make mistakes:

| Loss Function | Prediction (input) Shape | Label (target) Shape |
| --- | --- | --- |
| `CrossEntropyLoss` | `(N, C)` raw logits | `(N,)` integer class indices |
| `BCELoss` | `(N,)` probabilities after Sigmoid | `(N,)` 0/1 floats |
| `BCEWithLogitsLoss` | `(N,)` raw logits | `(N,)` 0/1 floats |
| `MSELoss` | `(N,)` any real values | `(N,)` any real values |
| `NLLLoss` | `(N, C)` log-probabilities (output of `log_softmax`) | `(N,)` integer class indices |

> **N** = batch size, **C** = number of classes

* * *

## 2. Classification Task Loss Functions

### 2.1 CrossEntropyLoss

The most commonly used loss function for multi-class classification. **It applies Softmax + Log + Negation internally**, so you should not apply Softmax to the model output yourself.

**Mathematical Formula:**

$$\text{Loss} = -\sum_{c} y_c \log(p_c)$$

where $p_c = \dfrac{\exp(x_c)}{\sum_j \exp(x_j)}$ is the Softmax output for class $c$ and $x_c$ is the raw logit.

**Example:**

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Model output: raw logits, shape (batch_size, num_classes)
# No need to apply Softmax beforehand!
predictions = torch.tensor([
    [2.0, 0.5, 0.3],  # Sample 1, most likely class 0
    [0.1, 3.0, 0.2],  # Sample 2, most likely class 1
    [0.2, 0.1, 4.0],  # Sample 3, most likely class 2
])

# Labels: integer class indices, shape (batch_size,)
targets = torch.tensor([0, 1, 2])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.1640
```

**Label smoothing and soft labels:**

**Example:**

```python
# Label smoothing mitigates overfitting; commonly used in image classification competitions
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Soft labels (probability distributions over classes) can also be passed directly
soft_targets = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.05, 0.9, 0.05],
])
predictions = torch.randn(2, 3)
loss = criterion(predictions, soft_targets)
```

> **Applicable Scenarios:** Multi-class classification tasks such as cat/dog/bird recognition, image classification, and text classification.

* * *

### 2.2 BCELoss (Binary Cross-Entropy Loss)

Designed for **binary classification** and **multi-label classification** tasks. The input must be probabilities in the range 0 to 1, i.e. the model output after `Sigmoid`.

**Mathematical Formula:**

$$\text{Loss} = -\left[\, y \log(p) + (1 - y) \log(1 - p) \,\right]$$

**Example:**

```python
criterion = nn.BCELoss()

# Model output must be passed through Sigmoid first, value range (0, 1)
raw_output = torch.tensor([2.0, -1.0, 0.5, -3.0])
predictions = torch.sigmoid(raw_output)  # [0.88, 0.27, 0.62, 0.05]

# Labels: float type 0.0 or 1.0
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.2407

# Multi-label classification (each sample can belong to multiple classes)
# predictions shape: (batch_size, num_labels)
predictions_ml = torch.sigmoid(torch.randn(4, 5))
targets_ml = torch.randint(0, 2, (4, 5)).float()
loss_ml = criterion(predictions_ml, targets_ml)
```

> `BCELoss` requires the input to be in the (0, 1) range. Raw logits must not be passed to it: values outside that range are rejected, and applying Sigmoid and the logarithm as separate steps is less numerically stable than the fused version. It is recommended to use `BCEWithLogitsLoss` below.

* * *

### 2.3 BCEWithLogitsLoss

An improved version of `BCELoss`. **It applies Sigmoid internally**, is more numerically stable, and is the preferred choice.

**Example:**

```python
criterion = nn.BCEWithLogitsLoss()

# Pass raw logits directly, no need to manually apply Sigmoid
predictions = torch.tensor([2.0, -1.0, 0.5, -3.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

loss = criterion(predictions, targets)
print(f"Loss: {loss.item():.4f}")  # Loss: 0.2407

# Equivalent to (but with better numerical stability):
# loss = nn.BCELoss()(torch.sigmoid(predictions), targets)
```

**With positive sample weights (handling class imbalance):**

**Example:**

```python
# pos_weight: weight for positive samples; the larger the value,
# the more the loss emphasizes positive samples.
# For example, if there are 10 times as many negatives as positives, set pos_weight=10
pos_weight = torch.tensor([10.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

> **Applicable Scenarios:** Binary classification (spam detection), multi-label classification (tagging an article with several labels), object detection (foreground/background decisions).

* * *

### 2.4 NLLLoss (Negative Log-Likelihood Loss)

Requires you to apply `log_softmax` to the model output yourself, which offers more flexibility. `CrossEntropyLoss = LogSoftmax + NLLLoss`.

**Example:**

```python
criterion = nn.NLLLoss()

# Must manually apply log_softmax first
raw_output = torch.randn(4, 3)  # (batch, num_classes)
log_probs = torch.log_softmax(raw_output, dim=1)

targets = torch.tensor([0, 2, 1, 0])
loss = criterion(log_probs, targets)
```

> **Use Cases:** When you need log-probabilities in intermediate steps (e.g., CTC, beam search). In all other cases, prefer `CrossEntropyLoss`.

* * *

## 3. Regression Task Loss Functions

### 3.1 MSELoss (Mean Squared Error)

The most classic regression loss. It is **highly sensitive to large errors**, because squaring amplifies their impact.
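To make this sensitivity concrete, here is a minimal sketch that compares per-element squared errors with per-element absolute errors. The tensor values are illustrative assumptions (not from the original examples), chosen so that the last prediction is a large miss.

```python
import torch
import torch.nn as nn

# Illustrative values (assumed for this sketch): the last prediction misses by 10
predictions = torch.tensor([1.0, 2.0, 3.0, 14.0])
targets = torch.tensor([1.5, 2.5, 3.5, 4.0])

# reduction="none" keeps one loss value per element instead of averaging
per_element_sq = nn.MSELoss(reduction="none")(predictions, targets)
per_element_abs = (predictions - targets).abs()

print("Squared errors: ", per_element_sq)
print("Absolute errors:", per_element_abs)

# Fraction of the total error contributed by the single large miss
outlier_share_sq = (per_element_sq[-1] / per_element_sq.sum()).item()
outlier_share_abs = (per_element_abs[-1] / per_element_abs.sum()).item()
print(f"Outlier share (squared):  {outlier_share_sq:.2%}")
print(f"Outlier share (absolute): {outlier_share_abs:.2%}")
```

With squared errors, the single large miss accounts for a larger share of the total than with absolute errors, which is why one outlier can dominate the gradient when training with MSE.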
**Mathematical Formula:**

$$\text{MSELoss} = \frac{1}{N} \sum_{i} (y_i - \hat{y}_i)^2$$

**Example:**

```python
criterion = nn.MSELoss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

loss = criterion(predictions, targets)
print(f"MSE Loss: {loss.item():.4f}")  # MSE Loss: 0.5625

# Manual verification
manual = ((predictions - targets) ** 2).mean()
print(f"Manual calculation: {manual.item():.4f}")  # 0.5625
```

> **Applicable Scenarios:** Regression on continuous values, such as house price or temperature prediction. Works well when the data has no obvious outliers.

* * *

### 3.2 L1Loss (Mean Absolute Error)

**More robust to outliers**: it takes the absolute value instead of squaring, so large errors are not overly amplified.

**Mathematical Formula:**

$$\text{L1Loss} = \frac{1}{N} \sum_{i} |y_i - \hat{y}_i|$$

**Example:**

```python
criterion = nn.L1Loss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

loss = criterion(predictions, targets)
print(f"L1 Loss: {loss.item():.4f}")  # L1 Loss: 0.6250
```

* * *

### 3.3 SmoothL1Loss (Huber-style Loss)

**Combines the advantages of MSE and L1**: it behaves like MSE for small errors (smooth, stable gradients) and like L1 for large errors (robust to outliers). It is a standard choice for bounding-box regression in object detection (e.g., Faster R-CNN).

**Mathematical Formula:**

$$\text{SmoothL1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

where $x$ is the difference between prediction and target. This is the form for the default `beta=1.0`.

**Example:**

```python
criterion = nn.SmoothL1Loss()

predictions = torch.tensor([2.5, 0.5, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5,2.
```
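As a minimal sketch of the piecewise SmoothL1 formula, the following checks `nn.SmoothL1Loss` (with its default `beta=1.0`) against a manual computation built with `torch.where`. The tensor values are illustrative assumptions, picked so that two errors fall in the quadratic region and two in the linear region.

```python
import torch
import torch.nn as nn

# Illustrative values (assumed for this sketch): errors are 0.5, 2.0, 0.0 and -3.0
predictions = torch.tensor([1.5, 4.0, 2.0, 1.0])
targets = torch.tensor([1.0, 2.0, 2.0, 4.0])

criterion = nn.SmoothL1Loss()  # default beta=1.0, reduction="mean"
loss = criterion(predictions, targets)

# Manual piecewise computation: 0.5 * x^2 where |x| < 1, otherwise |x| - 0.5
x = predictions - targets
manual = torch.where(x.abs() < 1, 0.5 * x ** 2, x.abs() - 0.5).mean()

print(f"SmoothL1Loss: {loss.item():.6f}")
print(f"Manual:       {manual.item():.6f}")
```

The two printed values should agree. The large errors (2.0 and -3.0) contribute linearly rather than quadratically, which is what keeps the loss from being dominated by outliers.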