
PyTorch BatchNorm and Dropout

Batch Normalization and Dropout are two of the most fundamental regularization and stabilization techniques in deep neural network training. The former addresses the internal covariate shift problem during training, making deeper networks trainable; the latter prevents overfitting by randomly dropping neurons, which improves generalization. They are typically used together and are standard components of modern neural networks.

* * *

## 1. Batch Normalization

### 1.1 Basic Principles

When training deep networks, small parameter changes in earlier layers are amplified as the network deepens, so the input distribution of later layers keeps changing. This phenomenon is called **Internal Covariate Shift**. Batch Normalization addresses it by standardizing the output of each layer, pulling the activations back to a stable distribution.

**Normalization Formula:**

$$ \hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}} $$

$$ y_{i} = \gamma \hat{x}_{i} + \beta $$

Where:

* $\mu_{B}$, $\sigma_{B}^{2}$ are the mean and variance of the current batch
* $\epsilon$ is a small constant that prevents division by zero (default `1e-5`)
* $\gamma$, $\beta$ are learnable scale and shift parameters, which let the network decide the final shape of the distribution

**Benefits of Batch Normalization:**

* Allows larger learning rates, which speeds up training
* Reduces sensitivity to initialization, making training more stable
* Has some regularization effect, reducing dependence on Dropout
* Alleviates vanishing and exploding gradients, making deeper networks trainable

* * *

### 1.2 BatchNorm1d / 2d / 3d

PyTorch provides three versions, chosen by input dimensionality. Their usage is identical; they differ only in the input shape they expect.
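The normalization formula from section 1.1 can be checked numerically. The following is a minimal sketch, assuming a 2D input of shape `(batch, features)` and a module in training mode; `manual_batch_norm` is a hypothetical helper written for illustration, not part of PyTorch. It applies the two equations by hand and compares the result with `nn.BatchNorm1d`.

```python
import torch
import torch.nn as nn


def manual_batch_norm(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: apply the batch norm formula to a (batch, features) tensor."""
    mu = x.mean(dim=0)                          # per-feature batch mean (mu_B)
    var = x.var(dim=0, unbiased=False)          # per-feature biased batch variance (sigma_B^2)
    x_hat = (x - mu) / torch.sqrt(var + eps)    # standardize each feature
    return gamma * x_hat + beta                 # learnable scale and shift


torch.manual_seed(0)
x = torch.randn(8, 4) * 3.0 + 2.0               # toy batch: 8 samples, 4 features

bn = nn.BatchNorm1d(4)                          # gamma is initialized to 1, beta to 0
bn.train()                                      # training mode normalizes with batch statistics

with torch.no_grad():
    manual = manual_batch_norm(x, bn.weight, bn.bias, eps=bn.eps)
    builtin = bn(x)

# The two results should agree up to floating-point error.
print(torch.allclose(manual, builtin, atol=1e-5))
# After normalization each feature has mean close to 0 and variance close to 1.
print(manual.mean(dim=0))
print(manual.var(dim=0, unbiased=False))
```

The biased variance (`unbiased=False`) is used here because that is what the layer normalizes with in training mode. In evaluation mode the layer uses its running statistics instead, so this comparison would no longer hold.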