PyTorch torch.nn Reference Manual

* * *

`torch.nn.Dropout` is a regularization module in PyTorch.

During training it randomly zeroes input elements, which reduces co-adaptation between neurons and helps prevent overfitting.

### Function Definition

    torch.nn.Dropout(p=0.5, inplace=False)

**Parameter Description:**

* `p` (float): The probability of an element being zeroed. Default is 0.5.
* `inplace` (bool): Whether to perform the operation in-place. Default is False.

During training, the elements that are kept are scaled by `1 / (1 - p)` so that the expected value of the output matches the input.

* * *

## Usage Examples

### Example 1: Basic Usage

Create and use a Dropout layer:

    import torch
    import torch.nn as nn

    # Create a Dropout layer with drop probability 0.5
    dropout = nn.Dropout(p=0.5)

    # Training mode (Dropout active)
    dropout.train()

    # Create input
    input_tensor = torch.ones(1, 10)
    print("Input:", input_tensor.squeeze().tolist())

    # Run the forward pass several times to observe the randomness
    for i in range(3):
        output = dropout(input_tensor)
        print(f"Output {i+1}:", output.squeeze().tolist())

On average, half of the elements are zeroed on each call, and a different random subset is chosen every time. The surviving elements are scaled to 2.0, that is, `1 / (1 - 0.5)`.

### Example 2: Training vs Evaluation Mode

Dropout behaves differently during training and evaluation:

    import torch
    import torch.nn as nn

    dropout = nn.Dropout(p=0.5)

    # Training mode
    dropout.train()
    train_output = dropout(torch.ones(4, 10))
    print("Training mode - Activation ratio:", (train_output != 0).float().mean().item())

    # Evaluation mode
    dropout.eval()
    eval_output = dropout(torch.ones(4, 10))
    print("Evaluation mode - Activation ratio:", (eval_output != 0).float().mean().item())
    print("Evaluation mode output:", eval_output.tolist())

During evaluation, Dropout is inactive and the output is identical to the input.

### Example 3: Using in a Neural Network

A typical fully connected network with Dropout:

    import torch
    import torch.nn as nn

    class DropoutNet(nn.Module):
        def __init__(self, input_dim=784, hidden_dim=256, output_dim=10, dropout_rate=0.5):
            super().__init__()
            self.fc1 = nn.Linear(input_dim, hidden_dim)
            self.dropout1 = nn.Dropout(p=dropout_rate)
            self.fc2 = nn.Linear(hidden_dim, hidden_dim)
            self.dropout2 = nn.Dropout(p=dropout_rate)
            self.fc3 = nn.Linear(hidden_dim, output_dim)
            self.relu = nn.ReLU()

        def forward(self, x):
            x = self.relu(self.fc1(x))
            x = self.dropout1(x)  # First Dropout
            x = self.relu(self.fc2(x))
            x = self.dropout2(x)  # Second Dropout
            x = self.fc3(x)
            return x

    model = DropoutNet()

    # Training mode
    model.train()
    input_data = torch.randn(32, 784)
    output = model(input_data)
    print("Training mode output shape:", output.shape)

    # Evaluation mode
    model.eval()
    output = model(input_data)
    print("Evaluation mode output shape:", output.shape)

### Example 4: Using Dropout2d in a CNN

`nn.Dropout2d` drops entire feature maps, one channel at a time:

    import torch
    import torch.nn as nn

    # Dropout2d zeroes whole channels rather than individual elements
    dropout2d = nn.Dropout2d(p=0.5)

    # Input: batch=1, channels=4, height=4, width=4
    input_tensor = torch.ones(1, 4, 4, 4)

    dropout2d.train()
    output = dropout2d(input_tensor)

    print("Dropout2d output shape:", output.shape)
    # The number of surviving channels varies from run to run
    print("Non-zero channels:", (output.sum(dim=(2, 3)) != 0).sum().item())

### Example 5: Effects of Different Drop Rates

How the drop rate affects the fraction of elements that are kept:

    import torch
    import torch.nn as nn

    for p in [0.1, 0.3, 0.5, 0.7]:
        dropout = nn.Dropout(p=p)
        dropout.train()

        # Run multiple times and take the average
        total_active = 0
        for _ in range(100):
            output = dropout(torch.ones(1000))
            total_active += (output != 0).float().sum().item()

        avg_active = total_active / 100 / 1000
        print(f"p={p} - Average activation ratio: {avg_active:.2%} (Expected: {1-p:.2%})")

* * *

## Dropout Type Comparison

| **Type** | **Drop Method** | **Applicable Scenarios** |
| --- | --- | --- |
| `nn.Dropout` | Randomly zero individual elements | Fully connected layers, feature vectors |
| `nn.Dropout2d` | Randomly zero entire channels | Convolutional layer feature maps |
| `nn.Dropout3d` | Randomly zero entire 3D channels | 3D convolution features |

* * *

## Common Questions

### Q1: How to choose the Dropout drop rate?

* 0.1-0.3: Lighter regularization, suitable for large datasets
* 0.4-0.5: Common default values
* Above 0.5: Stronger regularization, suitable for small datasets

### Q2: Where should Dropout be placed?

Dropout is usually placed after the activation function that follows a fully connected layer. It can also be placed before the activation function.

### Q3: Does Dropout need to be turned off during evaluation?

Yes. Calling `model.eval()` switches every Dropout layer in the model to evaluation mode, which turns Dropout off.

* * *

## Use Cases

The main application scenarios for `nn.Dropout` include:

* **Preventing overfitting**: Reduces dependencies between neurons
* **Model ensemble**: Approximates the effect of combining multiple networks
* **Fully connected layers**: Most commonly used in FC layers
* **Feature dropping**: Improves model robustness

> Note: Dropout is active in training mode. Switch to eval mode with `model.eval()` before evaluation; otherwise the outputs vary randomly from call to call.

* * *
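The note above about eval mode can be checked with a short sketch. It assumes a small hypothetical `nn.Sequential` model; the layer sizes and the helper `outputs_match` are illustrative and not part of the examples above. The sketch compares two forward passes on the same input in each mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable

# Hypothetical toy model, used purely for illustration
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 2),
)
x = torch.randn(4, 8)


def outputs_match(net, inputs):
    """Run two forward passes on the same input and compare the results."""
    with torch.no_grad():
        return torch.equal(net(inputs), net(inputs))


model.train()
train_stable = outputs_match(model, x)  # a fresh dropout mask is drawn on each call

model.eval()
eval_stable = outputs_match(model, x)  # Dropout acts as the identity in eval mode

print("Stable in train mode:", train_stable)
print("Stable in eval mode:", eval_stable)
```

If the model is left in training mode, repeated predictions on the same input differ, which is why evaluation code typically calls `model.eval()` first. `torch.no_grad()` is used here only to skip gradient tracking; it does not disable Dropout by itself.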