PyTorch Torchvision
torchvision is an extension library in the PyTorch ecosystem specifically designed for computer vision tasks, providing the following core functions:
1. **Pre-trained Models**: Implementations of classic CNN architectures (such as ResNet, VGG, and AlexNet)
2. **Dataset Tools**: Built-in loaders for common vision datasets (such as CIFAR10, MNIST, and ImageNet)
3. **Image Transforms**: Image preprocessing and data augmentation methods
4. **Utility Tools**: Video processing, image manipulation, and other auxiliary functions
# Install torchvision (usually installed together with PyTorch)
pip install torch torchvision
* * *
## Core Components Analysis
### 1. torchvision.models
Provides pre-trained computer vision models, which can be directly used for transfer learning:
## Example
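import torchvision.models as models
# Load pre-trained models. The `weights` argument replaces the deprecated
# `pretrained=True` flag (torchvision 0.13 and later).
resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT)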
#### Common Model List:
| Model Name | Use Case | Parameters | Top-1 Accuracy (ImageNet) |
| --- | --- | --- | --- |
| ResNet | General image classification | 11M-60M | 69%-80% |
| VGG | Feature extraction | 138M | 71.3% |
| MobileNet | Mobile applications | 3.4M | 70.6% |
| EfficientNet | Efficient models | 5M-66M | 77%-84% |
* * *
### 2. torchvision.datasets
Provides built-in versions of common computer vision datasets, which simplifies data loading:
## Example
from torchvision import datasets, transforms
# Load CIFAR10 dataset
train_data = datasets.CIFAR10(
    root='data',
    train=True,
    download=True,
    transform=transforms.ToTensor()
)
# Load MNIST dataset
test_data = datasets.MNIST(
    root='data',
    train=False,
    download=True
)
#### Supported Dataset Types:
Diagram: `torchvision.datasets` is organized into three groups of datasets; one group includes CIFAR10/100 and MNIST/FashionMNIST.
* * *
### 3. torchvision.transforms
Core tools for image preprocessing and data augmentation:
## Example
from torchvision import transforms
# Define image transformation pipeline
transform = transforms.Compose([
    transforms.Resize(256),      # Resize
    transforms.CenterCrop(224),  # Center crop
    transforms.ToTensor(),       # Convert to tensor
    transforms.Normalize(        # Normalize
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
#### Common Transform Categories:
| Category | Method Example | Purpose |
| --- | --- | --- |
| Geometric transforms | RandomRotation, RandomResizedCrop | Increase position invariance |
| Color transforms | ColorJitter, Grayscale | Enhance color robustness |
| Blur/Noise | GaussianBlur, RandomErasing | Prevent overfitting |
| Composite transforms | RandomApply, RandomChoice | Flexible combination strategy |
* * *
## Practical Example: Image Classification Workflow
### 1. Data Preparation
## Example
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Define data transformation
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    # CIFAR10 images are RGB, so give one mean/std value per channel
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Load dataset
train_set = datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)
# Create data loader
train_loader = DataLoader(
    train_set,
    batch_size=32,
    shuffle=True
)
### 2. Model Training
## Example
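import torch.nn as nn
import torch.optim as optim
from torchvision import models
# Use a pre-trained model (`weights` replaces the deprecated `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Modify the last layer (adapt to CIFAR10's 10 classes)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)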