PyTorch - train

The complete training process

Training

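Prepare the dataset
- Download the CIFAR10 training and test sets
- Load the datasets with DataLoader
Create the network (using a CIFAR-10 model architecture)
Create the loss function: cross-entropy
Create the optimizer
- Set the learning rate
Create a TensorBoard writer to record the training process
The actual process
- Training
  - Set the number of training epochs
  - In each training step, compute the loss between the network's outputs on the training set and the targets: loss_fn(outputs, targets)
  - Backpropagate to compute the gradients, then let the optimizer update the parameters so that the loss decreases
- Testing
  - Use the test set to compute the loss between the network's outputs and the targets: loss_fn(outputs, targets)
  - Compute the accuracy: count the correct predictions with (outputs.argmax(1) == targets).sum(), then divide the total by the size of the test set
- Save the model after every epoch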
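
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notes' actual script: to stay self-contained it swaps the CIFAR10 download for random tensors of the same shape (3x32x32 images, 10 classes), uses a small hypothetical network instead of the CIFAR-10 model, keeps the per-epoch results in a plain list instead of TensorBoard, and saves the `state_dict` to a temporary directory under assumed file names.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Assumption: random stand-ins for the CIFAR10 train/test sets.
train_data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
test_data = TensorDataset(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,)))
train_loader = DataLoader(train_data, batch_size=16, shuffle=True)
test_loader = DataLoader(test_data, batch_size=16)

# Hypothetical small network; the notes use a CIFAR-10 model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5, padding=2),  # 3x32x32 -> 8x32x32
    nn.MaxPool2d(2),                            # 8x32x32 -> 8x16x16
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

save_dir = tempfile.mkdtemp()  # assumed location for the per-epoch checkpoints
epochs = 2
history = []  # (total test loss, accuracy) per epoch

for epoch in range(epochs):
    # Training: forward pass, loss, backpropagation, parameter update.
    model.train()
    for imgs, targets in train_loader:
        outputs = model(imgs)
        loss = loss_fn(outputs, targets)
        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters

    # Testing: no gradients are needed, only loss and accuracy.
    model.eval()
    total_test_loss = 0.0
    total_correct = 0
    with torch.no_grad():
        for imgs, targets in test_loader:
            outputs = model(imgs)
            total_test_loss += loss_fn(outputs, targets).item()
            total_correct += (outputs.argmax(1) == targets).sum().item()
    accuracy = total_correct / len(test_data)
    history.append((total_test_loss, accuracy))

    # Save the model parameters after every epoch.
    torch.save(model.state_dict(), os.path.join(save_dir, f"model_{epoch}.pth"))
```

Dividing the count of correct predictions by `len(test_data)` is what turns the `.sum()` from the list above into an accuracy between 0 and 1.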

The curves in the figure show that the accuracy on the test set keeps rising while the loss keeps falling.

Training on a GPU

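In the training script, only three things are moved to the GPU with the cuda() method: the network model, the data (inputs and labels), and the loss function.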

# Network model
if torch.cuda.is_available():
    tudui = tudui.cuda()

# Loss function
loss_fn = nn.CrossEntropyLoss()
if torch.cuda.is_available():
    loss_fn = loss_fn.cuda()

# Data
if torch.cuda.is_available():
    imgs = imgs.cuda()
    targets = targets.cuda()

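Of these, only the data needs its return value assigned back to the original variable: cuda() moves a model or loss function in place, whereas on a tensor it returns a new tensor.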

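# A module is moved in place, so reassigning the result is optional
tudui = tudui.cuda()
loss_fn.cuda()

# A tensor is not moved in place, so the result must be reassigned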
if torch.cuda.is_available():
    imgs = imgs.cuda()
    targets = targets.cuda()
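
The difference between modules and tensors can be checked directly. The sketch below is an illustration under some assumptions: the `nn.Linear` layer and the variable names are made up, and because a GPU may not be present, a dtype conversion with `to()` stands in for the device move to show the copy-versus-in-place behaviour.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the network
x = torch.zeros(1, 4)    # hypothetical stand-in for a batch of inputs

# Use the GPU when present; otherwise the same calls run on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Moving a module changes it in place and returns the module itself.
returned = model.to(device)
same_module = returned is model

# Converting a tensor returns a new tensor; the original stays as it was.
x64 = x.to(torch.float64)
original_unchanged = x.dtype == torch.float32

# The same kind of call on a module changes its parameters in place,
# even though the return value is discarded here.
model.to(torch.float64)
module_changed = model.weight.dtype == torch.float64
```

This is why `loss_fn.cuda()` works without an assignment, while `imgs.cuda()` on its own line would leave `imgs` on the CPU.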

Alternatively, define a device object and use it to select the GPU or the CPU

# Use the first GPU ("cuda" alone means the current CUDA device, index 0 by default)
device = torch.device("cuda")
device = torch.device("cuda:0")
# Use the second GPU
device = torch.device("cuda:1")

# Choose the device dynamically
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
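
Once a device is defined, the model, loss function, and data are moved with `to(device)` in place of `cuda()`. The following is a minimal sketch of that pattern; the `nn.Linear` layer and the random tensors are placeholders, not the notes' actual model or data.

```python
import torch
from torch import nn

# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical model and loss function, moved to the chosen device.
model = nn.Linear(4, 2).to(device)
loss_fn = nn.CrossEntropyLoss().to(device)

# Placeholder batch; tensors must be reassigned after to(device).
imgs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))
imgs = imgs.to(device)
targets = targets.to(device)

# The forward pass and the loss now run on the selected device.
outputs = model(imgs)
loss = loss_fn(outputs, targets)
```

The same script then runs unchanged on machines with and without a GPU, with no need for the repeated `torch.cuda.is_available()` checks.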