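Object Detection - The VOC Dataset

The PASCAL VOC (Visual Object Classes) dataset is a classic benchmark in computer vision, widely used for object detection, semantic segmentation, and image classification. The sections below walk through the dataset in detail.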

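Basic information

Release period: 2005 to 2012 (updated annually; the most commonly used versions are VOC2007 and VOC2012).
Number of classes: 20 common object classes (people, vehicles, animals, furniture, and so on).
Main tasks: object detection, semantic segmentation, image classification, action recognition, and others.
Dataset size: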
- VOC2007: 9,963 images with 24,640 annotated objects (train, val, and test combined).
- VOC2012: 11,530 images with 27,450 annotated objects (train and val).

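Dataset contents

Images and annotations

Images: RGB photographs of everyday scenes, typically around 500×375 pixels.
Annotation types:
- Object detection: bounding box plus class label.
- Semantic segmentation: pixel-level masks (PNG format).
- Image classification: image-level labels (whether the image contains an object of a given class).
Annotation file formats: XML files (object detection) and PNG masks (semantic segmentation).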

The 20 object classes

person, bird, cat, cow, dog, horse, sheep,
aeroplane, bicycle, boat, bus, car, motorbike, train,
bottle, chair, dining table, potted plant, sofa, tv/monitor

In the annotation files, the multi-word names are written without spaces: diningtable, pottedplant, tvmonitor.
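Training code usually needs these names as integer labels. The following is a minimal sketch of one common convention: the 20 names in alphabetical order, with index 0 reserved for background as in the segmentation masks. The names `VOC_CLASSES`, `CLASS_TO_INDEX`, and `INDEX_TO_CLASS` are illustrative, not part of any library.

```python
# The 20 VOC class identifiers, spelled as they appear in the XML <name>
# fields and listed in the alphabetical order commonly used for label indices.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)

# Index 0 is left for background, so the object classes occupy 1..20.
CLASS_TO_INDEX = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}
INDEX_TO_CLASS = {i: name for name, i in CLASS_TO_INDEX.items()}
```

Detection-only pipelines sometimes start at 0 instead; whichever offset is chosen has to match what the model's output layer expects.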

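Dataset structure

A typical VOC dataset directory looks like this:

VOCdevkit/
└── VOC20XX/               # year (e.g. VOC2007, VOC2012)
    ├── Annotations/       # XML annotation files for object detection
    ├── ImageSets/         # split files (train, val, test)
    │   ├── Main/          # image lists for the classification and detection tasks
    │   ├── Layout/        # image lists for the person layout (body part) task
    │   └── Segmentation/  # image lists for the segmentation task
    ├── JPEGImages/        # original images (.jpg)
    ├── SegmentationClass/ # semantic segmentation masks (colored by class)
    └── SegmentationObject/# instance segmentation masks (colored by object instance)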
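To tie the directories together: a split file under ImageSets lists image IDs one per line, and each ID names both a JPEG and an XML file. The sketch below assumes that layout; `read_split` and `sample_paths` are hypothetical helper names, and `voc_root` is expected to point at a `VOC20XX` folder.

```python
from pathlib import Path


def read_split(voc_root, split="train", task="Main"):
    """Return the image IDs listed in ImageSets/<task>/<split>.txt."""
    split_file = Path(voc_root) / "ImageSets" / task / f"{split}.txt"
    ids = []
    for line in split_file.read_text().splitlines():
        parts = line.split()
        if parts:
            # Per-class files (e.g. aeroplane_train.txt) carry a second
            # column with a 1 / -1 / 0 flag; only the ID column is kept here.
            ids.append(parts[0])
    return ids


def sample_paths(voc_root, image_id):
    """Map an image ID to its JPEG path and its XML annotation path."""
    root = Path(voc_root)
    return (root / "JPEGImages" / f"{image_id}.jpg",
            root / "Annotations" / f"{image_id}.xml")
```

For example, `read_split("VOCdevkit/VOC2007", "trainval")` would return the IDs of the combined train and val split, provided the dataset has been extracted to that location.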

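Main tasks and evaluation metrics

Object Detection

Goal: predict the bounding box and class of every object in an image.
Evaluation metric: mAP (Mean Average Precision)
- Compute the AP (average precision) for each class, then average over all classes.
- The IoU (intersection over union) threshold is usually 0.5 (the VOC standard); the COCO standard instead averages over thresholds from 0.5 to 0.95.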
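The two ingredients of the metric can be sketched in a few lines. This is a simplified illustration, not the official devkit code: it treats coordinates as continuous (the devkit uses inclusive pixel coordinates and adds 1 to widths and heights), it uses the all-point area under the precision-recall curve (VOC2007 originally used an 11-point interpolation), and it assumes detections have already been matched to ground truth at IoU >= 0.5. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_gt):
    """AP for one class as the area under the precision-recall curve.

    is_true_positive: one bool per detection, sorted by descending
        confidence; True if the detection matched an unused ground-truth box.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recall, precision = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))


def mean_average_precision(per_class_ap):
    """mAP: the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

The matching step (assigning each detection to at most one ground-truth box and handling objects flagged as difficult) is deliberately left out to keep the sketch short.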

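Semantic Segmentation

Goal: assign a class label to every pixel.
Evaluation metric: mIoU (Mean Intersection over Union)
- Compute the IoU for each class (intersection of the predicted and ground-truth regions divided by their union), then average over classes.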
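As a rough illustration of the metric, the sketch below computes mIoU over flat sequences of per-pixel class indices. It assumes 21 labels (background plus the 20 classes) and skips ground-truth pixels marked 255, the value VOC masks use for void or boundary regions. Skipping classes that appear in neither prediction nor ground truth is a simplification made here; `mean_iou` is an illustrative name.

```python
def mean_iou(pred, target, num_classes=21, ignore_index=255):
    """mIoU over two equal-length sequences of per-pixel class indices."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # void pixels do not count for or against any class
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

In practice the counts are accumulated over the whole evaluation set before dividing, and an array library would replace the Python loop.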

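Image Classification

Goal: decide whether an image contains an object of a given class.
Evaluation metric: the official VOC evaluation uses average precision (AP) per class; plain classification accuracy is sometimes reported as well.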

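How to use the VOC dataset

Step 1: Download the dataset

Official mirrors (recommended):
- VOC2007
- VOC2012

Step 2: Parse the annotation files

Example XML for object detection: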

Annotations:

<annotation>
	<folder>VOC2007</folder>
	<filename>000032.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>311023000</flickrid>
	</source>
	<owner>
		<flickrid>-hi-no-to-ri-mo-rt-al-</flickrid>
		<name>?</name>
	</owner>
	<size>
		<width>500</width>
		<height>281</height>
		<depth>3</depth>
	</size>
	<segmented>1</segmented>
	<object>
		<name>aeroplane</name>
		<pose>Frontal</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>104</xmin>
			<ymin>78</ymin>
			<xmax>375</xmax>
			<ymax>183</ymax>
		</bndbox>
	</object>
	<object>
		<name>aeroplane</name>
		<pose>Left</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>133</xmin>
			<ymin>88</ymin>
			<xmax>197</xmax>
			<ymax>123</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>195</xmin>
			<ymin>180</ymin>
			<xmax>213</xmax>
			<ymax>229</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>26</xmin>
			<ymin>189</ymin>
			<xmax>44</xmax>
			<ymax>238</ymax>
		</bndbox>
	</object>
</annotation>
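An annotation like the one above can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch; `parse_annotation` and the shape of the dictionary it returns are illustrative choices, not a fixed API.

```python
import xml.etree.ElementTree as ET


def parse_annotation(xml_text):
    """Extract the image size and object boxes from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    # findall("object") only visits direct children of <annotation>.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            # Coordinates go through float() first in case a file stores
            # them with a decimal point.
            "bbox": tuple(int(float(box.findtext(tag)))
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }
```

Applied to the example file, this would yield a 500×281 image with four objects, the first being an aeroplane with box (104, 78, 375, 183).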

Images:

Original image:

Semantic segmentation class mask:

Semantic segmentation object mask:

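Step 3: Data loading and preprocessing

Loading the VOC dataset with PyTorch:

from torchvision.datasets import VOCDetection

# Object detection dataset; each item is an (image, target) pair, where
# target is the annotation XML parsed into a nested dict
dataset = VOCDetection(root='./data', year='2012', image_set='train', download=True)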
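Each item of a `VOCDetection` dataset pairs a PIL image with a dictionary that mirrors the XML annotation. The sketch below flattens such a dictionary into plain tuples. It assumes the layout `target["annotation"]["object"]` with string-valued `bndbox` fields, which is worth confirming against the installed torchvision version; `target_to_boxes` is a hypothetical helper.

```python
def target_to_boxes(target):
    """Flatten a VOCDetection-style target into (name, xmin, ymin, xmax, ymax) tuples."""
    objects = target["annotation"].get("object", [])
    if isinstance(objects, dict):
        # Tolerate a single object stored without a surrounding list.
        objects = [objects]
    boxes = []
    for obj in objects:
        bb = obj["bndbox"]
        coords = tuple(int(bb[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj["name"],) + coords)
    return boxes
```

With the dataset loaded as above, `image, target = dataset[0]` followed by `target_to_boxes(target)` would give the labelled boxes for the first training image.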

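Hands-on practice

Code (this example reads a COCO-format dataset through torchvision's CocoDetection and draws its bounding boxes):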

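import torchvision
from PIL import ImageDraw

# Load the COCO val2017 images together with their instance annotations
# (CocoDetection requires the pycocotools package)
coco_dataset = torchvision.datasets.CocoDetection(
    root=r"./val2017",
    annFile="./src/instances_val2017.json")

# Each item is a (PIL image, list of annotation dicts) pair
image, info = coco_dataset[0]
image_handler = ImageDraw.Draw(image)

for annotation in info:
    # COCO boxes are stored as [x_min, y_min, width, height]
    x_min, y_min, width, height = annotation['bbox']
    image_handler.rectangle(((x_min, y_min), (x_min + width, y_min + height)))

image.show()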

Original image:

Image with bounding boxes:
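The hands-on code unpacks COCO boxes as [x_min, y_min, width, height], whereas the VOC XML shown earlier stores corner coordinates (xmin, ymin, xmax, ymax). When mixing the two datasets, a pair of small conversion helpers is handy. This sketch ignores the one-pixel difference that arises from VOC's inclusive pixel coordinates, and the function names are illustrative.

```python
def coco_to_voc(box):
    """Convert [x_min, y_min, width, height] to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def voc_to_coco(box):
    """Convert (xmin, ymin, xmax, ymax) to (x_min, y_min, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)
```

With `coco_to_voc` in place, the drawing loop could pass the converted corners straight to `rectangle`, and the same loop would then work for boxes read from VOC annotations.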