keywords: semi-supervised, curriculum, pseudo labels
End-to-End Semi-Supervised Object Detection with Soft Teacher
Motivation
end-to-end training: in contrast to the multi-stage pipelines of other methods
semi-supervised: uses external unlabeled data through a pseudo-label based approach
propose two techniques
- soft teacher mechanism: the classification loss of each pseudo-labeled sample is weighted by the teacher model's prediction score
- box jittering mechanism: selects reliable pseudo boxes
verified
- use SWIN-L as baseline
- metric on COCO: 60.4 mAP
- with Objects365 pretraining: 61.3 mAP
Key points
- we present this end-to-end pseudo-label based semi-supervised object detection framework
- simultaneously performs
- pseudo-labeling:teacher
- training the detector with the current pseudo-labels & a few labeled training samples: student
- teacher is an exponential moving average (EMA) of the student model
- mutually enforce each other
- soft teacher approach
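- the teacher model's role is to score the box candidates generated by the student model
- candidates scoring above a threshold count as foreground, but some true foreground may still end up labeled as background, so the teacher's score is used as a reliability measure that weights the cls loss of the boxes labeled as background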
Method
overview
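- two models: student and teacher
- the teacher model generates the pseudo labels: two sets of pseudo boxes, one for the classification branch and one for the regression branch
- the student model is updated with the loss on supervised & unsupervised samples
- the teacher model is updated as the EMA of the student model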
- two crucial designs
- soft teacher
- box jittering
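- overall workflow: in each training iteration, labeled & unlabeled samples are drawn at a fixed ratio to form the data batch; the teacher model then generates pseudo labels for the unlabeled data (thousands of box candidates + NMS + score filter), which serve as the ground truth of the unlabeled samples when training the student model; the overall loss is the weighted sum of the supervised loss and the unsupervised loss
- at the start of training both models are randomly initialized, and the teacher model is updated as the student model is updated
- following FixMatch:
- samples fed to the teacher model use weak aug
- samples fed to the student model use strong aug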
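The EMA update of the teacher described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: parameters are plain NumPy arrays in a dict, and the function name `ema_update` and the `momentum` value are assumptions made for the example.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.999):
    """Move each teacher parameter toward the matching student parameter.

    teacher, student: dicts mapping parameter names to NumPy arrays.
    momentum: hypothetical EMA decay; the notes do not give a value.
    """
    for name, s_param in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_param
    return teacher


# Toy example: one parameter tensor per model.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
ema_update(teacher, student, momentum=0.9)
```

Only the student receives gradients; the teacher changes solely through this averaging step after each student update.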
soft teacher
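the quality of the detector's pseudo-labels matters
so a score thresh=0.9 is used to split box candidates into foreground/background
but if the student model's box candidates are then labeled pos/neg with the conventional IoU rule, some foreground boxes are treated as background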
to alleviate
- assess the reliability of each student-generated box candidate to be a real background
- given a student-generated box candidate, the teacher model's detection head predicts the background score of that box
overall unsupervised cls loss
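- $G_{cls}$ is the set of boxes the teacher generated for classification, i.e. the teacher's top-1000 predictions after NMS and the score filter
- $b_i^{fg}$ are the student candidates assigned as foreground and $b_i^{bg}$ those assigned as background; the assignment is made against the pseudo boxes kept by score>0.9
- $w_j$ is the weight applied to the boxes assigned as background
- $r_k$ is the reliability score: for a box that the student model assigned as background under the hard score thresh, it is the bg score predicted by the teacher model's detection head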
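A minimal sketch of how these symbols combine, assuming the foreground term is a plain average and the background weights are the normalized reliability scores, $w_j = r_j / \sum_k r_k$. The per-box losses are taken as given inputs, and the function name and arguments are hypothetical, chosen only for this illustration.

```python
import numpy as np


def soft_teacher_cls_loss(fg_losses, bg_losses, bg_scores):
    """Combine per-box cls losses in the soft-teacher style.

    fg_losses: per-box cls loss of student candidates assigned as foreground.
    bg_losses: per-box cls loss of student candidates assigned as background.
    bg_scores: teacher-predicted background score r_k for each background box.
    Returns (loss, weights), where weights are the normalized w_j.
    """
    fg_losses = np.asarray(fg_losses, dtype=float)
    bg_losses = np.asarray(bg_losses, dtype=float)
    bg_scores = np.asarray(bg_scores, dtype=float)

    # Foreground boxes contribute their average loss.
    fg_term = fg_losses.mean() if fg_losses.size else 0.0

    # Background boxes are weighted by w_j = r_j / sum_k r_k.
    total = bg_scores.sum()
    if bg_scores.size and total > 0:
        weights = bg_scores / total
    else:
        weights = np.zeros_like(bg_scores)
    bg_term = float((weights * bg_losses).sum())

    return fg_term + bg_term, weights


loss, weights = soft_teacher_cls_loss(
    fg_losses=[0.2, 0.4], bg_losses=[1.0, 2.0], bg_scores=[0.9, 0.1]
)
```

A background box that the teacher itself doubts (low bg score) gets a small weight, so a foreground object that was wrongly assigned as background contributes little to the loss.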
box jittering
the fg score and the box IoU do not show a strong positive correlation, so pseudo boxes selected by the fg score thresh are not necessarily suitable for box regression
localization reliability:
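- measures the consistency of a pseudo box
- given a pseudo box, sample a series of jitter boxes around it, then let the teacher model predict on these jitter boxes to obtain refined boxes
- the smaller the variance between the refined boxes and the pseudo box, the higher the localization reliability of that box
- $\hat b_i$ are the refined boxes
- $\sigma_k$ is the standard deviation of the refined boxes' four coordinates for the original box
- $\hat \sigma_k$ is that standard deviation normalized by the scale of the original box
- $\overline\sigma$ is the mean of the normed std over the four coordinates of the refined boxes
- computed only for the teacher box candidates with fg score>0.5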
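The localization reliability above can be sketched as follows. The jittering and the teacher's refinement step are not modeled; the refined boxes are taken as an input array. The sketch assumes boxes in `(x1, y1, x2, y2)` format, a standard deviation taken around the mean of the refined boxes, and normalization by half of the pseudo box's height plus width. The function name is hypothetical.

```python
import numpy as np


def box_regression_variance(pseudo_box, refined_boxes):
    """Mean normalized std of the refined boxes' four coordinates.

    pseudo_box: (x1, y1, x2, y2) of the teacher pseudo box.
    refined_boxes: array of shape (num_jitter, 4), the teacher's refined
        outputs for the jittered copies of pseudo_box.
    """
    pseudo_box = np.asarray(pseudo_box, dtype=float)
    refined = np.asarray(refined_boxes, dtype=float)

    # sigma_k: std of each coordinate over the refined boxes.
    sigma = refined.std(axis=0)

    # sigma_hat_k: normalize by the scale of the original box (assumed form).
    width = pseudo_box[2] - pseudo_box[0]
    height = pseudo_box[3] - pseudo_box[1]
    sigma_hat = sigma / (0.5 * (height + width))

    # sigma_bar: average over the four coordinates.
    return float(sigma_hat.mean())


variance = box_regression_variance(
    pseudo_box=[0, 0, 10, 10],
    refined_boxes=[[0, 0, 10, 10], [2, 0, 10, 10]],
)
```

A small value means the teacher refines all jittered copies to nearly the same place, which is the sense in which the pseudo box is reliable for regression.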
overall unsupervised reg loss
- $b_i^{fg}$ are the student candidates assigned as foreground, i.e. the predicted boxes with cls score>0.9
- $G_{reg}$ is the set of boxes the teacher generated for regression, i.e. the candidates whose jittered reliability exceeds a threshold
overall unsupervised loss: the sum of the cls loss and the reg loss, normalized by the number of samples
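For reference, the losses described in these notes can be written out as below. This is a reconstruction from the symbol definitions above, not a verbatim copy of the paper's equations. The symbols $\alpha$ (weight of the unsupervised loss), $N_u$ (number of unlabeled images), $I_u^i$ (the $i$-th unlabeled image), $N_b^{fg}$ (number of foreground boxes) and $l_{reg}$ (per-box regression loss) do not appear in the notes and are introduced here as assumptions.

```latex
% Unsupervised regression loss: average over foreground student boxes
\mathcal{L}^{u}_{reg} = \frac{1}{N_b^{fg}} \sum_{i=1}^{N_b^{fg}} l_{reg}\left(b_i^{fg}, G_{reg}\right)

% Overall unsupervised loss: cls + reg, normalized by the number of unlabeled images
\mathcal{L}_u = \frac{1}{N_u} \sum_{i=1}^{N_u} \left( \mathcal{L}^{u}_{cls}\left(I_u^i, G_{cls}^i\right) + \mathcal{L}^{u}_{reg}\left(I_u^i, G_{reg}^i\right) \right)

% Overall training loss: weighted sum of supervised and unsupervised parts
\mathcal{L} = \mathcal{L}_s + \alpha \, \mathcal{L}_u
```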
Experiments