ATSS

ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

  1. Motivation

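    • The essential difference between anchor-based and anchor-free methods is how positive and negative samples are defined, and this directly causes the performance gap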
    • we propose ATSS
      • adaptive training sample selection
      • automatically select positive and negative samples according to statistical characteristics of objects
      • improves both anchor-based and anchor-free models
    • discuss tiling multiple anchors
  2. Arguments

    • Mainstream anchor-based methods
      • one-stage/two-stage
      • tile a large number of preset anchors on the image
      • output these refined anchors as detection results
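    • anchor-free detectors fall into two main groups
      • key-point based: predict corner points / contour points / heatmaps, then bound them to obtain the box
      • center-based: predict the center point, then regress 4 distances from it
      • both remove the hyperparameters of pre-defined anchors, which strengthens generalization ability
    • Example comparison: RetinaNet vs. FCOS
      • RetinaNet: one-stage anchor-based
      • FCOS: center-based anchor-free
      • Difference 1: number of anchors; RetinaNet tiles h x w x 9, FCOS h x w x 1
      • Difference 2: definition of positives; RetinaNet takes anchors whose IoU with a gt box exceeds a threshold, FCOS takes all feature-map locations that fall inside a gt box
      • Difference 3: regression; RetinaNet regresses the offsets of the gt box relative to a positive anchor, FCOS predicts the absolute distances from the center point to the four sides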
  3. Difference Analysis of Anchor-based and Anchor-free Detection

    • we focus on the last two differences: positive/negative sample definition and regression starting status

    • RetinaNet is set to one square anchor per location, consistent with FCOS

    • experiment setting

      • MS COCO: 80 foreground classes, common split
      • ImageNet pretrained ResNet-50
      • resize input
      • SGD, 90K iterations, momentum 0.9, weight decay 1e-4, batch size 16, initial lr 0.01, decayed by 0.1 at 60K
      • testing:
        • 0.05 score to filter out bg boxes
        • output top 1000 detections per feature pyramid
        • 0.6 IoU thresh per class NMS to give final top 100 detections per image
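The test-time steps above can be sketched as follows. This is a minimal illustration in plain Python, not the paper's code: the function name `postprocess`, the `(box, score, label)` tuple layout, and the greedy NMS formulation are assumptions made for readability; a real detector would do this on tensors.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(levels, score_thresh=0.05, pre_nms_top_n=1000,
                nms_thresh=0.6, max_dets=100):
    """levels: one list per pyramid level of (box, score, label) tuples (assumed layout)."""
    candidates = []
    for dets in levels:
        # 1) drop low-scoring (background) boxes, 2) keep the top-N per pyramid level
        dets = [d for d in dets if d[1] > score_thresh]
        dets.sort(key=lambda d: d[1], reverse=True)
        candidates.extend(dets[:pre_nms_top_n])

    # 3) greedy per-class NMS: visit boxes by descending score and keep a box
    #    only if no already-kept box of the same class overlaps it too much
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in candidates:
        if all(l != label or iou(box, b) <= nms_thresh for b, _, l in kept):
            kept.append((box, score, label))

    # 4) keep the highest-scoring detections for the image
    return kept[:max_dets]
```

Because `kept` is built in descending score order, truncating it to `max_dets` returns the top detections directly.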
    • inconsistency removal

      • Five improvements added to FCOS further widen the gap
      • Adding them to RetinaNet one by one brings it to 37% AP, still 0.8 points behind FCOS

    • Analysis of the essential difference

      • To train a detector, first split samples into positives and negatives, then use the positives for regression

      • Classification

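        • RetinaNet uses the IoU between anchor boxes and gt boxes: the best-matching anchor and anchors above an IoU threshold are positives, anchors below a (lower) IoU threshold are negatives, and the rest are ignored
        • FCOS uses spatial and scale constraints: all locations inside a gt box are candidate positives, candidates whose scale does not match the pyramid level are then removed, and everything that is not positive is negative, with no ignored samples

        • Both models are tested under both selection strategies: the Spatial and Scale Constraint gives a clear gain over the IoU rule for both

        • When both methods use the Spatial and Scale Constraint strategy, there is essentially no accuracy difference between them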
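A minimal sketch of the two labeling rules, for a single anchor (or location) and a single gt box. Several things here are assumptions not stated in these notes: the thresholds 0.5 / 0.4 are the commonly used RetinaNet defaults, the function names are hypothetical, the RetinaNet best-match rule is omitted, and the FCOS rule is reduced to "inside the box and largest side distance within this level's range".

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_by_iou(anchor, gt, pos_thresh=0.5, neg_thresh=0.4):
    """IoU rule (RetinaNet-style): 1 = positive, 0 = negative, -1 = ignored.
    Thresholds are assumed defaults; the best-match rule is left out."""
    v = iou(anchor, gt)
    if v >= pos_thresh:
        return 1
    if v < neg_thresh:
        return 0
    return -1


def label_by_spatial_scale(point, gt, scale_range):
    """Spatial and scale constraint (FCOS-style): 1 = positive, 0 = negative.
    Positive iff the point lies inside gt AND its largest distance to a box
    side falls inside this pyramid level's (lo, hi) range. No ignore label."""
    x, y = point
    l, t, r, b = x - gt[0], y - gt[1], gt[2] - x, gt[3] - y
    inside = min(l, t, r, b) > 0
    lo, hi = scale_range
    return 1 if inside and lo <= max(l, t, r, b) <= hi else 0
```

The IoU rule has two thresholds and an ignore band, while the spatial and scale rule has a per-level range and no ignored samples, which mirrors the contrast drawn above.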

      • Regression

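        • RetinaNet regresses from the anchor box with 4 offsets: it regresses the offsets of the gt box relative to the anchor box, so the regression starting status is a box
        • FCOS regresses from the anchor point with 4 distances: it regresses the distances from the anchor center to the four sides of the gt box, so the regression starting status is a point

        • The paper's comparison table shows that, once the same positive and negative samples are selected, the regression starting status is irrelevant and does not affect accuracy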
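The two regression starting statuses can be illustrated with a short sketch. The box encoding below is the usual R-CNN-style parameterization (center offsets normalized by anchor size, log size ratios); that choice and all function names are assumptions, since these notes do not spell out the exact formula.

```python
import math


def encode_from_box(anchor, gt):
    """Start from a box: 4 offsets of gt relative to the anchor (assumed R-CNN-style)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode_from_box(anchor, deltas):
    """Invert encode_from_box: apply the 4 offsets to the anchor box."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx, cy = ax + deltas[0] * aw, ay + deltas[1] * ah
    w, h = aw * math.exp(deltas[2]), ah * math.exp(deltas[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def encode_from_point(point, gt):
    """Start from a point: distances (left, top, right, bottom) to the gt sides."""
    x, y = point
    return (x - gt[0], y - gt[1], gt[2] - x, gt[3] - y)


def decode_from_point(point, dists):
    """Invert encode_from_point: rebuild the box from the 4 distances."""
    x, y = point
    return (x - dists[0], y - dists[1], x + dists[2], y + dists[3])
```

Both encodings decode back to the same gt box from the same location, which is consistent with the observation that the starting status alone does not change what the model has to learn.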

  4. Adaptive Training Sample Selection (ATSS)

    • The essential difference that affects detector accuracy is how to define positive and negative training samples

    • Previous strategies all have sensitive hyperparameters (anchors / scale ranges), and some outer objects may be neglected

    • we propose ATSS

      • almost no hyperparameters
      • divides pos/neg samples according to data statistical characteristics

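      • For each gt box, on each pyramid level, pick the k anchors whose centers are closest (L2 distance) to the gt center, giving k*L candidates per gt box (L = number of levels)

      • Compute the IoU between each candidate and the gt box, then the mean and standard deviation of these IoUs
      • Set this gt box's IoU threshold to mean + std
      • Among the candidates, keep as positives those whose IoU is greater than or equal to the threshold and whose anchor center lies inside the gt box
      • If an anchor box is matched to several gt boxes, assign it to the one with the highest IoU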
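The selection steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `atss_assign`, the list-of-levels input layout, and the use of the population standard deviation (`statistics.pstdev`) are choices made here. The threshold follows the paper's definition of mean plus standard deviation of the candidate IoUs.

```python
import math
from statistics import mean, pstdev


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def atss_assign(anchors_per_level, gts, k=9):
    """Return {(level, anchor_index): gt_index} for the positive anchors.
    anchors_per_level: list (one entry per pyramid level) of lists of boxes."""
    best = {}  # (level, anchor_index) -> (iou, gt_index)
    for g, gt in enumerate(gts):
        gc = center(gt)
        # Step 1: on each level, take the k anchors closest (L2) to the gt center
        candidates = []
        for lvl, anchors in enumerate(anchors_per_level):
            order = sorted(range(len(anchors)),
                           key=lambda i: math.dist(center(anchors[i]), gc))
            candidates += [(lvl, i) for i in order[:k]]
        if not candidates:
            continue
        # Step 2: per-gt threshold = mean + std of the candidate IoUs
        ious = [iou(anchors_per_level[l][i], gt) for l, i in candidates]
        thresh = mean(ious) + pstdev(ious)
        # Step 3: keep candidates above the threshold whose center is inside gt;
        # an anchor claimed by several gts goes to the one with the highest IoU
        for key, v in zip(candidates, ious):
            cx, cy = center(anchors_per_level[key[0]][key[1]])
            inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
            if v >= thresh and inside and v > best.get(key, (-1.0, None))[0]:
                best[key] = (v, g)
    return {key: g for key, (_, g) in best.items()}
```

Anchors that do not appear in the returned mapping are simply not positive for any gt box in this sketch.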
    • Why select anchor boxes by center distance: the closer an anchor is to the object center, the more likely it is to produce a high-quality box

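    • Why mean + std as the IoU threshold:

      • a higher mean indicates high-quality candidates, so the IoU threshold should be higher
      • a higher standard deviation indicates that one pyramid level is specifically suited to the object; using mean + std as the threshold then picks out the candidates with high IoU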
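A small numeric illustration of this reasoning, using made-up IoU values (both lists are hypothetical, not taken from the paper) and the population standard deviation as an assumed choice:

```python
from statistics import mean, pstdev


def select(ious):
    """Return (threshold, selected IoUs) for threshold = mean + std."""
    t = mean(ious) + pstdev(ious)
    return t, [v for v in ious if v >= t]


# Hypothetical case (a): one pyramid level fits the object far better than the
# rest, so the IoUs are spread out and the std is large.
level_specific = [0.05, 0.05, 0.80, 0.75, 0.05, 0.05]

# Hypothetical case (b): several levels fit about equally well, so the IoUs are
# close together and the std is small.
level_agnostic = [0.40, 0.45, 0.50, 0.45, 0.40, 0.50]

t_a, picked_a = select(level_specific)
t_b, picked_b = select(level_agnostic)
```

In case (a) the large std pushes the threshold well above the mean, so only the two candidates from the well-matched level survive. In case (b) the threshold stays close to the mean, so candidates from more than one level can pass.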
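    • Limit the anchor center to the object: an anchor whose center lies outside the gt box is clearly a poor candidate; this step filters out whatever slipped through the first two steps, as a second safeguard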

    • fair between different objects

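      • Statistically, each object ends up with roughly 0.2*k*L positive samples, independent of its scale
      • By contrast, RetinaNet and FCOS both give large objects many positives and small objects few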
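A rough sanity check on the order of magnitude, under an idealized assumption that is not made in these notes: if the candidate IoUs were normally distributed, the share above mean + std would be the one-sigma upper tail. The values `k = 9` and `L = 5` below are assumed for illustration.

```python
import math


def normal_tail_beyond(z):
    """P(X > mean + z * std) for a normally distributed X."""
    return 0.5 * math.erfc(z / math.sqrt(2))


frac = normal_tail_beyond(1.0)      # roughly 0.16
k, L = 9, 5                         # assumed: 9 candidates per level, 5 levels
expected_positives = frac * k * L   # expected positives per object under this model
```

Real IoU distributions are not normal, so this only indicates that a fraction of k*L on the order of 0.2 is plausible; the key point is that the count depends on k and L, not on object size.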
    • hyperparameter-free: only k remains [but what about the anchor settings???]

    • verification

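      • lite version: adopted by the official FCOS code, where it is called center sampling; the scale limit still exists in this version
      • full version: the version in this paper
      • Both versions select candidates in exactly the same way; they differ only in how the final positives are selected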

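    • Robustness to hyperparameters

      • k is relatively insensitive within a range (7-17): too large brings in too many low-quality candidates, too small gives too few samples for reliable statistics

      • Trying different anchor scales at a fixed ratio and different anchor ratios at a fixed scale gives relatively stable accuracy, which suggests robustness to anchor settings

      • multi-anchor settings

        • RetinaNet's accuracy is essentially unchanged across anchor settings, which suggests that as long as positives are selected well, the number of anchors tied to each location does not matter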