SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Motivation
- object detection task
- requiring simultaneous recognition and localization
- an encoder alone performs poorly at this combination
- and even encoder-decoder architectures are ineffective when built on a scale-decreased backbone
propose SpineNet
- scale-permuted intermediate features
- cross-scale connections
- searched by NAS on the COCO object detection task
- can transfer to classification tasks
leads in AP for one-stage detectors with both lightweight and heavyweight backbones on the object detection task
Argument
- scale-decreasing backbone
- throws away spatial information via down-sampling
- which is challenging to recover
- even when a lightweight FPN is attached on top
- scale-permuted model
- scales of features can increase or decrease anytime: retains spatial information (a toy contrast of the two orderings follows this list)
- connections go across scales: multi-scale fusion
- searched by NAS
- a single whole network, not the separable backbone + FPN form of encoder-decoder designs
- connects directly to the classification and bounding-box regression subnets
- based on ResNet-50
- uses bottleneck feature blocks
- two inputs for each feature block
- roughly the same computation as ResNet-50-FPN
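A minimal plain-Python sketch of the contrast these claims draw; the orderings are illustrative, not the searched SpineNet wiring:

```python
# Each block is summarized by its feature level L (stride = 2^L).

# Scale-decreasing backbone: resolution only ever shrinks, so spatial
# detail discarded early can never be revisited.
scale_decreased = [2, 3, 4, 5]

# Scale-permuted backbone: levels may rise or fall at any point, so
# high-resolution blocks reappear late and can fuse earlier outputs.
scale_permuted = [2, 4, 3, 5, 3, 5, 4, 6, 7]  # illustrative only

def is_scale_decreased(levels):
    """True iff feature levels never decrease, i.e. resolution never grows."""
    return all(a <= b for a, b in zip(levels, levels[1:]))

assert is_scale_decreased(scale_decreased)
assert not is_scale_decreased(scale_permuted)
```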
Method
formulation
- overall architecture: a scale-decreased stem followed by a scale-permuted network (a runnable toy of this layout follows this list)
- stem: scale-decreased architecture
- blocks in the stem network can be candidate inputs for the following scale-permuted network
- scale-permuted network
- building blocks: $B_k$
- feature levels: $L_3$ to $L_7$
- output features: $P_3$ to $P_7$, attached via 1x1 convs
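A runnable toy of this formulation, assuming a hard-coded wiring in place of the searched one; `ToyScalePermutedNet` and all sizes are illustrative:

```python
import torch
from torch import nn
import torch.nn.functional as F

class ToyScalePermutedNet(nn.Module):
    """Toy version of the formulation: a scale-decreased stem, a
    hard-coded "scale-permuted" ordering of blocks B_k at levels
    L3-L7 (the real ordering is searched), and 1x1 convs emitting
    the output features P3-P7."""

    # (level, two input indices) per block; indices point into the
    # running list of previously built features, stem included.
    WIRING = [(3, (0, 1)), (5, (1, 2)), (4, (2, 3)), (6, (3, 4)), (7, (4, 5))]

    def __init__(self, width=64, out_ch=256):
        super().__init__()
        self.stem0 = nn.Conv2d(3, width, 3, stride=2, padding=1)      # L1
        self.stem1 = nn.Conv2d(width, width, 3, stride=2, padding=1)  # L2
        self.blocks = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in self.WIRING)
        self.out_convs = nn.ModuleList(
            nn.Conv2d(width, out_ch, 1) for _ in self.WIRING)

    def forward(self, x):
        feats = [self.stem0(x)]
        feats.append(self.stem1(feats[-1]))
        outs = {}
        for (level, (i, j)), block, out_conv in zip(
                self.WIRING, self.blocks, self.out_convs):
            size = (x.shape[-2] // 2**level, x.shape[-1] // 2**level)
            # Resample both parents to this block's scale, then fuse.
            a = F.interpolate(feats[i], size=size)
            b = F.interpolate(feats[j], size=size)
            feats.append(block(a + b))
            outs[f"P{level}"] = out_conv(feats[-1])
        return outs

outs = ToyScalePermutedNet()(torch.randn(1, 3, 256, 256))
print({k: tuple(v.shape) for k, v in outs.items()})
```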
search space
scale-permuted network:
- blocks can only connect from earlier to later in the ordering
- based on ResNet blocks
- channels fixed at 256 for $L_5$, $L_6$, $L_7$ blocks
cross-scale connections:
two input connections for each block
inputs taken from lower-ordering blocks or the stem
resampling
- narrowing factor $\alpha$: applied via a 1x1 conv
- upsampling: interpolation
- downsampling: 3x3 stride-2 conv
the two resampled inputs are merged by element-wise addition (sketched below)
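A hedged PyTorch sketch of this resampling step; the class name, the trailing 1x1 conv that restores the target width, and the default $\alpha=0.5$ are my assumptions:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Resample(nn.Module):
    """Resample one parent feature to a target level: a 1x1 conv narrows
    channels by alpha, interpolation upsamples, stride-2 3x3 convs
    downsample, and a final 1x1 conv restores the target width (the last
    step is an assumption to make shapes line up)."""

    def __init__(self, in_ch, out_ch, level_diff, alpha=0.5):
        super().__init__()
        mid_ch = int(in_ch * alpha)
        self.narrow = nn.Conv2d(in_ch, mid_ch, 1)
        # One stride-2 3x3 conv per level of downsampling needed.
        self.downs = nn.ModuleList(
            nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1)
            for _ in range(max(level_diff, 0)))
        self.up_factor = 2 ** max(-level_diff, 0)
        self.expand = nn.Conv2d(mid_ch, out_ch, 1)

    def forward(self, x):
        x = self.narrow(x)
        if self.up_factor > 1:
            x = F.interpolate(x, scale_factor=self.up_factor, mode="nearest")
        for down in self.downs:
            x = down(x)
        return self.expand(x)

# Two parents resampled to a common target level, merged element-wise.
p_low  = Resample(256, 256, level_diff=+1)(torch.randn(1, 256, 32, 32))
p_high = Resample(256, 256, level_diff=-1)(torch.randn(1, 256, 8, 8))
merged = p_low + p_high  # both are now 1x256x16x16
```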
block adjustment
- intermediate blocks can adjust their scale level & block type (see the sampling sketch below)
- level offset chosen from {-1, 0, 1, 2}
- type selected from bottleneck / residual block
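Putting the searched components together — the permutation itself, two inputs per block, and the per-block adjustments — a plain-Python sketch of sampling one candidate (the NAS controller and reward loop are elided; all names are mine):

```python
import random

BLOCK_TYPES = ["bottleneck", "residual"]
LEVEL_OFFSETS = [-1, 0, 1, 2]

def sample_candidate(num_stem_blocks, base_levels):
    """Sample one scale-permuted candidate architecture."""
    levels = list(base_levels)
    random.shuffle(levels)  # the scale permutation itself is searched
    candidate = []
    for k, level in enumerate(levels):
        ordering = num_stem_blocks + k  # position in the permuted sequence
        candidate.append({
            # two inputs, restricted to lower-ordering blocks or the stem
            "inputs": random.sample(range(ordering), 2),
            # intermediate blocks may shift their level and pick a type
            # (in the paper, output blocks keep their level fixed)
            "level": level + random.choice(LEVEL_OFFSETS),
            "type": random.choice(BLOCK_TYPES),
        })
    return candidate

print(sample_candidate(num_stem_blocks=2, base_levels=[3, 3, 4, 4, 5, 6, 7]))
```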
family of models
- R[N]-SP[M]: N feature blocks in the stem & M feature blocks in the scale-permuted network
- the family gradually shifts blocks from the stem to the scale-permuted part, shrinking the stem
SpineNet family (configs summarized in the sketch below)
- base model: SpineNet-49
- SpineNet-49S: channel counts scaled down by 0.65
- SpineNet-96: number of blocks doubled
- SpineNet-143: blocks repeated 3 times, fusion narrowing factor $\alpha=1$
- SpineNet-190: blocks repeated 4 times, fusion narrowing factor $\alpha=1$, channel counts scaled up by 1.3
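The scaling rules can be collected into a small config dict (field names are mine; $\alpha=0.5$ for the smaller models is an assumption, the other values come from the list above):

```python
# block_repeat: how many times each block is repeated;
# alpha: resampling narrowing factor; channel_scale: channel multiplier.
SPINENET_FAMILY = {
    "SpineNet-49S": dict(block_repeat=1, alpha=0.5, channel_scale=0.65),
    "SpineNet-49":  dict(block_repeat=1, alpha=0.5, channel_scale=1.0),
    "SpineNet-96":  dict(block_repeat=2, alpha=0.5, channel_scale=1.0),
    "SpineNet-143": dict(block_repeat=3, alpha=1.0, channel_scale=1.0),
    "SpineNet-190": dict(block_repeat=4, alpha=1.0, channel_scale=1.3),
}
```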
Experiments
at mid/heavy model sizes, about two AP points above ResNet-family + FPN counterparts
at the light model size, about one AP point above MobileNet-family + FPN counterparts