Efficient-related work

This is not an official release from the googlenet family, so it is filed separately.

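[EfficientFCN] EfficientFCN: Holistically-guided Decoding for Semantic Segmentation: from SenseTime. It targets the problem that upsampling only has a local receptive field, which causes heavy reconstruction distortion and poor segmentation accuracy. It proposes the Holistically-guided Decoder (HGD) to recover the high-resolution (OS=8) feature maps. The idea is close to the scSE block and the mathematical form is close to bilinear CNN; the performance gain is probably mainly due to the efficient backbone.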

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

  1. Motivation

    • Semantic Segmentation
      • dilatedFCN: computational complexity
      • encoder-decoder: performance
    • proposed EfficientFCN
      • common backbone without dilated convolution
      • holistically-guided decoder
    • balance performance and efficiency
  2. Arguments

    • key elements for semantic segmentation
      • high-resolution feature maps
      • pre-trained weights
    • OS32 feature map: the fine-grained structural information is discarded
    • dilated convolution: introduces no extra parameters but requires high computational complexity and memory consumption
    • encoder-decoder based methods
      • repeated upsampling + skip connection procedure
        • upsampling
        • concat/add
        • successive convs
      • Even with the skip connections, lower-level high-resolution feature maps cannot provide sufficiently abstract features for high-performance segmentation
      • The bilinear upsampling or deconvolution operations are performed locally (from a limited receptive field)
      • improvements
        • reweight: SE-block
        • scales each feature channel but maintains the original spatial size and structures: [but the scSE block does apply spatial weighting]
    • propose EfficientFCN

      • widely used classification model
      • Holistically-guided Decoder (HGD)
        • take OS8, OS16, OS32 feature maps from backbone
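        • OS8 and OS16 are used to spatially guide the feature upsampling process
        • OS32 is used to encode the global context, which is then upsampled based on the guidance
        • linear assembly at each high-resolution spatial location: this looks like simply re-weighting the upsampled feature map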

  3. Method

    • Holistically-guided Decoder

      • multi-scale feature fusion
      • holistic codebook generation
        • from high-level feature maps
        • holistic codewords:without any spatial order
      • codeword assembly

    • multi-scale feature fusion

      • we observe that fusing multi-scale feature maps generally results in better performance
      • compress: separate 1x1 convs
      • bilinear downsampling/upsampling
      • concatenate
      • fused OS32 $m_{32}$ & fused OS8 $m_8$
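
The fusion steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the channel counts, the random weights, reusing the same compression weights for both outputs, and the corner-aligned bilinear resize are all choices made for the example.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 conv is a per-location channel mix: (C_in, H, W) -> (C_out, H, W).
    return np.einsum("oc,chw->ohw", w, x)

def bilinear_resize(x, out_h, out_w):
    # Bilinear resampling that maps corner pixels to corner pixels
    # (an assumed convention; the notes do not specify one).
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def fuse(feats, weights, target_hw):
    # compress -> resize to the target resolution -> concatenate on channels
    resized = [bilinear_resize(conv1x1(f, w), *target_hw)
               for f, w in zip(feats, weights)]
    return np.concatenate(resized, axis=0)

rng = np.random.default_rng(0)
H, W = 64, 64                                   # toy input size
f8 = rng.standard_normal((16, H // 8, W // 8))  # toy backbone features
f16 = rng.standard_normal((24, H // 16, W // 16))
f32 = rng.standard_normal((32, H // 32, W // 32))
ws = [rng.standard_normal((8, f.shape[0])) for f in (f8, f16, f32)]

m8 = fuse([f8, f16, f32], ws, f8.shape[1:])     # fused OS8 map
m32 = fuse([f8, f16, f32], ws, f32.shape[1:])   # fused OS32 map
print(m8.shape, m32.shape)
```

Both outputs carry the same concatenated channels; only the spatial resolution differs, which is what lets the decoder use one for global context and the other for spatial guidance.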
    • holistic codebook generation

      • from $m_{32}$
      • two separate 1x1 convs
        • a codeword based map $B \in R^{1024 \times (H/32) \times (W/32)}$: each location is described by a 1024-dim vector
        • n spatial weighting maps $A \in R^{n \times (H/32) \times (W/32)}$: highlight different regions of the feature map
          • softmax normalization over the spatial dimensions
          • $\widetilde A_i(x,y)=\frac{\exp(A_i(x,y))}{\sum_{p,q} \exp(A_i(p,q))}, \quad i \in \{1, \dots, n\}$
      • codeword $c_i \in R^{1024}$
        • global description for each weighting map
        • weighted average of B on all locations
        • $c_i = \sum_{p,q} \widetilde A_i(p,q) B(p,q)$
        • each codeword captures certain aspect of the global context
      • orderless high-level global features $C \in R^{1024 \times n}$
        • $C = [c_1, \dots, c_n]$
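
A minimal NumPy sketch of the codebook generation above, assuming toy sizes (16-dim codewords instead of 1024, n = 4) and random 1x1-conv weights. The names `holistic_codebook`, `w_b` and `w_a` are made up for this example.

```python
import numpy as np

def holistic_codebook(m32, w_b, w_a):
    # m32: (C, h, w) fused OS32 map
    # w_b: (d, C) 1x1-conv weights producing the codeword based map B
    # w_a: (n, C) 1x1-conv weights producing the n spatial weighting maps A
    c, h, w = m32.shape
    flat = m32.reshape(c, h * w)
    B = w_b @ flat                            # (d, h*w)
    A = w_a @ flat                            # (n, h*w)
    A = A - A.max(axis=1, keepdims=True)      # for numerical stability
    A_tilde = np.exp(A)
    A_tilde /= A_tilde.sum(axis=1, keepdims=True)  # softmax over locations
    C = B @ A_tilde.T                         # (d, n): c_i = sum_pq A~_i(p,q) B(p,q)
    return B.reshape(-1, h, w), A_tilde.reshape(-1, h, w), C

rng = np.random.default_rng(0)
m32 = rng.standard_normal((24, 4, 4))         # toy fused OS32 map
w_b = rng.standard_normal((16, 24))
w_a = rng.standard_normal((4, 24))
B, A_tilde, C = holistic_codebook(m32, w_b, w_a)
print(C.shape)
```

Because each weighting map sums to 1 over all locations, every codeword is a convex combination of the per-location vectors of $B$, and the spatial layout is gone once $C$ is formed.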
    • codeword assembly

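      • raw guidance map $G \in R^{1024 \times (H/8) \times (W/8)}$: 1x1 conv on $m_8$
      • semantic-rich feature $\overline B \in R^{1024}$: global average vector of the codeword based map $B$
      • novel guidance feature map $\overline G = G \oplus \overline B $: location-wise addition, i.e. $\overline B$ is added to $G$ at every spatial location [????]
      • linear assembly weights of the n codewords $W \in R^{n \times (H/8) \times (W/8)}$: 1x1 conv on $\overline G$
      • holistically-guided upsampled feature $\tilde f_8 = W^T C$ (reshape & dot): at each location, $\tilde f_8(x,y) = \sum_{i=1}^{n} W_i(x,y)\, c_i$; with $C \in R^{1024 \times n}$ this is the product $C W$ after reshaping $W$ to $n \times (H/8)(W/8)$
      • final feature map $f_8$: concat $\tilde f_8$ and $G$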
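
The assembly steps above can be sketched in NumPy as below. Again this is a sketch with assumed toy sizes and random weights; `assemble`, `w_g` and `w_w` are hypothetical names, and the codebook is filled with random values rather than computed from $m_{32}$.

```python
import numpy as np

def assemble(m8, B, C, w_g, w_w):
    # m8: (c8, H8, W8) fused OS8 map
    # B:  (d, h32, w32) codeword based map, C: (d, n) codebook
    # w_g: (d, c8) 1x1 conv giving the raw guidance map G
    # w_w: (n, d)  1x1 conv giving the assembly weights W
    c8, H8, W8 = m8.shape
    G = w_g @ m8.reshape(c8, -1)                     # (d, H8*W8)
    B_bar = B.reshape(B.shape[0], -1).mean(axis=1, keepdims=True)  # (d, 1)
    G_bar = G + B_bar                                # broadcast to every location
    Wts = w_w @ G_bar                                # (n, H8*W8)
    f_tilde = C @ Wts                                # (d, H8*W8): sum_i W_i(x,y) c_i
    f8 = np.concatenate([f_tilde, G], axis=0)        # (2d, H8*W8)
    return (f8.reshape(-1, H8, W8),
            Wts.reshape(-1, H8, W8),
            G.reshape(-1, H8, W8))

rng = np.random.default_rng(0)
d, n = 16, 4                                         # toy sizes (the notes use d = 1024)
m8 = rng.standard_normal((24, 8, 8))
B = rng.standard_normal((d, 2, 2))
C = rng.standard_normal((d, n))
w_g = rng.standard_normal((d, 24))
w_w = rng.standard_normal((n, d))
f8, Wts, G = assemble(m8, B, C, w_g, w_w)
print(f8.shape)
```

Each OS8 location receives its own linear combination of the same n codewords, so the upsampled feature is built from global context rather than from a local neighbourhood of the OS32 map.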
    • final segmentation

      • 1x1 conv
      • further upsampling
  4. Experiments

    • number of holistic codewords

      • 32 to 512: performance increases
      • 512 to 1024: slight drop
      • we observe that the number of codewords needed is approximately 4 times the number of classes