PANet: Path Aggregation Network for Instance Segmentation
Motivation
- boost the information flow
- bottom-up path
- shorten information path
- enhance accurate localization
- adaptive feature pooling
- aggregate all levels
- avoiding arbitrarily assigned results
- mask prediction head
- fcn + fc
- captures different views with complementary properties
- adds only subtle extra computational overhead
Key points
- prior techniques: FCN, FPN, residual connections, dense connections
- findings
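- high-level features classify well and low-level features localize well, but the path between high-level and low-level features is too long, which makes it hard to get both right
- previous methods predict each proposal from the features of a single level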
- PANet
- bottom-up path
- shorten information path
- enhance accurate localization
- adaptive feature pooling
- aggregate all levels
- avoiding arbitrarily assigned results
- mask prediction head
- fcn + fc
- captures different views with complementary properties
Method
framework
- b: bottom-up path
- c: adaptive feature pooling
- e: fusion mask branch
bottom-up path
- fpn’s top-down path:
- to propagate strong semantic information
- to ensure reasonable classification capability
- long path: red line, 100+ layers
- bottom-up path:
- enhances the localization capability
- short path: green line, less than 10 layers
for each level $N_l$
- input: $N_{l-1}$ & $P_l$ ($N_2$ is simply $P_2$)
- $N_{l-1}$ 3x3 conv (stride 2) & $P_l$ identity path - element-wise add - 3x3 conv
- channel 256
- ReLU after each conv
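The bottom-up building block can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: each new level is built from the finer map already computed on the new path (downsampled by a stride-2 3x3 conv) plus the FPN map of the same resolution. The helper names `conv3x3` and `bottom_up_path`, the random weights, and the small channel count (8 instead of 256) are assumptions made only to keep the example short and fast.

```python
import numpy as np

def conv3x3(x, w, stride=1):
    """3x3 convolution with padding 1, followed by ReLU.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    h_out = (h - 1) // stride + 1
    w_out = (wd - 1) // stride + 1
    out = np.zeros((w.shape[0], h_out, w_out))
    for i in range(3):
        for j in range(3):
            # Strided window of the padded input for kernel tap (i, j).
            patch = xp[:, i:i + stride * h_out:stride, j:j + stride * w_out:stride]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return np.maximum(out, 0.0)

def bottom_up_path(P, w_down, w_fuse):
    """P: FPN maps [P2, P3, P4, P5], finest first. Returns [N2, N3, N4, N5]."""
    N = [P[0]]  # the finest level is taken from FPN unchanged
    for l in range(1, len(P)):
        down = conv3x3(N[-1], w_down[l - 1], stride=2)  # halve the resolution
        fused = down + P[l]                             # identity lateral path + add
        N.append(conv3x3(fused, w_fuse[l - 1]))         # second 3x3 conv
    return N

# Toy setup (assumed sizes): 8 channels, pyramid resolutions 32, 16, 8, 4.
rng = np.random.default_rng(0)
C = 8
P = [rng.standard_normal((C, s, s)) for s in (32, 16, 8, 4)]
w_down = [0.1 * rng.standard_normal((C, C, 3, 3)) for _ in range(3)]
w_fuse = [0.1 * rng.standard_normal((C, C, 3, 3)) for _ in range(3)]
N = bottom_up_path(P, w_down, w_fuse)
shapes = [n.shape for n in N]
```

Each step on the new path costs only two convolutions, which is why the route from low-level features to the top level stays under 10 layers.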
adaptive feature pooling
pool features from all levels, then fuse, then predict
steps
- map each proposal to all feature levels
- roi align
- go through one layer of the following sub-networks independently
- fusion operation (element-wise max or sum)
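For example, the box branch has two fc layers: the RoIAlign-ed proposal features from each level first pass through one fc layer separately, are fused, and then share the remaining layers up to the head. The mask branch has four conv layers: the proposal features from each level first pass through one conv layer separately, are fused, and then share the remaining layers up to the head.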
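The pooling steps above can be sketched for the box branch as follows. This is a simplified NumPy illustration under stated assumptions: `roi_align` here takes a single bilinear sample at the center of each bin and ignores half-pixel offset conventions, the first fc layer is assumed to have separate weights per level, and the function names, the strides (4, 8, 16, 32), and the toy dimensions are all hypothetical.

```python
import numpy as np

def roi_align(feat, box, stride, out_size=7):
    """Simplified RoIAlign: one bilinear sample per bin center.
    feat: (C, H, W); box: (x1, y1, x2, y2) in image pixels."""
    _, h, w = feat.shape
    x1, y1, x2, y2 = [v / stride for v in box]   # map the box onto this level
    centers = (np.arange(out_size) + 0.5) / out_size
    ys = np.clip(y1 + centers * (y2 - y1), 0, h - 1)
    xs = np.clip(x1 + centers * (x2 - x1), 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1i, x1i = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[None, :, None], (xs - x0)[None, None, :]
    rows0, rows1 = feat[:, y0], feat[:, y1i]
    top = rows0[:, :, x0] * (1 - wx) + rows0[:, :, x1i] * wx
    bot = rows1[:, :, x0] * (1 - wx) + rows1[:, :, x1i] * wx
    return top * (1 - wy) + bot * wy

def adaptive_feature_pooling(levels, strides, box, fc1_per_level, fuse="max"):
    """Pool one proposal from every level, apply the first fc layer
    per level, then fuse element-wise (max or sum)."""
    outs = []
    for feat, stride, w in zip(levels, strides, fc1_per_level):
        roi = roi_align(feat, box, stride).reshape(-1)
        outs.append(np.maximum(w @ roi, 0.0))    # level-specific fc1 + ReLU
    stacked = np.stack(outs)
    return stacked.max(axis=0) if fuse == "max" else stacked.sum(axis=0)

# Toy setup (assumed): 256x256 image, 4 channels, four pyramid levels.
rng = np.random.default_rng(0)
C, hidden = 4, 16
strides = (4, 8, 16, 32)
levels = [rng.standard_normal((C, 256 // s, 256 // s)) for s in strides]
fc1 = [0.1 * rng.standard_normal((hidden, C * 7 * 7)) for _ in strides]
box = (40.0, 60.0, 200.0, 180.0)
pooled_max = adaptive_feature_pooling(levels, strides, box, fc1, fuse="max")
pooled_sum = adaptive_feature_pooling(levels, strides, box, fc1, fuse="sum")
```

The fused vector would then go through the remaining shared layers (the second fc layer and the prediction heads), which are omitted here.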
fusion mask branch
- fc layers are location sensitive
- helpful to differentiate instances and recognize separate parts belonging to the same object
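- conv branch
- 4 consecutive convs + 1 deconv: 3x3 conv, 256 channels, deconv upsampling factor 2
- predicts a mask for each class: output channels = n_classes
- fc branch
- branches off from the conv3 output of the conv branch
- 2 consecutive convs, with 256 and 128 channels
- fc, output dim = 28x28 (the mask feature map size), used for foreground/background classification
final mask: element-wise add of the two branch outputs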
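The final fusion step can be sketched as below. This is only an illustration of the shape bookkeeping, assuming the fc layer sees a 14x14 feature map (the usual mask RoI size) and that the class-agnostic 28x28 output is broadcast-added to every class mask. The name `fuse_masks`, the random inputs, and the reduced channel count (8 instead of 128) are assumptions to keep the weight matrix small.

```python
import numpy as np

def fuse_masks(conv_logits, fc_in, fc_weight, fc_bias):
    """conv_logits: (K, 28, 28) per-class masks from the conv branch.
    fc_in: (C, 14, 14) feature map entering the fc layer of the fc branch."""
    _, h, w = conv_logits.shape
    fg = fc_weight @ fc_in.reshape(-1) + fc_bias  # (h*w,) foreground/background scores
    fg = fg.reshape(1, h, w)                      # back to the mask layout
    return conv_logits + fg                       # same map added to every class

# Toy setup (assumed): 3 classes, 8 channels in the fc branch.
rng = np.random.default_rng(0)
K, C = 3, 8
conv_logits = rng.standard_normal((K, 28, 28))
fc_in = rng.standard_normal((C, 14, 14))
fc_weight = 0.01 * rng.standard_normal((28 * 28, C * 14 * 14))
fc_bias = np.zeros(28 * 28)
fused = fuse_masks(conv_logits, fc_in, fc_weight, fc_bias)
```

Because the fc layer has a separate weight for every spatial position, its output is location sensitive, which is the complementary view it contributes to the conv branch.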
Experiments
- heavier head
- 4 consecutive 3x3 convs
- shared among reg & cls
- effective for box prediction in the multi-task setting