MegDet: A Large Mini-Batch Object Detector
Motivation
- past methods mainly improve detection through novel framework or loss designs
- this paper instead studies the effect of mini-batch size
- enable training with a large mini-batch size
- warmup learning rate policy
- cross-GPU batch normalization
- faster training and better accuracy
Arguments
potential drawbacks of a small mini-batch size
learning rate dilemma
- a large mini-batch size usually requires a large learning rate
- but a large learning rate is likely to cause convergence failure
- while a smaller learning rate often yields inferior results
the paper's solutions (a sketch follows this list)
- linear scaling rule
- warmup
- Cross-GPU Batch Normalization (CGBN)
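The linear scaling rule ties the learning rate to the mini-batch size, and warmup ramps the rate up gradually at the start of training. A minimal Python sketch of the combined schedule; `base_lr`, `base_batch`, and `warmup_iters` are illustrative values, not the paper's exact settings:

```python
def lr_at_iter(it, batch_size, base_lr=0.02, base_batch=16, warmup_iters=500):
    """Learning rate at iteration `it` under linear scaling + linear warmup."""
    # Linear scaling rule: scale the base lr by the batch-size ratio.
    target_lr = base_lr * batch_size / base_batch
    if it < warmup_iters:
        # Warmup: ramp linearly from 0 up to the scaled target lr.
        return target_lr * it / warmup_iters
    return target_lr
```

e.g. with `batch_size=256` the target rate is $0.02 \times 256 / 16 = 0.32$, reached only after the warmup iterations.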
Method
a single synchronization round
* each GPU computes its local $\sum x_i$ and $\sum x_i^2$; after summing these across GPUs, the global mean and variance follow directly: $\mu = \frac{\sum x_i}{N}$, $\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$ (see the sketch below)
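A PyTorch-style sketch of this single all-reduce, assuming `torch.distributed` is already initialized; the function name and packing scheme are my own, not MegDet's actual implementation:

```python
import torch
import torch.distributed as dist

def cross_gpu_bn_stats(x):
    """Global BN mean/variance over all GPUs with one all-reduce.

    x: local activations of shape (N, C, H, W).
    """
    c = x.size(1)
    n_local = x.numel() // c                 # N*H*W elements per channel
    s1 = x.sum(dim=(0, 2, 3))                # per-channel sum of x
    s2 = (x * x).sum(dim=(0, 2, 3))          # per-channel sum of x^2
    # Pack both sums and the element count so a single all-reduce suffices.
    stats = torch.cat([s1, s2, s1.new_tensor([float(n_local)])])
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    s1, s2, n = stats[:c], stats[c:2 * c], stats[-1]
    mean = s1 / n
    var = s2 / n - mean * mean               # E[x^2] - (E[x])^2
    return mean, var
```

Packing $\sum x_i$, $\sum x_i^2$, and the element count into one tensor is what keeps the synchronization to a single round per BN layer.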