ArcFace

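A summary of the characteristics of some metric losses:

* margin-loss: a sample's distance to its own class centre must be smaller than its distance to any other class centre (centre-based, in the style of center loss)
* intra-loss: constrains the distance between a sample and its own class centre (below a given threshold)
* inter-loss: constrains the distance between a sample and the other class centres (above a given threshold)
* triplet-loss: a sample's distance to same-class samples must be smaller than its distance to samples of other classes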
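
Written as inequalities, the four constraints above might look as follows. This is only a notational sketch: the symbols $d(\cdot,\cdot)$ (a distance), $c_k$ (centre of class $k$), $y$ (the sample's class), $x^{+}$ / $x^{-}$ (a same-class / other-class sample) and the thresholds $m \ge 0$, $r_{\text{intra}}$, $r_{\text{inter}}$ are assumptions introduced here, not notation from the notes.

$$
\begin{aligned}
\text{margin-loss:}\quad & d(x, c_y) + m < d(x, c_k), \quad k \neq y \\
\text{intra-loss:}\quad & d(x, c_y) < r_{\text{intra}} \\
\text{inter-loss:}\quad & d(x, c_k) > r_{\text{inter}}, \quad k \neq y \\
\text{triplet-loss:}\quad & d(x, x^{+}) + m < d(x, x^{-})
\end{aligned}
$$

Setting $m = 0$ recovers the plain "smaller than" wording of the list.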

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

  1. Motivation

    • Scenario: face recognition
    • Standard ingredients:
      • hypersphere: the projection (embedding) space
      • metric learning: distances (angular/Euclidean) & class centres
    • we propose
      • an additive angular margin loss: ArcFace
      • has a clear geometric interpretation
      • SOTA on face & video datasets
  2. Arguments

    • face recognition
      • given a face image
      • pose normalisation
      • Deep Convolutional Neural Network (DCNN)
      • maps it into a feature that has small intra-class and large inter-class distance
    • two main lines
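      • train a classifier: softmax
        • the parameter count of the final classification layer grows linearly with the number of classes
        • not discriminative enough for the open-set setting
      • train an embedding directly: triplet loss
        • the number of triplets explodes combinatorially, so large datasets need a very large number of iterations
        • sample mining matters a great deal, for both accuracy and convergence speed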
    • to enhance softmax loss
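      • center loss: on top of classification, compresses the intra-class distance of the feature vectors
      • multiplicative angular margin penalty (SphereFace): with very many classes the centres become hard to update; the last FC layer's weights can stand in for the centres, but training becomes unstable
      • CosFace: adds the cosine margin penalty directly to the target logit; better performance & easier to implement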
    • ArcFace
      • improve the discriminative power
      • stabilise the training meanwhile
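      • margin-loss: Distance(intra-class) + m < Distance(inter-class)
      • core idea: the dot product of the normalised feature and the normalised weights equals the cosine of the angle between them (their cosine similarity); arccos recovers the angle between the feature vector and the target weight, a margin is added to that angle, the cosine is taken again to give the predicted logit, and softmax is applied last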
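
    A small worked example may make the margin step concrete. The values $\cos\theta = 0.8$, $m = 0.5$ rad and the scale $s = 64$ (the factor the normalised feature is rescaled by) are illustrative assumptions, not numbers taken from these notes.

    $$
    \begin{aligned}
    \theta &= \arccos(0.8) \approx 0.6435 \\
    \cos(\theta + m) &= \cos\theta\cos m - \sin\theta\sin m \approx 0.8 \cdot 0.8776 - 0.6 \cdot 0.4794 \approx 0.4144 \\
    s\cos\theta &= 51.2, \qquad s\cos(\theta + m) \approx 26.52
    \end{aligned}
    $$

    The target logit drops from 51.2 to about 26.5, so the network has to shrink $\theta$ further to reach the same softmax confidence.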
  3. Method

    • ArcFace

      • traditional softmax

        • does not explicitly enforce intra-class similarity & inter-class diversity
        • tends to show a performance gap under large intra-class variations or on large-scale test sets
      • our modification

        • fix the bias $b_j=0$ for simplicity

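        • transform the logit $W_j^T x=\|W_j\|\,\|x\|\cos\theta_j$, where $\theta_j$ is the angle between the weight $W_j \in \mathbb{R}^d$ and the sample feature $x \in \mathbb{R}^d$

        • fix $\|W_j\|$ by L2 normalisation: $\|W_j\|=1$

        • fix the embedding norm $\|x\|$ by L2 normalisation and rescale: $\|x\|=s$

        • thus the prediction depends only on the angle: the feature embeddings are distributed on a hypersphere of radius $s$, and training minimises the angle to the ground-truth class axis (the weight vector of the corresponding output channel, which can also be viewed as the class centre)

        • add an additive angular margin penalty: simultaneously enhance the intra-class compactness and inter-class discrepancy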

        • Effect

          • softmax produces noticeable ambiguity in decision boundaries
          • ArcFace loss can enforce a more evident gap
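
        Putting the modification steps together gives the loss below, sketched in the notation of the ArcFace paper. $N$ (batch size), $n$ (number of classes), $y_i$ (ground-truth class of sample $i$) and $m$ (the angular margin) do not appear in the steps above and are introduced here.

        $$
        \begin{aligned}
        L_{\text{softmax}} &= -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}} \\
        L_{\text{ArcFace}} &= -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\, j\neq y_i}^{n} e^{s\cos\theta_j}}
        \end{aligned}
        $$

        With $b_j = 0$, $\|W_j\| = 1$ and $\|x_i\| = s$, every logit becomes $s\cos\theta_j$; only the target-class angle $\theta_{y_i}$ receives the margin.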

      • pipeline

      • Implementation
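
The notes leave this item empty. As a minimal, forward-pass-only sketch of how the loss could be computed, here is a NumPy version. The function name `arcface_loss`, its argument layout, and the defaults `s=64.0`, `m=0.5` are assumptions made for illustration, not the authors' implementation, and the sketch does not treat the case where $\theta + m$ exceeds $\pi$.

```python
import numpy as np


def arcface_loss(features, weights, labels, s=64.0, m=0.5):
    """Mean ArcFace loss for one batch (hypothetical helper, forward pass only).

    features: (N, d) raw embeddings
    weights:  (d, C) last-layer weights, one column per class
    labels:   (N,) integer class ids
    """
    # L2-normalise embeddings and class weights so logits depend only on angles
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)

    # cos(theta_j) for every sample/class pair; clip so arccos stays defined
    cos_theta = np.clip(x @ w, -1.0, 1.0)
    theta = np.arccos(cos_theta)

    # add the angular margin m to the target-class angle only
    rows = np.arange(len(labels))
    theta[rows, labels] += m

    # back to cosine space and rescale by s to obtain the logits
    logits = s * np.cos(theta)

    # numerically stable softmax cross-entropy
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With `m=0.0` this reduces to softmax on normalised, rescaled logits; because both inputs are normalised, rescaling `features` leaves the loss unchanged.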