complement cross entropy

  1. summary
    • The main motivation for the complement loss: with one-hot labels, CE only cares about pushing up the probability of the correct class and throws away the information carried by the other, incorrect classes
    • For those incorrect classes we can instead maximize the entropy of their predicted probability distribution, i.e. push that distribution toward uniform, so the incorrect classes suppress one another and the ground-truth probability stands out
    • This rests on the assumption that the labels are mutually independent; it breaks down when classes have hierarchical relations or in the multi-label setting.
    • Mathematically,
      • first, CE is still applied to the correct label, pushing the ground-truth probability $\hat y^g$ as high as possible, toward 1
      • then CCE acts on the incorrect labels: an entropy term (the "complement entropy") is computed over the normalized incorrect-class probabilities, encouraging them to follow a uniform distribution, each close to $\frac{1-\hat y^g}{K-1}$
      • my impression is that this is basically label smoothing; the main difference is the normalization term in CCE: in label smoothing each incorrect label contributes its own term to the CE alongside the correct label, whereas CCE weights the whole incorrect vector as one block against the correct label, and that weight is adjustable (see the comparison right after this list)
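
    A per-sample, side-by-side comparison of the two losses (my own paraphrase, assuming the label-smoothing variant that spreads $\epsilon$ evenly over the $K-1$ incorrect classes; not a formula taken from the paper):

    $$L_{LS} = -(1-\epsilon)\log\hat y^g - \frac{\epsilon}{K-1}\sum_{j\neq g}\log\hat y^j$$

    $$L_{CE+CCE} = -\log\hat y^g - \frac{1}{K-1}\Big(-\sum_{j\neq g}\frac{\hat y^j}{1-\hat y^g}\log\frac{\hat y^j}{1-\hat y^g}\Big)$$

    Label smoothing nudges each incorrect probability toward a fixed small target with a per-term weight of $\frac{\epsilon}{K-1}$, while CE+CCE maximizes the entropy of the normalized incorrect-class distribution as a whole, with a single adjustable weight $\frac{|\gamma|}{K-1}$ relative to the CE term.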

Imbalanced Image Classification with Complement Cross Entropy

  1. Motivation

    • most benchmark datasets are class-balanced, whereas real-world data is often imbalanced
    • motivated by COT (complement objective training)
      • suppressing softmax probabilities on incorrect classes during training
    • proposes CCE
      • keep the ground-truth probability overwhelming the other classes
      • by neutralizing the predicted probabilities on incorrect classes
  2. Arguments

    • class imbalance
      • limits generalization
      • resample
        • oversampling on minority classes
        • undersampling on majority classes
      • reweight
        • neglects the fact that samples in minority classes may contain noise or false annotations
        • might cause poor generalization
    • observed accuracy degradation on imbalanced datasets when training with CE
      • cross entropy mostly ignores output scores on wrong classes
      • neutralizing predicted probabilities on incorrect classes helps improve accuracy of prediction for imbalanced image classification
  3. Method

    • complement entropy

      • calculated on incorrect classes
      • N samples, K-dimensional predicted probability vector $\hat y$; $g$ is the ground-truth class index
      • $C(y,\hat y)=-\frac{1}{N}\sum_{i=1}^N\sum_{j=1, j \neq g}^K \frac{\hat y_i^j}{1-\hat y_i^g}\log\frac{\hat y_i^j}{1-\hat y_i^g}$
      • the purpose is to encourage a larger gap between the ground-truth class and the other classes; the term is maximized when the incorrect-class probabilities form a uniform distribution
    • balanced complement entropy

      • add balancing factor
      • $C'(y,\hat y) = \frac{1}{K-1}C(y,\hat y)$
    • forming COT:

      • two back-propagation passes per iteration
        • first cross entropy
        • second complement entropy
    • CCE (Complement Cross Entropy)

      • add a modulating factor: $\tilde C(y, \hat y) = \frac{\gamma}{K-1}C(y, \hat y)$, with $\gamma=-1$
      • combination: CE + CCE as a single loss (see the sketch right after this list)
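
A minimal PyTorch sketch of how I read the combined CE + CCE objective from the formulas above; the function name and the gamma/eps arguments are my own, not from the authors' released code:

```python
import torch
import torch.nn.functional as F

def complement_cross_entropy(logits, target, gamma=-1.0, eps=1e-7):
    """logits: (N, K) raw scores; target: (N,) ground-truth class indices."""
    N, K = logits.shape
    probs = F.softmax(logits, dim=1)                  # \hat y
    gt = probs.gather(1, target.unsqueeze(1))         # \hat y^g, shape (N, 1)
    # normalized incorrect-class probabilities: \hat y^j / (1 - \hat y^g)
    comp = probs / (1.0 - gt + eps)
    ent = -comp * torch.log(comp + eps)               # elementwise p * (-log p)
    mask = F.one_hot(target, K).bool()
    ent = ent.masked_fill(mask, 0.0)                  # drop the j = g term
    C = ent.sum(dim=1).mean()                         # complement entropy, >= 0
    cce = (gamma / (K - 1)) * C                       # balancing + modulating factor
    ce = F.cross_entropy(logits, target)
    return ce + cce                                   # minimizing this maximizes C
```

COT, by contrast, would back-propagate the CE term and the balanced complement entropy in two separate passes per iteration; the CCE formulation folds them into one objective, e.g. `loss = complement_cross_entropy(model(images), labels)`.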
  4. Experiments

    • dataset:

      • cifar
      • class-balanced originally
      • construct imbalanced variants with imbalance ratio $\frac{N_{min}}{N_{max}}$ (a subsampling sketch is at the end of these notes)
    • test acc

      • the reported results: on CIFAR, CCE beats COT, which beats focal loss; on the road dataset CCE beats COT, and focal loss is not reported
      • can't really tell from this alone…
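
For building the imbalanced variants, a rough subsampling sketch; the long-tailed (exponential) class-size profile and the helper name `imbalanced_indices` are my assumptions, not necessarily the paper's exact protocol:

```python
import numpy as np

def imbalanced_indices(labels, num_classes, ratio, seed=0):
    """labels: 1-D array of integer class ids; returns indices of the kept samples."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=num_classes)
    keep = []
    for c in range(num_classes):
        # per-class size decays from counts[0] down to counts[K-1] * ratio
        n_c = int(counts[c] * ratio ** (c / (num_classes - 1)))
        idx = np.where(labels == c)[0]
        keep.append(rng.choice(idx, size=n_c, replace=False))
    return np.concatenate(keep)
```

Usage, assuming a torchvision-style CIFAR object whose `.targets` holds the integer labels: `keep = imbalanced_indices(np.array(train_set.targets), 10, ratio=0.01)`, then index the training images/labels with `keep`.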