KL Divergence

  1. KL divergence measures the difference between two distributions P and Q; the measure is NOT symmetric
    • P is the actual distribution (pred probs)
    • Q is the modeling distribution (gt)
    • $D_{KL}(P||Q)=\sum_i P(i)\ln\frac{P(i)}{Q(i)}$
    • The divergence is defined as the weighted sum of the log differences between P and Q, weighted by the probabilities of P
    • When Q is a one-hot label, clip it first, then take the log (to avoid log(0)); a small numerical sketch follows this list
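
A minimal numerical sketch of the definition above (plain NumPy; the values of P and Q are made up for illustration), including the clip-before-log step for a one-hot Q:

# illustrative sketch, not from the original notes
import numpy as np

P = np.array([0.7, 0.2, 0.1])      # "actual" distribution (pred probs)
Q = np.array([1.0, 0.0, 0.0])      # one-hot label
Q = np.clip(Q, 1e-8, 1.0)          # clip before taking the log to avoid log(0)

kl = np.sum(P * np.log(P / Q))     # weighted sum of log differences, weighted by P
print(kl)                          # large, because P puts mass where Q is (clipped) zero
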
  2. Methods

    • torch.nn.functional.kl_div(input, target, size_average=None, reduce=None, reduction='mean')
      • input: log-probabilities
      • target: probabilities
    • tf.distributions.kl_divergence(distribution_a, distribution_b, allow_nan_stats=True, name=None)
      • distribution_a & distribution_b come from tf.distributions.Categorical(logits=None, probs=None, ...)
      • pass in logits/probs, convert them into distributions first, then compute the KL divergence
    • torch.nn.KLDivLoss (module form of F.kl_div; see the sketch after this list)
    • tf.keras.losses.KLDivergence
    • tf.keras.losses.kullback_leibler_divergence
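
A minimal sketch of the module-form API, torch.nn.KLDivLoss (the random logits are just for illustration; the same input = log-probs / target = probs convention as F.kl_div applies):

# illustrative sketch, not from the original notes
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.KLDivLoss(reduction='batchmean')   # expects input = log-probs, target = probs

logits_p = torch.randn(4, 10)                     # hypothetical logits, batch=4, classes=10
logits_q = torch.randn(4, 10)

log_p = F.log_softmax(logits_p, dim=1)            # input: log-probabilities
q = F.softmax(logits_q, dim=1)                    # target: probabilities
loss = criterion(log_p, q)                        # D_KL(q || p), averaged over the batch
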
  3. code

# torch version
import torch.nn as nn
import torch.nn.functional as F

class KL(nn.Module):
    def __init__(self, args):
        super(KL, self).__init__()
        self.T = args.temperature

    def forward(self, logits_p, logits_q):
        log_p = F.log_softmax(logits_p / self.T, dim=1)   # input: log-probabilities
        q = F.softmax(logits_q / self.T, dim=1)           # target: probabilities
        # F.kl_div(input, target) computes D_KL(target || input);
        # 'batchmean' returns the mean KL per sample
        loss = F.kl_div(log_p, q, reduction='batchmean')
        return loss


# keras version
import tensorflow as tf
import keras.backend as K

def kl_div(logits_p, logits_q):
    T = 4.
    log_p = tf.nn.log_softmax(logits_p / T)      # (b, cls)
    log_q = tf.nn.log_softmax(logits_q / T)
    p = K.exp(log_p)
    return K.sum(p * (log_p - log_q), axis=-1)   # per-sample D_KL(p || q), shape (b,)
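
A short usage sketch of the KL module above (the args object with a temperature field is hypothetical, standing in for the real command-line args):

# illustrative usage, not from the original notes
import torch
from types import SimpleNamespace

args = SimpleNamespace(temperature=4.0)               # stand-in for the real args
criterion = KL(args)

logits_p = torch.randn(8, 100, requires_grad=True)   # e.g. model/student logits
logits_q = torch.randn(8, 100)                        # e.g. reference/teacher logits
loss = criterion(logits_p, logits_q)                  # scalar: batch-mean D_KL(q || p)
loss.backward()                                       # gradients flow only into logits_p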