KL divergence

Kullback-Leibler divergence measures how close two distributions are in terms of information (it is not a true distance metric, since it is not symmetric).

$$D_{KL}(p \| q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k}$$
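
As a quick illustration, here is a minimal Python sketch of this sum for two discrete distributions (the function name `kl_divergence` and the example probabilities are made up for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as arrays of probabilities.

    Uses log base 2, so the result is in bits; assumes all probabilities are > 0.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

print(kl_divergence([0.5, 0.25, 0.25], [0.4, 0.4, 0.2]))  # ≈ 0.072 bits
```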

The logarithm of a quotient is the difference of the logs, so

$$D_{KL}(p \| q) = \sum_{k=1}^{K} p_k \log p_k - \sum_{k=1}^{K} p_k \log q_k = -H(p) + H(p; q)$$

where H(p) is the entropy of p and the second term, H(p; q), is the cross entropy of p and q.

So, KL divergence is nothing but the extra number of bits (when the log is base 2) needed to encode samples from p using a code optimized for q instead of p!
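
A quick numerical check of this decomposition, reusing the same made-up example distributions as in the sketch above:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])  # same made-up example distributions as above
q = np.array([0.4, 0.4, 0.2])

entropy = -np.sum(p * np.log2(p))        # H(p)
cross_entropy = -np.sum(p * np.log2(q))  # H(p; q)
kl = np.sum(p * np.log2(p / q))          # D_KL(p || q) straight from the definition

print(kl, cross_entropy - entropy)  # both print the same value, ≈ 0.072 bits
```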

KL divergence can be shown to never be negative (this is Gibbs' inequality).
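
One standard way to see this uses Jensen's inequality on the concave logarithm:

$$-D_{KL}(p \| q) = \sum_{k=1}^{K} p_k \log \frac{q_k}{p_k} \le \log \sum_{k=1}^{K} p_k \frac{q_k}{p_k} = \log \sum_{k=1}^{K} q_k = \log 1 = 0,$$

so $D_{KL}(p \| q) \ge 0$, with equality exactly when p = q.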
