KL divergence
Kullback-Leibler (KL) divergence measures how close two distributions are in terms of information. Despite the name, it is not a true distance metric: it is not symmetric in p and q and does not satisfy the triangle inequality.
$$D_{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log \frac{p_k}{q_k}$$
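As a quick sanity check, here is a minimal NumPy sketch of the sum above; the distributions p and q are made-up examples, and SciPy's `entropy(p, q)` is used only to confirm the value.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) also returns D_KL(p || q)

# Made-up example distributions over K = 3 outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Direct implementation of the sum above (natural log, so the result is in nats).
d_kl = np.sum(p * np.log(p / q))

print(d_kl)           # ~0.0253
print(entropy(p, q))  # same value from SciPy
```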
The logarithm of a quotient is the difference of logarithms, so
$$D_{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k \log p_k - \sum_{k=1}^{K} p_k \log q_k = -H(p) + H(p; q)$$
where the first term is (minus) the entropy of p and the second term, H(p; q), is the cross entropy of p and q.
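The decomposition can be verified numerically. This sketch reuses the same made-up p and q and works in base-2 logarithms so that all quantities are in bits.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

h_p  = -np.sum(p * np.log2(p))      # entropy of p, in bits
h_pq = -np.sum(p * np.log2(q))      # cross entropy of p and q, in bits
d_kl =  np.sum(p * np.log2(p / q))  # KL divergence, in bits

print(np.isclose(d_kl, h_pq - h_p))  # True: D_KL = H(p; q) - H(p)
```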
So, KL divergence is nothing but the expected number of extra bits needed to encode samples from p using a code optimized for q instead of one optimized for p (when the logarithm is taken base 2)!
KL divergence can be shown to be non-negative (Gibbs' inequality), and it is zero if and only if p = q.
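A small empirical spot-check of non-negativity, not a proof; the dimension (5) and number of trials (1000) are arbitrary choices for illustration.

```python
import numpy as np

# Spot-check D_KL(p || q) >= 0 on random distributions.
rng = np.random.default_rng(0)
divergences = []
for _ in range(1000):
    p = rng.random(5); p /= p.sum()   # random distribution over 5 outcomes
    q = rng.random(5); q /= q.sum()
    divergences.append(np.sum(p * np.log(p / q)))

print(min(divergences) >= 0)                     # True
print(np.isclose(np.sum(p * np.log(p / p)), 0))  # True: D_KL(p || p) = 0
```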