KL divergence
Kullback-Leibler (KL) divergence measures how close two distributions are in terms of information. It is not a true distance metric, since it is not symmetric, but it is a useful measure of dissimilarity between a distribution \(p\) and a distribution \(q\) over \(K\) outcomes:
\begin{equation}
D_{KL} (p || q) = \sum_{k = 1}^{K} p_k \log \frac{p_k}{q_k}
\end{equation}
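As an illustrative sketch of the definition (the distributions \(p\) and \(q\) below are made-up examples, not taken from the text), the sum can be evaluated directly; using base-2 logarithms gives the result in bits, and computing both directions shows the asymmetry mentioned above:

```python
import numpy as np

# Hypothetical example distributions over K = 3 outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# D_KL(p || q) = sum_k p_k * log(p_k / q_k); log base 2 gives the answer in bits.
kl_pq = np.sum(p * np.log2(p / q))
kl_qp = np.sum(q * np.log2(q / p))

print(f"D_KL(p || q) = {kl_pq:.4f} bits")
print(f"D_KL(q || p) = {kl_qp:.4f} bits")  # differs from D_KL(p || q): not symmetric
```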
Since the logarithm of a quotient is the difference of logarithms, the sum splits into two terms:
\begin{equation}
D_{KL} (p || q) = \sum_{k = 1}^{K} p_k \log p_k - \sum_{k = 1}^{K} p_k \log q_k = -H(p) + H(p;q)
\end{equation}
where \(H(p)\) is the entropy of \(p\) and the second term, \(H(p;q)\), is the cross entropy between \(p\) and \(q\).
So KL divergence is nothing but the expected number of extra bits needed to encode samples from \(p\) when using a code optimized for \(q\) instead of one optimized for \(p\)!
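A small numeric check of this decomposition, again with made-up example distributions, confirms that the divergence equals the cross entropy minus the entropy:

```python
import numpy as np

# Hypothetical example distributions (same illustrative values as above).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

entropy_p = -np.sum(p * np.log2(p))          # H(p): bits per symbol with the optimal code for p
cross_entropy_pq = -np.sum(p * np.log2(q))   # H(p;q): bits per symbol coding p with q's code
kl_pq = np.sum(p * np.log2(p / q))           # D_KL(p || q)

# Decomposition: D_KL(p || q) = H(p;q) - H(p), the extra bits paid for using q's code.
assert np.isclose(kl_pq, cross_entropy_pq - entropy_p)
print(f"H(p) = {entropy_p:.4f}, H(p;q) = {cross_entropy_pq:.4f}, extra bits = {kl_pq:.4f}")
```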
KL divergence is never negative, and it is zero if and only if \(p = q\).
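One standard way to see this uses Jensen's inequality: because \(\log\) is concave,
\begin{equation}
-D_{KL} (p || q) = \sum_{k = 1}^{K} p_k \log \frac{q_k}{p_k} \leq \log \sum_{k = 1}^{K} p_k \frac{q_k}{p_k} = \log 1 = 0,
\end{equation}
so \(D_{KL} (p || q) \geq 0\), with equality exactly when \(p_k = q_k\) for all \(k\).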