# KL divergence

Kullback-Leibler divergence measures how close two distributions are in terms of information. Despite the name, it is not a true distance metric: it is not symmetric, and it does not satisfy the triangle inequality.

\begin{equation}
D_{KL} (p || q) = \sum_{k = 1}^{K} p_k \log \frac{p_k}{q_k}
\end{equation}

Since the logarithm of a quotient is the difference of the logarithms,

\begin{equation}
D_{KL} (p || q) = \sum_{k = 1}^{K} p_k \log p_k - \sum_{k = 1}^{K} p_k \log q_k = -H(p) + H(p;q)
\end{equation}
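This identity is easy to check numerically. A minimal Python sketch (the function and variable names below are my own, not from the text):

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) = sum_k p_k * log(p_k / q_k); terms with p_k == 0 contribute 0
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Two example distributions over K = 3 outcomes
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# H(p): entropy of p, and H(p;q): cross entropy of p and q
entropy = -sum(pk * math.log(pk) for pk in p if pk > 0)
cross_entropy = -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

# The decomposition holds: D_KL(p || q) = -H(p) + H(p;q)
difference = kl_divergence(p, q) - (cross_entropy - entropy)
```

Natural logs are used here, so the result is in nats; swapping in `math.log2` gives bits.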

where the first term is the negative entropy of \(p\), and the second term, \(H(p;q)\), is the cross entropy of \(p\) and \(q\).

So, **KL divergence is nothing but the expected number of extra bits needed to
encode samples from \(p\) using a code optimized for \(q\) instead of \(p\)**
(in bits when the logarithm is base 2; in nats with the natural log)!

KL divergence is never negative (Gibbs' inequality), and it equals zero exactly when \(p = q\).
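Non-negativity can be sanity-checked empirically on random distributions; a small Python sketch (names are my own):

```python
import math
import random

def kl_divergence(p, q):
    # D_KL(p || q) = sum_k p_k * log(p_k / q_k); terms with p_k == 0 contribute 0
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

random.seed(0)
minimum = float("inf")
for _ in range(1000):
    # Draw two random distributions over 5 outcomes by normalizing
    raw_p = [random.random() for _ in range(5)]
    raw_q = [random.random() for _ in range(5)]
    p = [x / sum(raw_p) for x in raw_p]
    q = [x / sum(raw_q) for x in raw_q]
    minimum = min(minimum, kl_divergence(p, q))
```

This is no proof, but over many random pairs the smallest observed divergence stays at or above zero, as Gibbs' inequality guarantees.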