# mutual information

A measure of the amount of information shared between two random variables: how much knowing one of them reduces the uncertainty (entropy) of the other.

\begin{equation}
I(X;Y) = D_{KL}\left( p(x, y) \,\|\, p(x)p(y) \right)
\end{equation}

where \(D_{KL}\) is the KL (Kullback-Leibler) divergence.
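For discrete variables, writing out the KL divergence term by term gives the sum below. This is just the standard definition of \(D_{KL}\) applied to the two distributions above; the base of the logarithm is a free choice (base 2 gives bits, natural log gives nats).

\begin{equation}
I(X;Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}
\end{equation}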

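We are basically asking for the "distance" between the joint distribution of the two variables and the product of their marginal distributions. (KL divergence is not symmetric, so it is not a true distance, hence the quotes.) The two distributions coincide exactly when the variables are independent, in which case \(I(X;Y) = 0\).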

So \(I\) is really the information gained by using the real joint distribution, which carries all the extra information about dependence, instead of treating the variables as if they were independent.
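To make this concrete, here is a minimal sketch that computes \(I(X;Y)\) for two discrete variables from a table of joint probabilities. The function name `mutual_information` and the two example tables are made up for illustration, and the log base 2 (so the result is in bits) is an assumption, not something fixed by the definition.

```python
import math


def mutual_information(joint):
    """Mutual information in bits, where joint[i][j] = p(x_i, y_j)."""
    # Normalize so the table sums to 1 (lets us pass raw counts too).
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]

    # Marginals: sum over rows for p(x), over columns for p(y).
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]

    # KL divergence between the joint and the product of the marginals.
    mi = 0.0
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            if pxy > 0:  # terms with p(x, y) = 0 contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Independent variables: the joint is exactly the product of the marginals.
independent = [[0.125, 0.375],
               [0.125, 0.375]]

# Fully dependent variables: Y is always equal to X (a fair coin, copied).
copied = [[0.5, 0.0],
          [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(copied))
```

The independent table should give zero, since there is nothing to gain from the joint, while the copied coin should give one bit, the full entropy of a fair coin.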