# conditional entropy

Conditional entropy measures the entropy remaining in \(Y\) once the value of \(X\) is known, averaged over all values of \(X\).

\begin{equation}
H(Y|X) = H(X, Y) - H(X)
\end{equation}

If \(Y\) is totally determined by \(X\) — say \(Y = X\) — then \(H(X, Y) = H(X, X) = H(X)\), so the conditional entropy is \(H(X) - H(X) = 0\).
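The defining identity \(H(Y|X) = H(X, Y) - H(X)\) can be checked directly on a small joint distribution. A minimal sketch (the joint table here is a made-up example in which \(Y\) is \(X\) modulo 2, so \(Y\) is a function of \(X\)):

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical joint distribution p(x, y) with y = x mod 2, so Y is determined by X.
joint = {(0, 0): 0.25, (1, 1): 0.25, (2, 0): 0.25, (3, 1): 0.25}

# Marginal p(x): sum the joint over y.
px = Counter()
for (x, y), p in joint.items():
    px[x] += p

h_xy = entropy(joint)        # H(X, Y)
h_x = entropy(px)            # H(X)
h_y_given_x = h_xy - h_x     # H(Y|X) = H(X, Y) - H(X)
print(h_y_given_x)           # → 0.0, since Y is a function of X
```

Because each pair \((x, y)\) occurs exactly as often as its \(x\) alone, \(H(X, Y) = H(X)\) and the difference vanishes.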

From the result in joint entropy, \(H(X, Y) \leq H(X) + H(Y)\), an upper bound on the conditional entropy follows:

\begin{equation}
H(Y|X) \leq H(Y)
\end{equation}

with equality exactly when \(X\) and \(Y\) are independent.

Conditioning on an extra variable never increases entropy on average (knowing more can only reduce uncertainty), although for a particular observation \(X = x\) the entropy \(H(Y|X = x)\) might exceed \(H(Y)\).
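Both halves of that statement can be seen in one small example. The numbers below are made up for illustration: observing \(X = 1\) leaves \(Y\) maximally uncertain, yet the \(X\)-averaged conditional entropy still sits below \(H(Y)\):

```python
from math import log2

def H(ps):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Hypothetical distributions: X = 0 w.p. 0.75, X = 1 w.p. 0.25;
# p(y|x=0) is nearly deterministic, p(y|x=1) is uniform.
px = [0.75, 0.25]
py_given_x = [[0.99, 0.01], [0.5, 0.5]]

# Marginal p(y) by averaging the conditionals over x.
py = [sum(px[i] * py_given_x[i][j] for i in range(2)) for j in range(2)]

h_y = H(py)
h_y_given_x_avg = sum(px[i] * H(py_given_x[i]) for i in range(2))
h_y_given_1 = H(py_given_x[1])   # entropy at the particular observation x = 1

print(h_y_given_1 > h_y)         # → True: this one observation raises uncertainty
print(h_y_given_x_avg <= h_y)    # → True: on average, conditioning never increases entropy
```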

The close coupling with joint entropy gives the chain rule:

\begin{equation}
H(X_1, \ldots, X_n) = \sum_{i = 1}^{n} H(X_i \mid X_1, \ldots, X_{i-1})
\end{equation}
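Since each term \(H(X_i \mid X_1, \ldots, X_{i-1})\) equals \(H(X_1, \ldots, X_i) - H(X_1, \ldots, X_{i-1})\), the sum telescopes to the joint entropy. A sketch verifying this numerically, on a made-up joint distribution over three binary variables given as equally likely outcomes:

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Entropy in bits of an empirical distribution given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

# Hypothetical joint distribution: six equally likely outcomes (x1, x2, x3).
outcomes = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

def joint_entropy(idx):
    """Entropy of the marginal over the listed coordinates (empty list gives 0)."""
    return entropy(Counter(tuple(o[i] for i in idx) for o in outcomes))

# Left side: H(X1, X2, X3).
lhs = joint_entropy([0, 1, 2])

# Right side: sum of H(X_i | X_1 .. X_{i-1}) = H(X_1 .. X_i) - H(X_1 .. X_{i-1}).
rhs = sum(joint_entropy(list(range(i + 1))) - joint_entropy(list(range(i)))
          for i in range(3))

print(abs(lhs - rhs) < 1e-12)   # → True: the chain rule holds
```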