differential entropy of a Gaussian

An important result is the closed-form differential entropy of a Gaussian random variable.

The pdf of an \(n\)-dimensional multivariate Gaussian with mean \(\mu\) and covariance matrix \(\Sigma\) is

\begin{equation} f(x) = \frac{1}{\sqrt{(2\pi)^n \det (\Sigma)}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) \end{equation}

Recall that entropy is the expectation of "surprise," \(-\log{p(x)}\), where \(p\) is the density (here, the Gaussian pdf above). In the derivation below, "LOE" stands for "linearity of expectation."

\begin{align*}
H[X] &= -E[\log(p(X))] && \text{Moved negative sign out, LOE} \\
&= -E\left[\log \left( \frac{1}{(2\pi)^{\frac{n}{2}} \det (\Sigma)^\frac{1}{2}} \right) -\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right] && \text{Substitution, square root to power} \\
&= -E\left[-\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(\det(\Sigma)) -\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right] && \text{Log rules for quotients and products} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} E[(X-\mu)^T \Sigma^{-1} (X-\mu)] && \text{LOE} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} E[\operatorname{tr}((X-\mu)^T \Sigma^{-1} (X-\mu))] && \text{Quadratic form is a scalar, trace of a scalar is itself} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} E[\operatorname{tr}(\Sigma^{-1} (X-\mu) (X-\mu)^T)] && \text{Cyclic property of trace} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} \operatorname{tr}(E[\Sigma^{-1} (X-\mu) (X-\mu)^T]) && \text{LOE (trace is linear)} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} \operatorname{tr}(\Sigma^{-1} E[(X-\mu) (X-\mu)^T]) && \text{Covariance matrix is constant} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} \operatorname{tr}(\Sigma^{-1} \Sigma) && \text{Definition of covariance matrix} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} \operatorname{tr}(I_n) && \text{Matrix times its inverse is identity} \\
&= \frac{n}{2} \log(2\pi) + \frac{1}{2} \log(\det(\Sigma)) +\frac{1}{2} n && \text{Trace of identity is the length of the diagonal} \\
&= \frac{n}{2} \log (2\pi e \det(\Sigma)^{\frac{1}{n}}) && \text{Combine logs, using } \tfrac{n}{2} = \tfrac{n}{2} \log e
\end{align*}
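
As a quick sanity check on the final expression, consider the univariate case. The symbol \(\sigma^2\) is introduced here for illustration (it does not appear above): set \(n = 1\) and \(\Sigma = \sigma^2\), so that \(\det(\Sigma)^{\frac{1}{n}} = \sigma^2\). The result should then reduce to the familiar scalar formula:

\begin{equation} H[X] = \frac{1}{2} \log (2\pi e \sigma^2) \end{equation}

Assuming natural logarithms, a standard normal (\(\sigma^2 = 1\)) has \(H[X] = \frac{1}{2}(\log(2\pi) + 1) \approx 1.419\) nats.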

The important takeaway is that the differential entropy depends only on the covariance of the Gaussian (through \(\det(\Sigma)\)), and not on its mean.
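
The closed form can also be checked numerically. The sketch below is not part of the derivation; it assumes NumPy is available, uses natural logarithms (entropy in nats), and the function names `gaussian_entropy` and `monte_carlo_entropy` are hypothetical names chosen for illustration. It compares the closed-form value against a Monte Carlo estimate of \(-E[\log p(X)]\).

```python
import numpy as np


def gaussian_entropy(cov):
    """Closed-form differential entropy (nats): 0.5 * (n*log(2*pi*e) + log det(cov))."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    # slogdet is numerically safer than log(det(cov)) for large or tiny determinants.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)


def monte_carlo_entropy(mean, cov, num_samples=200_000, seed=0):
    """Estimate -E[log p(X)] by averaging over samples drawn from N(mean, cov)."""
    mean = np.asarray(mean, dtype=float)
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = mean.shape[0]
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=num_samples)
    diff = samples - mean
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed once per sample.
    quad = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    log_pdf = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    return -log_pdf.mean()


if __name__ == "__main__":
    cov = np.array([[2.0, 0.5], [0.5, 1.0]])
    print("closed form:        ", gaussian_entropy(cov))
    print("estimate, mean 0:   ", monte_carlo_entropy([0.0, 0.0], cov))
    print("estimate, mean far: ", monte_carlo_entropy([10.0, -3.0], cov))
```

Both estimates should agree with the closed form up to sampling noise, whichever mean is used. The closed-form function does not take the mean as an argument at all.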
