The entropy of a random variable is its average surprise.

This is the discrete case; a parallel exists for continuous variables in the form of differential entropy.

\begin{equation} H(X) = -\sum_{i = 1}^{n} p(x_i) \log{p(x_i)} \end{equation}
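For comparison, the continuous analogue mentioned above, differential entropy, replaces the sum over outcomes with an integral over a density \(f\):

\begin{equation} h(X) = -\int f(x) \log{f(x)} \, dx \end{equation}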

When the log base is 2 (the usual convention), entropy is given in bits: the average number of bits needed to encode an outcome. (With the natural log, the unit is nats.) In practice, a near-optimal prefix code can be constructed with Huffman coding, whose average codeword length is within one bit of the entropy.
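The formula above translates directly into code. A minimal sketch (the function name `entropy` and the example distributions are chosen for illustration):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy -sum p * log(p), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin needs exactly 1 bit on average.
print(entropy([0.5, 0.5]))   # → 1.0

# A biased coin is less surprising on average, so fewer bits suffice.
print(entropy([0.9, 0.1]))

# A certain outcome carries no information at all.
print(entropy([1.0]))        # → 0.0 (or -0.0)
```
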

Note some properties of the surprise (self-information) of an outcome, \(-\log{p(x_i)}\):

- A certain event carries no surprise: \(-\log{1} = 0\).
- The lower the probability, the higher the surprise, growing without bound as \(p(x_i) \to 0\).
- Surprise is additive for independent events: \(-\log{p(x_i)p(x_j)} = -\log{p(x_i)} - \log{p(x_j)}\).

Entropy is thus the expectation of surprise.

Rarer events are more surprising, but they also occur less frequently, so each outcome's surprise is "weighted" by its probability.

Some extra properties:

There exist other forms of multivariate entropy:

- Joint entropy \(H(X, Y)\), the entropy of the pair \((X, Y)\).
- Conditional entropy \(H(Y \mid X) = H(X, Y) - H(X)\), the remaining surprise in \(Y\) once \(X\) is known.
- Mutual information \(I(X; Y) = H(X) + H(Y) - H(X, Y)\), the reduction in surprise about one variable from observing the other.
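These multivariate quantities, and the identities relating them, can be computed from a joint distribution. A sketch (the joint distribution `pxy` over two binary variables is assumed for illustration):

```python
import math

# Joint distribution p(x, y) over two binary variables (assumed for illustration).
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals p(x) and p(y), obtained by summing out the other variable.
px, py = {}, {}
for (x, y), p in pxy.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

H_X, H_Y, H_XY = H(px), H(py), H(pxy)
H_Y_given_X = H_XY - H_X    # conditional entropy, via the chain rule
I_XY = H_X + H_Y - H_XY     # mutual information

print(H_X, H_Y, H_XY, H_Y_given_X, I_XY)
```

Here the two variables are correlated, so the mutual information comes out strictly positive, and \(H(X, Y)\) is strictly less than \(H(X) + H(Y)\).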

These forms are tightly interconnected, as shown in this diagram: info_diagram.png
