The signal is added to a Gaussian random variable with the property that the noise is "white," i.e. it has constant power across all frequencies: the power spectral density is flat. This also implies that the autocorrelation function is a single impulse at $$\tau = 0$$, so knowing one noise sample gives you no information about any other sample. Equivalently, the Gaussian noise samples are independent.
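As a quick numerical illustration of the impulse-at-zero autocorrelation, here is a minimal sketch (assuming NumPy is available; the sample size and seed are arbitrary choices) that estimates the sample autocorrelation of white Gaussian noise at a few lags:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(0.0, 1.0, n)  # white Gaussian noise samples, unit variance

def autocorr(x, lag):
    # Normalized sample autocorrelation at the given lag.
    m = len(x)
    return np.mean(x[: m - lag] * x[lag:]) / np.var(x)

# Lag 0 recovers the (normalized) variance; every other lag is near zero,
# reflecting the single impulse at tau = 0 for white noise.
print(autocorr(z, 0))   # ≈ 1
print(autocorr(z, 1))   # ≈ 0
print(autocorr(z, 10))  # ≈ 0
```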

## 1. Capacity of an AWGN channel

We would like to determine the channel capacity of the AWGN channel.

We model the system as

\begin{equation} Y_i = X_i + Z_i \end{equation}

Where $$Z$$ is Gaussian, $$Z \sim \mathcal{N}(0, N)$$, with $$N$$ the variance of the noise and therefore its power.

We would also like to bound the variance (and so the power) of the input signal, so that for a codeword $$(x_1, \ldots, x_n)$$,

\begin{equation} \frac{1}{n} \sum_{i = 1}^{n} x_i^2 \leq P \end{equation}

Where $$P$$ is then the max power of the input.
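The power constraint can be made concrete with a small sketch (assuming NumPy; the power budget and codeword length are example values). A Gaussian-drawn codeword is rescaled so its empirical average power exactly meets the budget:

```python
import numpy as np

P = 1.0  # assumed power budget (an example value)
rng = np.random.default_rng(1)

# A candidate codeword of n = 1000 Gaussian samples, rescaled so that its
# empirical average power (1/n) * sum(x_i^2) exactly meets the constraint.
x = rng.normal(0.0, 1.0, 1000)
x *= np.sqrt(P / np.mean(x ** 2))

print(np.mean(x ** 2))  # ≈ 1.0, i.e. the codeword satisfies the power bound
```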

Recall that the definition of the capacity of a channel is

\begin{equation} C = \sup_{f(x) : E(X^2) \leq P} I(X;Y) \end{equation}

So we are trying to find the maximum value of the mutual information between $$X$$ and $$Y$$ over all distributions of $$X$$ valid under the power bound above. In other words, the maximum "similarity," in terms of information, between the channel-modified output and the input.

We know that

\begin{align*} I(X;Y) &= h(Y) - h(Y|X) && \text{MI is reduction in surprise in Y} \\ &= h(Y) - h(X+Z|X) && \text{Substituting definition of Y} \\ &= h(Y) - h(Z|X) && \text{X adds no entropy since it is given in cond.} \\ &= h(Y) - h(Z) && \text{X and Z are independent} \\ \end{align*}

The third equality comes from the fact that $$X$$ has already been "seen" as the prior in the conditional entropy term.

We'll proceed by trying to find the upper bound on $$h(Y)$$ and lower bound on $$h(Z)$$. This makes sense, because the maximum mutual information between input and output should be when we maximize the information of the output (which is largely determined by the input), and minimize the information in the noise.

1. Lower bound on $$h(Z)$$

The simpler one to do first is noise. From the differential entropy of a gaussian,

\begin{equation} h(Z) = \frac{1}{2} \log(2\pi e N) \end{equation}

Remember that the differential entropy of the gaussian is determined fully by its variance, which is here the power of the noise.

We can't go any lower here: the noise distribution is fixed, so $$h(Z)$$ is exactly this constant.
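The closed form for $$h(Z)$$ can be checked numerically with a Monte Carlo sketch (assuming NumPy; the noise variance is an example value), using the identity $$h(Z) = E[-\log f(Z)]$$:

```python
import numpy as np

N = 2.0  # example noise variance (power)
rng = np.random.default_rng(2)

# Closed form, in nats: h(Z) = (1/2) log(2*pi*e*N).
h_closed = 0.5 * np.log(2 * np.pi * np.e * N)

# Monte Carlo: h(Z) = E[-log f(Z)], estimated from samples of Z.
z = rng.normal(0.0, np.sqrt(N), 200_000)
log_pdf = -0.5 * np.log(2 * np.pi * N) - z ** 2 / (2 * N)
h_mc = -np.mean(log_pdf)

print(h_closed, h_mc)  # the two estimates agree to about two decimal places
```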

2. Upper bound on $$h(Y)$$

We start by imposing another bound with

\begin{align*} P_r &= E(Y^2) \\ &= E((X+Z)^2) && \text{Substituting definition of Y} \\ &= E(X^2) + 2E(XZ) + E(Z^2) && \text{Expansion and linearity} \\ &= E(X^2) + 2E(X)E(Z) + E(Z^2) && \text{Independence of X and Z} \\ &\leq P + N && \text{Power constraint and } E(Z) = 0 \end{align*}

Note that the third equality comes from the independence of $$X$$ and $$Z$$ (so $$E(XZ) = E(X)E(Z)$$) together with the linearity of expectation.

The last inequality follows from the power constraint $$E(X^2) \leq P$$, the noise power $$E(Z^2) = N$$, and the fact that $$E(Z) = 0$$, which makes the cross term vanish. Intuitively, the received power $$P_r$$ cannot exceed the sum of the power from the input itself and the noisy channel.
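A Monte Carlo sketch confirms the received-power bound (assuming NumPy; the powers are example values, with a Gaussian input drawn at the full power budget):

```python
import numpy as np

P, N = 1.0, 0.5  # example input power and noise variance
rng = np.random.default_rng(3)
n = 500_000

x = rng.normal(0.0, np.sqrt(P), n)  # input drawn at the full power budget
z = rng.normal(0.0, np.sqrt(N), n)  # independent Gaussian noise
y = x + z

# The cross term 2*E[X]E[Z] vanishes, so E[Y^2] lands at P + N.
print(np.mean(y ** 2))  # ≈ 1.5
```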

We will now try to achieve the equality on the above upper bound.

We must note that, for a fixed variance, the maximum differential entropy is achieved by a Gaussian distribution (STUB), so we take the input $$X$$ to be Gaussian. If the input $$X$$ and the noise $$Z$$ are both Gaussian, then clearly $$Y = X + Z$$ is also Gaussian. To maximize entropy, we choose the largest output variance allowed by the bound above, $$P+N$$. Therefore, our output entropy satisfies

\begin{equation} h(Y) \leq \frac{1}{2} \log(2\pi e(P+N)) \end{equation}

We choose the largest possible variance and a Gaussian input solely so that this upper bound on the differential entropy of $$Y$$ is actually attained.
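To see that a non-Gaussian input at the same power falls strictly below this bound, here is a Monte Carlo sketch (assuming NumPy; the powers, the binary input, and the sample size are all example choices). For a binary $$\pm\sqrt{P}$$ input, the output density is a known Gaussian mixture, so $$h(Y)$$ can be estimated as $$E[-\log f(Y)]$$:

```python
import numpy as np

P, N = 4.0, 1.0  # example input power and noise variance
rng = np.random.default_rng(4)
n = 200_000

# A binary (+/- sqrt(P)) input uses the full power budget but is not Gaussian.
x = rng.choice([np.sqrt(P), -np.sqrt(P)], n)
z = rng.normal(0.0, np.sqrt(N), n)
y = x + z

def mixture_pdf(t):
    # Exact density of Y: an equal mixture of N(+sqrt(P), N) and N(-sqrt(P), N).
    g = lambda m: np.exp(-(t - m) ** 2 / (2 * N)) / np.sqrt(2 * np.pi * N)
    return 0.5 * (g(np.sqrt(P)) + g(-np.sqrt(P)))

h_y = -np.mean(np.log(mixture_pdf(y)))              # Monte Carlo estimate of h(Y)
h_bound = 0.5 * np.log(2 * np.pi * np.e * (P + N))  # Gaussian upper bound

print(h_y, h_bound)  # h_y falls strictly below the bound
```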

3. Conclusion

The reason we want the highest entropy here is that from the definition of capacity, we need the highest mutual information over any $$X$$, which is now, substituting back into the equation for mutual information,

\begin{align*} I(X;Y) &\leq h(Y) - h(Z) \\ &= \frac{1}{2} \log(2\pi e (P+N)) - \frac{1}{2} \log (2\pi e N) \\ &= \frac{1}{2} \left( \log(2 \pi e (P + N)) - \log(2 \pi e N) \right) && \text{Factor out half} \\ &= \frac{1}{2} \log \left( \frac{ 2 \pi e (P + N)}{2 \pi e N} \right) && \text{Logarithm rule} \\ &= \frac{1}{2} \log \left( \frac{P + N}{N} \right) && \text{Cancelling out} \\ &= \frac{1}{2} \log \left( \frac{N}{N} + \frac{P}{N} \right) && \text{Split fraction} \\ &= \frac{1}{2} \log \left( 1 + \frac{P}{N} \right) \\ \end{align*}

With the last equality, we arrive at Shannon's famous result for the capacity of a noisy channel, which helped spark the field of information theory.

The ratio $$\frac{P}{N}$$ is known as the signal-to-noise ratio (SNR). The formula implies that if the noise power is zero, the capacity of the channel is infinite. Similarly, when the signal power is 0, the capacity of the channel is also 0.
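The final formula is easy to evaluate; here is a minimal sketch (the function name and the SNR values are illustrative choices) computing capacity in bits per channel use by using the base-2 logarithm:

```python
import math

def awgn_capacity(P, N):
    # Shannon capacity of the AWGN channel in bits per channel use.
    return 0.5 * math.log2(1 + P / N)

print(awgn_capacity(1.0, 1.0))   # SNR of 1 (0 dB): 0.5 bits per use
print(awgn_capacity(15.0, 1.0))  # SNR of 15: 2.0 bits per use
print(awgn_capacity(0.0, 1.0))   # zero signal power: capacity is 0
```

Note that passing $$N = 0$$ would divide by zero, which mirrors the infinite capacity of a noiseless channel.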