Let .
Then

Proof

Define functions and by
Also define by

We now just need to show that is injective.
Suppose .
Then

so we conclude

But by definition and , so .
It trivially follows that , so is injective.
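
An injectivity argument of this shape can be sanity-checked on a small example. The sketch below rests on an assumption that is not confirmed by the text: that the statement is the Ruzsa triangle inequality |A||B - C| <= |A - B||A - C| for finite sets of integers, and that the map sends (a, x) to (a - b(x), a - c(x)), where x = b(x) - c(x) is a fixed representation of each x in B - C. The names difference_set and ruzsa_injection are hypothetical, chosen only for illustration.

```python
from itertools import product


def difference_set(P, Q):
    """Return the difference set {p - q : p in P, q in Q}."""
    return {p - q for p in P for q in Q}


def ruzsa_injection(A, B, C):
    """Build the assumed map (a, x) -> (a - b(x), a - c(x)) on A x (B - C)."""
    # Fix one representation x = b(x) - c(x) for every x in B - C.
    # Which representation is chosen does not matter, only that it is fixed.
    rep = {}
    for b, c in product(sorted(B), sorted(C)):
        rep.setdefault(b - c, (b, c))
    return {(a, x): (a - b, a - c) for a in A for x, (b, c) in rep.items()}


# Small illustrative sets (assumed example data).
A, B, C = {0, 1, 4}, {2, 3, 7}, {1, 5}

phi = ruzsa_injection(A, B, C)
images = set(phi.values())

# Injective exactly when no two inputs share an image.
injective = len(images) == len(phi)

# The codomain claimed for the map: (A - B) x (A - C).
target = set(product(difference_set(A, B), difference_set(A, C)))
```

Under this reading, the map being injective with all images inside the product of the two difference sets is exactly what gives the cardinality bound.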

Theorem

Let , , be random variables.
Then the entropy satisfies:

Proof

By the data processing property of mutual information:

and the inequality will follow.

Corollary

Let and be iid.
Then

Proof

Firstly, for any :

so taking with iid with and we find

This gives the bound.