Let $X$ be a Random Variable taking values in a discrete set $\mathcal{X}$,
following a distribution $p$.
Let $I(X) = -\log p(X)$ be the Information Content of $X$.
Mathematical entropy is defined as:

$$H(X) = \mathbb{E}[I(X)]$$

In other words:

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

By convention, we take $0 \log 0 = 0$; otherwise we might write

$$H(X) = -\sum_{x \in \mathcal{X} \,:\, p(x) > 0} p(x) \log p(x)$$
We usually only care about $\log_2$, i.e. entropy measured in bits.
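
To make the convention concrete, here is a minimal Python sketch (the function name `entropy` and the example distributions are my own, not from these notes) that computes $H(X)$ in bits, skipping zero-probability outcomes exactly as the $0 \log 0 = 0$ convention prescribes:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a distribution given as a list of probabilities.

    Terms with p(x) = 0 are skipped, implementing the convention 0 log 0 = 0.
    """
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5
print(entropy([0.5, 0.5, 0.0]))    # 1.0 -- the zero-probability outcome adds nothing
```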

Joint Entropy
Conditional Entropy
Fano’s Inequality
Mutual Information

Lemma

For $X$ with $|\mathcal{X}| = n$:

$$0 \le H(X) \le \log n$$

Additionally, $H(X) = \log n$ if and only if $X$ is uniform on $\mathcal{X}$.

Proof

Take $q(x) = \tfrac{1}{n}$ in Gibbs’ inequality: $H(X) = -\sum_x p(x) \log p(x) \le -\sum_x p(x) \log q(x) = \log n$, with equality if and only if $p = q$, i.e. $X$ is uniform. The lower bound holds since every term $-p(x) \log p(x) \ge 0$.
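
As a quick numerical sanity check (my addition, not part of the original notes): the uniform distribution attains $\log n$, while a skewed distribution falls strictly below it.

```python
import math

def entropy(p):
    # Shannon entropy in bits, with the 0 log 0 = 0 convention.
    return -sum(px * math.log2(px) for px in p if px > 0)

n = 4
print(entropy([1 / n] * n), math.log2(n))  # 2.0 2.0 -- equality at the uniform
print(entropy([0.7, 0.1, 0.1, 0.1]))       # ~1.357, strictly below log2(4) = 2
```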

Intuition

Entropy is a measure of ‘randomness’ or ‘uncertainty’.
Roughly speaking, $H(X)$ is the expected number of tosses of a fair coin needed to simulate $X$
(it’s a two-sided coin because we use $\log_2$).
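
To illustrate this (a sketch I am adding; the dyadic distribution $(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8})$ is the one from Example 2 below, and the function name is mine), descending a binary tree with fair coin flips samples $X$, and the average number of flips comes out to $H(X)$:

```python
import random

def sample_by_coin_tosses():
    """Sample from p = (1/2, 1/4, 1/8, 1/8) by descending a binary tree.

    Heads at depth 1 -> 'a'; heads at depth 2 -> 'b';
    heads at depth 3 -> 'c'; three tails in a row -> 'd'.
    Returns (outcome, number of fair-coin tosses used).
    """
    tosses = 0
    for outcome in ["a", "b", "c"]:
        tosses += 1
        if random.random() < 0.5:  # one fair coin toss comes up heads
            return outcome, tosses
    return "d", tosses  # three tails in a row, still only 3 tosses

trials = 100_000
avg = sum(sample_by_coin_tosses()[1] for _ in range(trials)) / trials
print(avg)  # ~1.75, matching H(X) = 7/4 bits for this distribution
```

For dyadic distributions like this one the match is exact; for general distributions the optimal expected number of flips exceeds $H(X)$ only by a small additive constant (the Knuth–Yao bound), which is why the intuition is stated only roughly.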

Example 1

Suppose $X$ is uniform on the four-element set $\{a, b, c, d\}$.
We identify $\mathcal{X}$ with $\{0,1\}^2$, i.e. the outcomes of two fair coin tosses,
so $H(X) = \log_2 4 = 2$ bits.

Example 2

Suppose instead $X$ takes the values $a, b, c, d$ with probabilities $\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8}$. Then

$$H(X) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3 = \tfrac{7}{4} = 1.75 \text{ bits}$$

(to get this, think of a binary tree: $a$ sits at depth $1$, $b$ at depth $2$, $c$ and $d$ at depth $3$, and each depth is exactly $-\log_2 p(x)$!)

So Example 1 is more random than Example 2 ($2 > 1.75$ bits).
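
As a final check (a sketch of mine, assuming the two example distributions as given above), both entropies can be computed directly:

```python
import math

# Example 1: uniform on four outcomes -- two fair coin tosses.
h1 = math.log2(4)

# Example 2: p = (1/2, 1/4, 1/8, 1/8); -log2 p(x) is the depth of x in the tree.
p2 = [1/2, 1/4, 1/8, 1/8]
h2 = sum(px * -math.log2(px) for px in p2)  # 1/2*1 + 1/4*2 + 1/8*3 + 1/8*3

print(h1, h2)  # 2.0 1.75 -- Example 1 is more random
```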