Consider a Discrete Memoryless Channel (DMC)
Let $X$ take values in $\mathcal{X}$, where $|\mathcal{X}| < \infty$, with probabilities $p(x)$
Let $Y$ be the random variable output by the channel when it is given input $X$
The information channel capacity is
$$C = \max_{p(x)} I(X; Y)$$
(where $I(X; Y)$ is the mutual information between $X$ and $Y$)
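(the maximum is attained because $I(X; Y)$ is a continuous function of the input distribution $p(x)$, and the set of probability distributions on a finite input alphabet is compact)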
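
As a rough numerical illustration of this definition, the sketch below computes $I(X; Y)$ for a given input distribution and approximates the maximum by a grid search over input distributions. The binary symmetric channel with crossover probability 0.1, the names `mutual_information` and `binary_input_capacity`, and the grid search itself are assumptions chosen for illustration and do not come from the text above. The grid search only works for two-input channels; general channels need an iterative method.

```python
import math


def mutual_information(p_x, channel):
    """Return I(X;Y) in bits, where channel[x][y] = p(y | x)."""
    n_out = len(channel[0])
    # Output marginal: p(y) = sum_x p(x) p(y | x)
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]
            if joint > 0:  # terms with zero joint probability contribute 0
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total


def binary_input_capacity(channel, steps=1000):
    """Approximate max over p(x) of I(X;Y) for a two-input channel.

    Grid search over p = P(X = 0); an illustrative stand-in for the
    exact maximisation, not an efficient algorithm.
    """
    best_p, best_i = 0.0, 0.0
    for k in range(steps + 1):
        p = k / steps
        i = mutual_information([p, 1 - p], channel)
        if i > best_i:
            best_p, best_i = p, i
    return best_p, best_i


# Assumed example: binary symmetric channel, crossover probability 0.1
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
p_star, capacity = binary_input_capacity(bsc)
print(p_star, capacity)
```

For this assumed channel the search settles on the uniform input distribution, and the value it finds agrees with the known closed form $1 - H(0.1) \approx 0.531$ bits, where $H$ is the binary entropy function.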