By Jordi de la Torre on February 22, 2017
The mutual information between two random variables $X$ and $Y$ is, by definition, the Kullback-Leibler divergence between the joint distribution of both variables, $p(x, y)$, and the product of their marginal distributions, $p(x)p(y)$.
The definition of mutual information is usually written in terms of entropy, but I find the first formulation more intuitive. Each can be derived from the other by simple algebraic manipulation.
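For reference, the manipulation can be sketched as follows for discrete variables. The notation ($p$ for probability mass functions, $H$ for Shannon entropy, $I(X;Y)$ for mutual information) is an assumption, since the original symbols are not shown here. The only step used is that summing the joint over one variable gives the marginal of the other.

```latex
\begin{aligned}
I(X;Y) &= D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)\,p(y)\big)
        = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \\
       &= -\sum_{x,y} p(x,y)\,\log p(x)
          \;-\; \sum_{x,y} p(x,y)\,\log p(y)
          \;+\; \sum_{x,y} p(x,y)\,\log p(x,y) \\
       &= -\sum_{x} p(x)\,\log p(x)
          \;-\; \sum_{y} p(y)\,\log p(y)
          \;+\; \sum_{x,y} p(x,y)\,\log p(x,y) \\
       &= H(X) + H(Y) - H(X,Y)
\end{aligned}
```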
In the case where both variables are independent, the joint distribution $p(x, y)$ is equal to the product of the marginals $p(x)p(y)$, and the KL divergence between the two distributions is 0.
Intuitively, this says that knowing about one variable gives no information about the other.
In the case where both variables are not independent, knowing about one variable gives information about the other.
In this case the joint distribution $p(x, y)$ and the product of the marginals $p(x)p(y)$ differ, and the mutual information is greater than zero.
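Both cases can be checked numerically. The following is a minimal sketch, assuming discrete variables and a joint distribution given as a nested list; the function name `mutual_information` and the two example tables are hypothetical and chosen only for illustration. It computes the KL divergence between the joint and the product of its marginals, using the natural logarithm, so the result is in nats.

```python
import math

def mutual_information(joint):
    """Mutual information, in nats, of a discrete joint distribution.

    `joint[i][j]` holds p(x=i, y=j); the entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                         # 0 * log 0 is taken as 0
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Independent case: the joint is built as the product of two marginals.
independent = [[a * b for b in (0.6, 0.4)] for a in (0.3, 0.7)]
# Dependent case (illustrative numbers): X and Y agree 80% of the time.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The first value should be zero up to floating-point rounding, while the second is clearly positive, matching the two cases described above.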