Score:1

Why is entropy defined as a sum over the joint probability distribution?


In Stinson's book, in the proof of the following theorem:

$H(X,Y) \leq H(X) + H(Y)$, with equality if and only if $X$ and $Y$ are independent random variables.

The author assumes $X$ takes the values $x_i$, $1 \le i \le m$, and $Y$ takes the values $y_j$, $1 \le j \le n$. He denotes $p_i = \Pr[X=x_i]$, $1 \le i \le m$, and $q_j = \Pr[Y=y_j]$, $1 \le j \le n$. Then he defines $r_{ij} = \Pr[X = x_i, Y = y_j]$, $1 \le i \le m$, $1 \le j \le n$. My question is:

why is $$p_i = \sum_{j=1}^{n} r_{ij}$$

and $$q_j = \sum_{i=1}^{m} r_{ij}$$

I would like a detailed demonstration. I would also like to better understand what $H(X,Y)$ means.

João Víctor Melo: The author who says this is Stinson.
Score:3

First note that the comma in the probability is the AND operator: $$ \Pr[X = x , Y = y] = \Pr[X = x \wedge Y = y]$$ This is common notation to simplify the writing.

Now, write the sum out explicitly:

$$p_i = \sum_{j=1}^{n} r_{ij} = \Pr[X = x_i \wedge Y = y_1] + \Pr[X = x_i \wedge Y = y_2] + \cdots + \Pr[X = x_i \wedge Y = y_n]$$

This is just a partition of the event $\{X = x_i\}$ by the value of $Y$: the events $\{Y = y_j\}$, $1 \le j \le n$, are disjoint and together cover the whole sample space, so by the law of total probability their joint probabilities with $\{X = x_i\}$ sum to $\Pr[X = x_i]$. Note that no independence of $X$ and $Y$ is needed here.

As a concrete case, consider two dice; let $X$ and $Y$ be the random variables giving the upper face of the first and second die, respectively. In total there are 36 equally likely outcomes of the two-dice roll. Fix the first one, say $X = 3$; then

\begin{align}\Pr(X=3) = & \Pr(X=3,Y=1)+\\ & \Pr(X=3,Y=2)+\\ & \Pr(X=3,Y=3)+\\ & \Pr(X=3,Y=4)+\\ & \Pr(X=3,Y=5)+\\ & \Pr(X=3,Y=6)\\ = &\frac{1}{36}+ \frac{1}{36}+ \frac{1}{36}+ \frac{1}{36}+ \frac{1}{36} +\frac{1}{36} = \frac{1}{6} \end{align}
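
As a quick numerical check of this marginalization, here is a minimal Python sketch of the dice example (the variable names `r` and `p_3` are just illustrative):

```python
from fractions import Fraction

# Joint distribution of two independent fair dice:
# r[(x, y)] = Pr[X = x and Y = y] = 1/36 for every pair.
r = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal of X at x = 3: sum the joint probabilities over all values of Y.
p_3 = sum(r[(3, y)] for y in range(1, 7))
print(p_3)  # 1/6, matching Pr[X = 3] for a fair die
```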


$H(X,Y)$ is the joint entropy, and its formula is (again, the comma is the AND):

$$H(X,Y) = -\sum_{x\in\mathcal X} \sum_{y\in\mathcal Y} P(x,y) \log_2[P(x,y)]$$

In our context this is

$$H(X,Y) = -\sum_{x\in X} \sum_{y\in Y} P(X=x,Y=y) \log_2[P(X=x,Y=y)]$$
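
To make the formula concrete, here is a small Python sketch (the helper `entropy` and the name `joint` are just illustrative) computing $H(X,Y)$ for the two-dice example; since the dice are independent, it comes out equal to $H(X)+H(Y)$, the equality case of the theorem in the question:

```python
import math

# Joint distribution of two independent fair dice (the example above).
joint = {(x, y): 1/36 for x in range(1, 7) for y in range(1, 7)}

def entropy(dist):
    """Shannon entropy in bits of a probability mass function given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_XY = entropy(joint)                         # joint entropy H(X,Y)
H_X = entropy({x: 1/6 for x in range(1, 7)})  # marginal entropy H(X)
H_Y = entropy({y: 1/6 for y in range(1, 7)})  # marginal entropy H(Y)

print(H_XY, H_X + H_Y)  # both equal log2(36) ≈ 5.17 bits, since X and Y are independent
```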

$H(X,Y)$ is the uncertainty of evaluating $X$ and $Y$ simultaneously, and that equals the uncertainty of first evaluating $X$ and then, given the value of $X$, evaluating $Y$:

$$H(X,Y)= H(X|Y)+H(Y)=H(Y|X)+H(X) $$

Proving this is a bit long:

\begin{align} H(X,Y) & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(X=x_i,Y =y_j) \big)\\ & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(X=x_i) \Pr(Y = y_j \mid X = x_i) \big)\\ & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \big[ \log \big( \Pr(X=x_i) \big) + \log \big( \Pr(Y = y_j \mid X = x_i) \big) \big] \\ & = - \sum_{i=1}^m \left( \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \right) \log \big( \Pr(X=x_i) \big) \\ & \quad - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(Y = y_j \mid X = x_i) \big)\\ & = H(X) + H(Y|X) \end{align}
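
To see the chain rule numerically, here is a hedged Python sketch with a made-up dependent joint distribution; it checks that $H(X,Y) = H(X) + H(Y\mid X)$ and that $H(X,Y) < H(X) + H(Y)$ once $X$ and $Y$ are not independent:

```python
import math

def H(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A dependent joint distribution (illustrative): X is a fair bit, Y equals X
# except that it is flipped with probability 1/4.
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}

# Marginals by summing the joint distribution (the r_ij sums from the question).
pX = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

H_XY = H(joint.values())
H_X = H(pX.values())
# H(Y|X) = sum_x Pr[X=x] * H(Y | X=x)
H_Y_given_X = sum(
    pX[x] * H([joint[(x, y)] / pX[x] for y in (0, 1)]) for x in (0, 1)
)

print(round(H_XY, 4), round(H_X + H_Y_given_X, 4))  # equal: chain rule H(X,Y) = H(X) + H(Y|X)
print(round(H_X + H(pY.values()), 4))               # strictly larger, since X and Y are dependent
```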

João Víctor Melo: Shouldn't this be $\Pr(X=x_i) \Pr(X = x_i \mid Y = y_j)$?

kelalaka: Which line are we talking about?

João Víctor Melo: The second line after "Proving this is a bit long".

kelalaka: $\Pr(X \wedge Y) = \Pr(Y \mid X) \Pr(X) = \Pr(X \mid Y) \Pr(Y)$

João Víctor Melo: But how do you know they are equal?

kelalaka: [Conditional probability as an axiom?](https://en.wikipedia.org/wiki/Conditional_probability#As_an_axiom_of_probability)

João Víctor Melo: Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/132829/discussion-between-joao-victor-melo-and-kelalaka).
Score:1

Entropy does not depend on what the "labels" or values of the random variable are; it is a property ONLY of the distribution. After all, you only use $P(x)$, $P(y)$, $P(x,y)$, etc. in the formula, not $x$ and $y$ themselves.

Once you realize this, the set of probabilities $P(x,y)$ is all you need: just apply the original definition of entropy for a single random variable. If you like, define a vector random variable $z=(x,y)$ and compute its entropy as $$ -\sum_{z} P(z) \log P(z), $$ which is the same as computing $$ -\sum_{x,y} P(x,y) \log P(x,y). $$ This also means that the joint entropy of a number of random variables, $H(x_1,\ldots,x_n)=H(p_1,\ldots,p_n):=H_0$ with $P(x_i)=p_i$, is the same as the entropy of any reordering (permutation) of the joint distribution, that is,

$$ H(p_{\sigma(1)},p_{\sigma(2)},\ldots,p_{\sigma(n)})=H_0 $$ for all permutations $\sigma:\{1,\ldots,n\}\rightarrow \{1,\ldots,n\}.$
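
A tiny Python sketch of this label-independence (the joint distribution below is just an example): shuffling the probabilities, i.e. relabelling the outcomes, leaves the entropy unchanged.

```python
import math
import random

def H(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Flatten a joint distribution into a single list of probabilities: entropy
# only looks at these numbers, not at the labels (x, y) attached to them.
joint = [3/8, 1/8, 1/8, 3/8]

shuffled = joint[:]
random.shuffle(shuffled)

print(round(H(joint), 6) == round(H(shuffled), 6))  # True: entropy is permutation-invariant
```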
