Score:1

Why is entropy defined as a sum over the joint probability distribution?


In Stinson's book, in the proof of the following theorem:

$H(X,Y) \leq H(X) + H(Y)$, with equality if and only if $X$ and $Y$ are independent random variables.

The author assumes $X$ takes the values $x_i$, $1 \le i \le m$, and $Y$ takes the values $y_j$, $1 \le j \le n$. He denotes $p_i = \Pr[X=x_i]$, $1 \le i \le m$, and $q_j = \Pr[Y=y_j]$, $1 \le j \le n$. Then he defines $r_{ij} = \Pr[X = x_i, Y = y_j]$, $1 \le i \le m$, $1 \le j \le n$. My question is:

why is $$p_i = \sum_{j=1}^{n} r_{ij}$$

and $$q_j = \sum_{i=1}^{m} r_{ij}$$

I would like a detailed demonstration. I would also like to better understand what $H(X,Y)$ means.

João Víctor Melo: The author who says this is Stinson.
Score:3

First note that the comma in the probability is the AND operator: $$ \Pr[X = x , Y = y] = \Pr[X = x \wedge Y = y]$$ This is common notation to simplify the writing.

Now, write the sum out explicitly:

$$p_i = \sum_{j=1}^{n} r_{ij} = \Pr[X = x_i \wedge Y = y_1] + \Pr[X = x_i \wedge Y = y_2] + \cdots + \Pr[X = x_i \wedge Y = y_n]$$

This is just a partition of the event $\{X = x_i\}$ by the value of $Y$: the events $\{Y = y_j\}$, $1 \le j \le n$, are disjoint and together cover the whole sample space, so by the law of total probability their joint probabilities with $\{X = x_i\}$ sum to $\Pr[X = x_i]$. Note that no independence of $X$ and $Y$ is needed here.

As a concrete case, consider two dice; let $X$ and $Y$ be the random variables giving the upper face of the first and second die, respectively. In total there are 36 equally likely outcomes of the two-dice roll. Fix the first one, say $X = 3$; then

\begin{align}\Pr(X=3) = & \Pr(X=3,Y=1)+\\ & \Pr(X=3,Y=2)+\\ & \Pr(X=3,Y=3)+\\ & \Pr(X=3,Y=4)+\\ & \Pr(X=3,Y=5)+\\ & \Pr(X=3,Y=6)\\ = &\frac{1}{36}+ \frac{1}{36}+ \frac{1}{36}+ \frac{1}{36}+ \frac{1}{36} +\frac{1}{36} = \frac{1}{6} \end{align}
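
As a quick numerical check of this marginalization, here is a minimal Python sketch of the dice example (the variable names `r` and `p_3` are just illustrative):

```python
from fractions import Fraction

# Joint distribution of two independent fair dice:
# r[(x, y)] = Pr[X = x and Y = y] = 1/36 for every pair.
r = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal of X at x = 3: sum the joint probabilities over all values of Y.
p_3 = sum(r[(3, y)] for y in range(1, 7))
print(p_3)  # 1/6, matching Pr[X = 3] for a fair die
```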


$H(X,Y)$ is the joint entropy, and its formula is (again, the comma is the AND):

$$H(X,Y) = -\sum_{x\in\mathcal X} \sum_{y\in\mathcal Y} P(x,y) \log_2[P(x,y)]$$

In our context this is

$$H(X,Y) = -\sum_{x\in X} \sum_{y\in Y} P(X=x,Y=y) \log_2[P(X=x,Y=y)]$$
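
To make the formula concrete, here is a small Python sketch (the helper `entropy` and the name `joint` are just illustrative) computing $H(X,Y)$ for the two-dice example; since the dice are independent, it comes out equal to $H(X)+H(Y)$, the equality case of the theorem in the question:

```python
import math

# Joint distribution of two independent fair dice (the example above).
joint = {(x, y): 1/36 for x in range(1, 7) for y in range(1, 7)}

def entropy(dist):
    """Shannon entropy in bits of a probability mass function given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_XY = entropy(joint)                         # joint entropy H(X,Y)
H_X = entropy({x: 1/6 for x in range(1, 7)})  # marginal entropy H(X)
H_Y = entropy({y: 1/6 for y in range(1, 7)})  # marginal entropy H(Y)

print(H_XY, H_X + H_Y)  # both equal log2(36) ≈ 5.17 bits, since X and Y are independent
```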

$H(X,Y)$ is the uncertainty of evaluating $X$ and $Y$ simultaneously, and that equals the uncertainty of first evaluating $X$ and then, given the value of $X$, evaluating $Y$:

$$H(X,Y)= H(X|Y)+H(Y)=H(Y|X)+H(X) $$

Proving this is a bit long:

\begin{align} H(X,Y) & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(X=x_i,Y =y_j) \big)\\ & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(X=x_i) \Pr(Y = y_j \mid X = x_i) \big)\\ & = - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \big[ \log \big( \Pr(X=x_i) \big) + \log \big( \Pr(Y = y_j \mid X = x_i) \big) \big] \\ & = - \sum_{i=1}^m \left( \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \right) \log \big( \Pr(X=x_i) \big) \\ & \quad - \sum_{i=1}^m \sum_{j=1}^n \Pr(X=x_i,Y =y_j) \log \big( \Pr(Y = y_j \mid X = x_i) \big)\\ & = H(X) + H(Y|X) \end{align}
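
To see the chain rule numerically, here is a hedged Python sketch with a made-up dependent joint distribution; it checks that $H(X,Y) = H(X) + H(Y\mid X)$ and that $H(X,Y) < H(X) + H(Y)$ once $X$ and $Y$ are not independent:

```python
import math

def H(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A dependent joint distribution (illustrative): X is a fair bit, Y equals X
# except that it is flipped with probability 1/4.
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}

# Marginals by summing the joint distribution (the r_ij sums from the question).
pX = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

H_XY = H(joint.values())
H_X = H(pX.values())
# H(Y|X) = sum_x Pr[X=x] * H(Y | X=x)
H_Y_given_X = sum(
    pX[x] * H([joint[(x, y)] / pX[x] for y in (0, 1)]) for x in (0, 1)
)

print(round(H_XY, 4), round(H_X + H_Y_given_X, 4))  # equal: chain rule H(X,Y) = H(X) + H(Y|X)
print(round(H_X + H(pY.values()), 4))               # strictly larger, since X and Y are dependent
```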

João Víctor Melo: Shouldn't this be $\Pr(X=x_i) \Pr(X = x_i \mid Y = y_j)$?

kelalaka: Which line are we talking about?

João Víctor Melo: The second line after "Proving this is a bit long".

kelalaka: $\Pr(X \wedge Y) = \Pr(Y \mid X) \Pr(X) = \Pr(X \mid Y) \Pr(Y)$

João Víctor Melo: But how do you know they are equal?

kelalaka: [Conditional probability as an axiom?](https://en.wikipedia.org/wiki/Conditional_probability#As_an_axiom_of_probability)

João Víctor Melo: Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/132829/discussion-between-joao-victor-melo-and-kelalaka).
Score:1

Entropy does not depend on what the "labels" or values of the random variable are; it is a property ONLY of the distribution. After all, you only use $P(x)$, $P(y)$, $P(x,y)$, etc. in the formula, not $x$ and $y$ themselves.

Once you realize this, the set of probabilities $P(x,y)$ is all you need: just apply the original definition of entropy for a single random variable. If you like, define a vector random variable $z=(x,y)$ and compute its entropy as $$ -\sum_{z} P(z) \log P(z), $$ which is the same as computing $$ -\sum_{x,y} P(x,y) \log P(x,y). $$ This also means that the joint entropy of a number of random variables, $H(x_1,\ldots,x_n)=H(p_1,\ldots,p_n):=H_0$ with $P(x_i)=p_i$, is the same as the entropy of any reordering (permutation) of the joint distribution, that is,

$$ H(p_{\sigma(1)},p_{\sigma(2)},\ldots,p_{\sigma(n)})=H_0 $$ for all permutations $\sigma:\{1,\ldots,n\}\rightarrow \{1,\ldots,n\}.$
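
A tiny Python sketch of this label-independence (the joint distribution below is just an example): shuffling the probabilities, i.e. relabelling the outcomes, leaves the entropy unchanged.

```python
import math
import random

def H(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Flatten a joint distribution into a single list of probabilities: entropy
# only looks at these numbers, not at the labels (x, y) attached to them.
joint = [3/8, 1/8, 1/8, 3/8]

shuffled = joint[:]
random.shuffle(shuffled)

print(round(H(joint), 6) == round(H(shuffled), 6))  # True: entropy is permutation-invariant
```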
