Score:3

Questions on difference distribution in Kyber

mq flag

I have two elementary questions related to the special distribution $|x'-x \text{ mod}^{\pm} \, q| \leq B_q := \left\lceil \frac{q}{2^{d+1}} \right\rfloor$ in Kyber. The first question is about the paper and the second question is about a section in the specification. I am asking this as one question because I think it fits well thematically.


  1. In the paper on page 4 it says:

  • Unfortunately, there is no explanation why this is the case. I have had the distribution plotted once for selected parameters and would agree with this, but that's more of an argument based on experience. But can this also be formally justified e.g. by a calculation?

  1. In the specification on page 21 it says:

  • I am missing here, unfortunately, how one comes to the conclusion that the difference is 3 or 4. I could only explain this to myself by a script that I wrote. I am interested if this could be shown more formally by a calculaton.

  • At this point I am also interested in how one comes to the conclusion that each coefficient is then equally distributed over $\left\{-1,0,1\right\}$, $\left\{-1,0,1,2\right\}$ or $\left\{-2,-1,0,1\right\}$.

  • My guess: I think this is related to the $B_q$ that in this case according to the parameters $d = 10$ and $q = 3326$ and therefore $B_q = 2$. Because $|x'-x \text{ mod}^{\pm} \, q| \leq B_q := 2$ and $x'$ is 3 or 4, we can calculate the differences between a compession input and a decompression output so we obtain $\left\{-1,0,1,2\right\}$, depending on how one defines "difference" we could also obtain $\left\{-2,-1,0,1\right\}$. The other set $\left\{-1,0,1\right\}$ occurs in the case when $B_q = 1$, which is included in the case $|x'-x \text{ mod}^{\pm} \, q| \leq B_q := 2$. It would be nice if one could confim my guess.


Edit: In response to @kodlu's comment, here is the definition of $mod^\pm \, q$, as defined in the paper:

kodlu avatar
sa flag
Can you put the definition of $\mathrm{mod}^{\pm} q$ into the question?
Lee Seungwoo avatar
ke flag
I am not that confident about it but for the second question, $Decompress_q$ multiplies $q/2^{10}$ to input value, which is approximately 3.25 when $q=3329$, and then round it. Thus if we $Decompress_q$ every elements of $Z_{2^{10}}$, we get $\{0, 3, 7, 10, 13, \cdots \}$. The difference between every two (adjacent) elements is either 3 or 4 in this set.
P_Gate avatar
mq flag
Hi @LeeSeungwoo, yes I would agree with that :) I have determined similar results via a script. That is, a manual calculation simply shows this here so no need for a big proof. I have also readjusted my guess for the values from the second question.
Score:1
ru flag

I like to think of the the compression and decompression function operating on the numerators of fractions in the range $[0,1)$ (wrapped around) whose denominators are either $2^d$ or $q$. The compress function maps a numerator whose denominator is $q$ to the numerator of the nearest fraction whose denominator is $2^d$ and the decompress functions maps a numerator whose denominator is $2^d$ to the numerator of the nearest fraction whose denominator is $q$. Note that because $2^d<q$ compress is surjective and decompress is injective.

Between each pair of fractions with denominator $2^d$ there are either $[q/2^d]$ or $[q/2^d]+1$ fractions with denominator $q$ (more precisely, there will be $q\mod{2^d}$ pairs of the first kind and $2^d-q\mod{2^d}$ of the second kind. Similarly, by taking the halfway point between each pair of fractions with denominator $2^d$ we divide the wrapped interval $[0,1)$ into $2^d$ subintervals where the compress function is constant on the subinterval. Again, the number of fractions with denominator $q$ in each subinterval is either $[q/2^d]$ or $[q/2^d]+1$. Now note that the pair of integers $[q/2^d]$ and $[q/2^d]+1$ are either $2B_q$ and $2B_q+1$ or $2B_q-1$ and $2B_q$.

In the case with $2B_q+1$ fractions with denominator $q$, we have a central fraction that is closest to the fraction with denominator $2^d$ in the middle of the subinterval. The numerator of this central fraction is fixed under $\mathrm{decompress}(\mathrm{compress}(x))$ and corresponds to $x'-x=0$. Each of the $B_q$ fractions with denominator $q$ to the left of the central fraction in the subinterval has corresponds to a $x'-x$ value $-B_q,-B_q+1,\ldots -1$ and similarly the $B_q$ fractions with denominator $q$ correspond to $x'-x$ values $1,2,\ldots, B_q$. Note that this case corresponds to complete uniformity of all $x'-x$ values.

In the case with $2B_q-1$ fractions with denominator $q$, we again have a central fraction that is closest to the fraction with denominator $2^d$ in the middle of the subinterval. The numerator of this central fraction is fixed under $\mathrm{decompress}(\mathrm{compress}(x))$ and corresponds to $x'-x=0$. Each of the $B_q-1$ fractions with denominator $q$ to the left of the central fraction in the subinterval has corresponds to a $x'-x$ value $-B_q+1,-B_q+2,\ldots -1$ and similarly the $B_q$ fractions with denominator $q$ correspond to $x'-x$ values $1,2,\ldots, B_q-1$. Note that this case corresponds to uniformity of $x'-x$ values not including $\pm B_q$.

In the case where there are $2B_q$ fractions with denominator $q$, there is a ``central'' fraction that is nearest to the fraction with denominator $2^d$ in the subinterval, but an imbalance that means either there are $B_q-1$ fractions to the left and $B_q$ fractions to the right or vice-versa. The former corresponds to uniformity of $x'-x$ values not including $-B_q$ and the latter to uniformity of $x'-x$ values not including $B_q$.

In all cases we have uniformity for $x'-x$ values not including $x'-x$; in the case $2B_q-1$ and $2B_q$ we have a smaller weight for some integers of magnitude $B_q$. Not all cases can fall into the $2B_q+1$ because $q\mod 2^d\neq 0$. This explains the remark in question 1.

For question 2 note that in Kyber $q=3329$ where $B_q=2$, $[q/2^d]=3$ and we have $q\mod 2^d=257$ subintervals with $[q/2^d]=3$ fractions of denominator $q$ and $767$ subintervals with $[q/2^d]+1=4$ fractions of denominator $q$. The subintervals with 3 fractions are the $2B_q-1$ case and as we expect we have equal representation of values $\{-B_q+1,\ldots,B_q-1\}=\{-1,0,1\}$. The subintervals with 4 fractions are the $2B_q$ case and can be imbalanced either to the left (in which case we see- values $\{-B_q,\ldots, B_q-1\}=\{-2,-1,0,1\}$) or to the right (in which case we see- values $\{-B_q+1,\ldots, B_q\}=\{-1,0,1,2\}$. This explains the remarks behind question 2.

P_Gate avatar
mq flag
Thanks for the great answer! You say Compress is injective. But Compress takes values from $\left\{0,...,q-1\right\}$ and maps them to $\left\{0,...,2^d -1\right\}$ a much smaller set. Therefore, I would argue that Compress is surjective. Because two different pre-images do not have two different images.
Daniel S avatar
ru flag
@P_Gate Apologies, I transposed injective and surjective; I've now corrected.
P_Gate avatar
mq flag
Can you maybe name what you mean by "pair of integers" in the second paragraph?
Daniel S avatar
ru flag
I mean the two integers $[q/2^d]$ and $[q/2^d]+1$.
P_Gate avatar
mq flag
I created a room where I wanted to ask a few more questions. I would be happy if you would join. See https://chat.stackexchange.com/rooms/145870
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.