Score:2

Compression and Decompression in CRYSTALS-Kyber

mq flag

I am currently studying the Kyber Paper. I have a question about section 2.2 Compression and Decompression, but first I would like to quote the statement:

Compression and Decompression. We now define a function $\text{Compress}_q (x, d)$ that takes an element $x ∈ \mathbb{Z}_q$ and outputs an integer in $\{0,..., 2^d − 1\}$, where $d < \lceil\log_2(q) \rceil$. We furthermore define a function $\text{Decompress}_q$, such that $$x' =\text{Decompress}_q(\text{Compress}_q(x,d),d) \quad (1)$$ is an element close to $x$ – more specifically $|x'-x \text{ mod}^{\pm} q|\leq B_q := \lceil \frac{q}{2^{d+1}} \rfloor$. The functions satisfying these requirements are defined as: $$\text{Compress}_q(x,d)= \lceil (2^d / q) \cdot x \rfloor \text{ mod}^+ 2^d , \\ \text{Decompress}_q(x,d) = \lceil (q/2^d) \cdot x \rfloor$$ If $x'$ is a function of $x$ as in Eq. (1), then for a randomly chosen $x\leftarrow \mathbb{Z}_q$, the distribution of $$ x' - x \text{ mod}^\pm q$$ is almost uniform over the integers of magnitude at most $B_q$.

  • My first question is about the inequality $|x'-x \text{ mod}^{\pm} q|\leq B_q := \lceil \frac{q}{2^{d+1}} \rfloor$, how do one come up with this estimation in detail? I don't quite understand how you can say that the amount is smaller than the centered binomial distribution $B_q$.

  • My second question is about the definition of the Compress and Decompress functions (from "The functions satisfying these requirements..."). I don't see offhand now how these functions satisfy the requirement.

Score:2
ng flag

It is probably easiest to understand this without the scaling factor (the general technique works for an arbitrary lattice). Consider the simple lattice $L = \mathbb{Z}^n$. One can uniquely decompose $\mathbb{R}^n$ as $\mathbb{Z}^n + [-1/2, 1/2)^n$. This is done by taking a point $\vec x\in\mathbb{R}^n$ and separating it into its "integer part" (in $\mathbb{Z}^n$) and "fractional part" (in $[-1/2, 1/2)^n$). This is to say that one can compress $\vec x\in\mathbb{R}^n$ into a point in $\mathbb{Z}^n$, up to bounded error in $[-1/2, 1/2)^n$.

How does this have to do with "compression" though? It is clearly much easier to store an integer $\vec x\in\mathbb{Z}^n$ than a real number $\vec x\in\mathbb{R}^n$, but so far not in a way that can be made quantitative. The rest of this post is essentially going to be rephrasing things in a way that can be made quantitative.

The integers used in lattice cryptography are really not arbitrary integers $\mathbb{Z}$, but integers modulo $q$ $\mathbb{Z}/q\mathbb{Z}$. This is somewhat misleading though, and can lead to all sorts of misconceptions --- for example, when one writes $(q/p)$ for $p\nmid q$, one might think this means $q (p^{-1}\mod q)$, when it really should mean something like $\lfloor q/p\rceil$. This is just to say that the underlying mathematical situation is perhaps not as slick as one might hope/assume.

Anyway, the setup is that we are given $\vec x\in(\mathbb{Z}/q\mathbb{Z})^n$ that we view as being the subset $\{-q/2,-q/2+1\dots,q/2\}^n\subseteq\mathbb{Z}^n$. There are $q^n$ elements of this subset, so one can represent an element of this subset using $n\log_2 q$ bits.

To compress this, we can replace this subset with a sparser subset. For example, a very sparse subset we could use is $\{-q/2, 0, q/2\}^n$. In general one will be more careful with the endpoints of this subset and identify $-q/2$ and $q/2$, so this subset will really look like $\{0, q/2\}^n$. This gets us down to $n\log_2 2 = n$ bits. But it can introduce fairly large error, so doesn't work in many situations (quantifying why would require taking a detour to discuss Kyber's error correction).

Anyway, in both cases the subsets take the form of $L / q\mathbb{Z}^n$ for $L\supseteq q\mathbb{Z}^n$ a lattice that is periodic modulo $q$. One can equivalently write this as $L\cap [-q/2, q/2)^n$. In the first example the lattice is $L = \mathbb{Z}^n$. In the second example it is $(q/2)\mathbb{Z}^n$. One could choose $L$ to be "intermediate" between these two options, for example $(q/2^d)\mathbb{Z}^n$, as Kyber does. This lets you compress things from $n\log_2 q$ bits to $n\log_2 (q/2^d)$ bits, at the cost of introducing some error.

How do you compute how much error is introduced? You're solving CVP on the lattice $L := (q/2^d)\mathbb{Z}^n$. This means you're replacing $\vec x\in\mathbb{Z}^n$ with $\mathsf{CVP}_L(\vec x)$, which leads to error $\vec x - \mathsf{CVP}_L(\vec x)$. This error will be in the "Voronoi cell" of $L$, which you can compute is contained in $[-(q/2)(1/2^d), (q/2)(1/2^d))^n: = (1/2^d)[-q/2, q/2)^n$, i.e. is really just a scaled copy of the Voronoi cell of $q\mathbb{Z}^n$ (which is $[-q/2, q/2)^n$).

P_Gate avatar
mq flag
Thanks for your answer! The first paragraph is understandable! For the second I still have a question, why do you write $(q/2^d)\mathbb{R}^n = \mathbb{R}^n$? The equality is not quite clear to me here. And secondly how does it behave with Compression? Do you scale here with $2^d/q$?
Mark avatar
ng flag
@P_Gate I've tried rewriting it some to make it more precise
Score:2
ru flag

Firstly, understand that here $B_q$ does not represent the centred binomial distribution, but instead the integer value $\lceil\frac q{2^{d+1}}\rfloor$.

Next it might help to think of the compress and decompress functions as follows:

Consider the fraction $x/q$ and locate the nearest fraction with denominator $2^d$, suppose this is $c/2^d$ then $\mathrm{Compress}(x,d)=c\mod^+2^d$.

Consider the fraction $x/2^d$ and locate the nearest fraction with denominator $q$, suppose this is $b/q$ then $\mathrm{Decompress}(x,d)=b$.

Because $q>2^d$ each fraction with denominator $c/2^d$ is the centre of an interval $((2c-1)/2^{d+1},(2c+1)/2^{d+1})$ which contains either $\lfloor q/2^d\rfloor$ or $\lceil q/2^d\rceil$ fractions with denominator $q$ in it. The numerators of each of these fractions (and no other integers in $0,q-1)$) will map to $c$ under Compress, whereas $\mathrm{Decompress}(c,d)$ will map to a numerator "exactly" halfway through the list of numerators. The worst case is then when a we start from $x$ which is at an extreme end of the list of numerators. In this case the Compress then Decompress maps us atop a numerator halfway through the list of numerators and which is therefore at most $\lceil q/2^{d+1}\rfloor$ away from the point where we started.

P_Gate avatar
mq flag
Thanks for your answer! The third paragraph is not quite clear to me, can you explain further why this is so (I'm trying to visualize this somewhat graphically right now.). In the last paragraph for me is not clear how you came from the interval $((2c-1)/2^{d+1},(2c+1)/2^{d+1})$ to the conclusion, that it contains either $\lfloor q/2^d\rfloor$ or $\lceil q/2^d\rceil$ fractions with denominator $$ in it.
Daniel S avatar
ru flag
It's an interval of length $1/2^d$ and the numbers $[0,1)$ can be split into 2^d such buckets. We then have to evenly divide $q$ fractions of denominator $q$ into these buckets so that some have $\lfloor q/2^d\rfloor$ and some have $\lceil q/2^d\rceil$.
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.