Score:1

Related to ntt implementation in crystal kyber

om flag

Recently I started implementing basic skeleton of crystal kyber. Right now I got stuck at NTT, where I am not able to understand how values has to be fed (twiddle factor, actual data which has to be converted into NTT form).I am referring ntthelper.py from python implementation and parametric ntt from HDL implementation .Here I was not able to figure out how that zetas(twiddle factor values) and k values are related with each other. And can we use zetas values present in ntthelper.py as twiddle factor values in hdl implementation?

Score:2
in flag

While not an answer to your question, a little background on DFT/FFT/NTT. If you understand these well, you can write your own implementation without too much trouble.

DFT/FFT

In a field $\mathbb{F}$ with $n$th roots of unity (say $\mathbb{C}$), we have an isomorphism $$ \mathbb{F}[x]/(x^n-1)\cong\prod_{i=0}^{n-1}\mathbb{F}[x]/(x-\zeta^i)\cong\mathbb{F}^n (\text{as rings!}), $$ basically the chinese remainder theorem (here $\zeta$ is a primitive $n$th root of unity). The isomorphism is specifically given by evaluation (left to right) $$ f(x)\mapsto(f(1),f(\zeta),\ldots,f(\zeta^{n-1})) $$ for some primitive $n$th root of unity $\zeta$ (say $\zeta=e^{2\pi i/n}$). From right to left, we interpolate $$ (z_0,\ldots,z_{n-1})\mapsto f \text{ s.t. } f(\zeta^i)=z_i. $$ This isomorphism is linear over $\mathbb{F}$, given by the matrices $$ (\zeta^{ij})_{i,j=0}^{n-1}= \left( \begin{array}{ccccc} 1&1&1&\cdots&1\\ 1&\zeta&\zeta^2&\cdots&\zeta^{n-1}\\ \vdots&\vdots&\vdots&\cdots&\vdots\\ 1&\zeta^{n-1}&\zeta^{2(n-1)}&\cdots&\zeta^{(n-1)(n-1)} \end{array} \right) $$ $$ \frac{1}{n}\left(\zeta^{-ji}\right)_{i,j=0}^{n-1}= \frac{1}{n}\left( \begin{array}{ccccc} 1&1&1&\cdots&1\\ 1&\zeta^{-1}&\zeta^{-2}&\cdots&\zeta^{-(n-1)}\\ \vdots&\vdots&\vdots&\cdots&\vdots\\ 1&\zeta^{-(n-1)}&\zeta^{-2(n-1)}&\cdots&\zeta^{-(n-1)(n-1)} \end{array} \right) $$ The FFT is (a pair of) algorithms to compute the DFT above quickly. The flavors are DIT (decimation in time, or Tukey-Cooley) and DIF (decimation in frequency, or Gentleman-Sande). You can think of these as nice factorizations of the above matrices or as recursive implementation of the matrix multiplication.

NTT

The NTT is a DFT over some other base field, say a finite field that has $n$th roots of unity, e.g. $\mathbb{F}_{8380417}$ has $512$th roots of unity (since the multiplicative group of a finite field of order $q$ is cyclic of order $q-1$ and is precisely the solutions to $x^{q-1}-1=0$ and $8380417-1=2^{13}\cdot3\cdot11\cdot31$). [For reference, this is the prime used in Dilithium.]

NTT in Kyber

Most lattice schemes work in $\mathbb{F}[x]/(x^n+1)$ for $n$ a power of 2. [Note that this is using half the roots of unity discussed above $x^{2^{k+1}}-1=(x^{2^k}+1)(x^{2^k}-1)$ and that $x^{2^k}+1$ is irreducible over $\mathbb{Q}$.] This doesn't change things all that much; basically replace $\zeta^i$ above with $\zeta^{2i+1}$.

To allow the use of a smaller modulus in Kyber ($q=3329$), we end up with a field that doesn't have $512$th roots of unity, but does have $256$th roots of unity, so that the chinese remainder theorem looks like $$ \mathbb{F}_{3329}[x]/(x^{256}+1)\cong\prod_{i=0}^{255}\mathbb{F}_{3329}[x]/(x^2-\zeta^{2i+1}), $$ again given by evaluation/interpolation. In other words a polynomial $f$ given by coefficients $f(x)=\sum_{i=0}^{255}f_ix^i$ is transformed into a list $$ \hat{f}=(\hat{f}_{2i}+x\hat{f}_{2i+1})_{i=0}^{127}. $$ To make a long story shorter, doing this evaluation quickly boils down to doing two length 128 FFTs.

One final complication with Kyber is the ordering of the roots of unity/indices in the NTT, namely $$ \hat{f}=(\hat{f}_{2\mathsf{br}(i)}+x\hat{f}_{2\mathsf{br}(i)+1})_{i=0}^{127} $$ where $\mathsf{br}(i)$ is the 7-bit bit reversal of the integer $0\leq i<127$, e.g. $$ 112_{10}=1110000_2\xrightarrow{\mathsf{br}}0000111_2=7_{10}. $$ The reason for this is that DIT/DIF naturally reverse the bit-order of the indices and the implementors left out the extra work to (un)do this (or something about AVX instructions if you read the specs).

Exercises

  • Work out/verify DFT.
  • Show how DFT for $x^n+1$ and $x^{2n}-1$ are related.
  • Work out DIT/DIF recursions.
  • Show how to implement Kyber NTT with two length 128 DFTs.
Mark avatar
ng flag
It's worth mentioning as well that even if you don't have roots of unity of high enough order to get full isomorphisms $\mathbb{F}[x] / (x^{2^k}+1) \cong \prod_{i\in[2^k]} \mathbb{F}$, you can often recurse down most (but not all) of the way, i.e. $\mathbb{F}[x] / (f(x)) \cong \prod_{i\in[n/k]} \mathbb{F}[x] / (g(x))$ where $\deg f = n$, $\deg g = k$. If $k$ is small arithmetic does not get too much slower. This is not directly relevant for Kyber --- I just like mentioning this as it feels like people overfixate on the fully split case, which introduces strong constraints on $q$
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.