Score:2 Crypto

# Carry-less multiplication vs. multiplication in \$GF(2^k)\$ I implemented carry-less multiplciation using the CLMUL instruction set. This is similarly fast to simple modulo multiplication. But computating the result mod some polynomial is still very slow. I do it this way:

``````for (unsigned int i = 32; i-- > 0; )
{
if (c & (1L << (i + 32)))
{
c ^= 1L << (i + 32);
c ^= (uint64_t)p << i;
}
}
``````

Where c is 64-bit carry-less product and p is some irreducible polynomial 32-bit. I'm not sure is that code correct. Can we do this faster?

If I'm right that we still have to do this rather expensive procedure after calculating the product, then I began to wonder if it would be reasonable to do carry-less multiplication alone just mod $$2^k$$. Does carry-less multiplication mod $$2^k$$ itself also have similar advantages as carry-less multiplication mod polynomial?

It could be constant time, but is it uniformly distributed? In general is carry-less multiplication reasonable alternative to multiplication in $$GF(2^k)$$?  Do you need side-channel security?  @kelalaka I'm not sure. I'm working on some PRNG based on multiplication. As an alternative for multiplication mod 2^n carry-less product seems to be good enough, especially that I'm truncating or mixing low bits. But this PRNG has cryptography potencial, since it works as non-invertible random mapping, can be parametrized by huge space of keys for independent streams. In this case it is better if it is side side-channel resistant.  Your aim is not totally clear. There are lots of methods to achieve. The first trick is select [low weight irreducible polynomial (\$p\$)](https://www.hpl.hp.com/techreports/98/HPL-98-135.pdf) as discussed [here](https://crypto.stackexchange.com/a/77958/18298)
Score:2 Crypto Firstly, your code is almost correct for multiplication in $$GF(2^{32})$$ provided that $$p$$ represents the bit coefficients for the monomials $$x^{31},x^{30},\ldots,x^2,x,1$$ in a degree 32 irreducible polynomial. There is a fencepost issue that $$i$$ should run from 31 down to 0.

Now, carryless multiplication mod $$2^k$$ does not correspond to multiplication in a field but instead the ring $$\mathbb Z[x]/x^k\mathbb Z[x]$$. This is not good mathematical object to do cryptography with. For example, the low bit of the output is only a function of the low bits of the inputs. By comparison multiplication in $$GF(2^k)$$ all of the output bits are a function of all of the input bits. Another property of field multiplication is that it is invertible for non-zero inputs, but in our ring the function is not invertible for inputs without the low bit set.

If we consider all pairs of inputs, then outputs are not uniformly distributed. For example only 25% of inputs will produce an output with the low bit set. If we fix one of the inputs and ensure that its low bit is set then outputs are uniformly distributed, but low bits of output will only depend on low bits of input. In short it is not a good alternative.

In terms of your earlier question, there are some possible speed ups. If you precompute the code for $$c$$ taking values 1<<32, 1<<33,..., 1<<63 and store these values as $$x,\ldots,x$$ then the code can be replaced with

``````for (int i = 31; i-- >= 0; )
{
if (c & (1L << (i + 32)))
c ^= x[i];
}
c %= 1<<32;
``````

If you want something faster still, you might want to look at alternative ways of representing fields such as optimal normal bases or Zech logarithms  Yes, p represents the bit coefficients for the monomials, like you wrote. So in that case it is correct, right?  By the way I read about Gueron and Kounavis algorithm, which introduced efficient way using pclmulqdq, for 128-bit mod reduction in GCM: https://www.sciencedirect.com/science/article/abs/pii/S002001901000092X. And now it is more clear to me - GF multiplication, especially without pclmulqdq is expensive compared to normal multiplication. Even with pclmulqdq it is expensive, it still requires for example 32 iterations for just mod polynomial in case of 32-bit numbers. That's why we are using some tricks to make it faster, like in Gueron and Kounavis algorithm, with special polynomial GF(2^128).  I sit in a Tesla and translated this thread with Ai: