Score:0

Crypto

Symmetric encryption algorithm based on multiplication

Tom

9/9/23, 1:39 PM

I've been wondering about this paragraph for some time:

Multiplication is a great mixing function. If you work out what multiplication looks like in terms of ANDs and XORs it becomes apparent how elaborate a 64bit multiply is. The amount of transistors required to implement it in hardware prohibits multiplication from being used in most cryptographic algorithms. But for non-cryptographic PRNGs which only need to run on a general purpose CPU, multiplication is very useful because there is already a hardware implementation.

https://tom-kaitchuck.medium.com/designing-a-new-prng-1c4ffd27124d

Usually in encryption algorithms we use modular addition, rotation, and exclusive OR operations. But is there anything that could stand in the way of using modular multiplication, rotation, and exclusive OR operations?

Modular multuiplication is slower than addition, but it's probably not that much slower, and for sure it is stronger mixing function. Multiplication is in fact a great mixing function, so why it is so rarely used in symmetric cryptography? I think even smarphones can do 64-bit multiplication very quickly and have some hardware implementation for multiplication (but I'm not sure).

Is the slowness of multiplication really such a big problem that multiplication can't find widespread use in fast lightweight encryption algorithms? Probably on IOT devices or RFID chips it can be a problem, but when it comes to computers and smartphones, an encryption algorithm based on multiplication couldn't be a problem, isn't it?

96

0 + 0

symmetric

cpu

phantomcraft

9/9/23, 2:11 PM

Unless the CPU has a integer hardware multiplier it shouldn't be a problem. The problem starts when using in a very small environment like smartcards when the tiny processor only do very basic operations as the ones you cited,

1

Reply

Score:5

Crypto

poncho

9/9/23, 3:44 PM

Is the slowness of multiplication really such a big problem that multiplication can't find widespread use in fast lightweight encryption algorithms? Probably on IOT devices or RFID chips it can be a problem, but when it comes to computers and smartphones, an encryption algorithm based on multiplication couldn't be a problem, isn't it?

Part of the issue appears to be the definition of 'lightweight', and the intended platforms it is targeted. The CPUs on smartphones are actually quite capable; I would not characterize those platforms (or laptop computers) as necessarily 'lightweight'. Lightweight crypto is generally designed with microcontrollers in mind; typically, those microcontrollers don't have built-in 64x64 bit multiplication instructions.

Now, modular multiplication (for modulus a power of 2) can be implemented by a series of shifts and conditional additions; certainly doable, but considerably more expensive than an addition operation.

The other issue would appear to be that modular multiplication isn't as wonderful as you would have hoped. For this discussion, I'll limit my discussion to multiplication modulo a power of 2 (multiple modulo a prime doesn't have these issues; they do have have issues around the range not being a power of 2).

Modular multication does not have any 'right-word' propagation; for example, flipping the high bit of one of the inputs would only affect the high bit of the output; the other bits are unaffected. Of course, modular addition has the same issue; however it's also cheaper.
Modular multiplication does have strong differentials; the strongest is based around the identity $(-x)*y = -(x*y)$ (and the modulus operation does not break this up).

Both of these issues can be designed around in a proper design; however the fact that you have to do so makes it less attractive. In addition, it begs the question: why not use multiplication in $GF(2^k)$ instead? If we're doing a shift/add implementation, a double/xor implementation of Galois multiplication isn't much more expensive, and it avoids the above two issues...

0 + 0

Tom

9/9/23, 4:59 PM

Of course multiplication mod 2^n has big issues with mixing the bits (especially low), that's why we combined it with xor and rotate. But so far as I know addition has even bigger issues of this type. I'm not sure but probably even stronger differentials.

0

Reply

Tom

9/9/23, 5:15 PM

Can multiplication GF(2^k) be similarly fast as multiplication 64-bit numbers mod 2^64? I think it has to be GF(2^64) to get the same amount of bits.

0

Reply

poncho

9/9/23, 5:47 PM

@Tom: obviously, any such answer would be quite hardware-dependent. On large CPUs, I believe that they have 'carryless multiply' operations; the necessary fixup isn't free, so it'll be slower than a straight 64x64 multiply, but it wouldn't be bad. On small CPUs (without a multiply), a constant-time double/conditional xor $GF(2^{64})$ should be close to the analogous shift/conditional add 64x64 multiplication...

0

Reply

Tom

9/11/23, 4:55 AM

I made 32-bit LCG generator and LCG generator in GF(2^32). And multiplication in GF(2^32) perfmorms slightly worse in PractRand. It looks like there are also some problems with low bits. Won't there be some other differentials? I'm not sure. It seems that the multiplication in GF(2) does not mix the bits any better. Or maybe the main advantage is that it's harder to find multiplicative inverse?

0

Reply

poncho

9/11/23, 12:44 PM

@Tom: the advantage of GF multiplication is this: if $a \ne 0$ and $b$ is unknown and uniformly distributed, then $a \times b$ will also be uniformly distributed. This is not true for multiplication modulo a power of 2, if $a = 2$, and $a \times b$ will always be even.

1

Reply

Score:4

Crypto

garfunkel

9/9/23, 3:49 PM

The block cipher IDEA from 1991 used modular multiplication mod $2^{16}+1$ for diffusion (where 0 is mapped to $2^{16}$).

As zero-divisors are not good from a cryptological standpoint, the modulus should be prime, and of the form $2^b+1$ as one prefers to work with bits (and not use the 0), so $b=2, 4, 8, 16$ ($b=1$ would be linear).

If you design a cipher using these modular multiplications, you will run into (at least) two problems:

the cryptographic properties of the modular multiplications are not well understood, making it hard for you to show that your cipher is good
for smaller devices side-channel attacks have to be considered, but it's hard to protect these modular multiplications against those (especially againt DPA; but already timing attacks might be a problem, if multiplication is not constant time)

0 + 0

poncho

9/9/23, 4:17 PM

I would argue that the cryptographical properties of modular multiplication are understood. In addition, as far as DPA, modular multiplication is threshold-friendly, based on the identity $(A+B)*(C+D) = (A*C + B*D) + (A*D + B*C)$ (which extends in the obvious way for 3+-way thresholding)

0

Reply

Tom

9/9/23, 5:07 PM

Side-channel attacks and timing attacks are really issue that has to be considered, I forgot about it. Hovewer, isn't addition similarly vulnerable to such a problems? Or maybe it isn't because we can easily transform it into shift and xors, which are constant time?

0

Reply

poncho

9/9/23, 5:56 PM

@Tom: the main problem with timing and multiply is on lower-end platforms, on those platforms, the multiply instruction might not be constant time (e.g. stop when one of the operands runs out of '1' bits). Now, you could implement a constant-time multiply on those platforms; however a crypto-naïve implementor might not. In contrast, CPUs almost always implement addition in constant time...

1

Reply

Tom

9/9/23, 6:52 PM

One more question. To get analog of 64-bit multiplication (mod 2^64), what GF should be used? I want to multiply 64-bit numbers and get 64-bit results. Am I right I have to use GF(2^64)? There are examples of GF(2^8) everywhere, AES is also operating on GF(2^8), but it works on bytes. If someone would like to replace 64-bit multiplication, he has to use GF(2^64), am I right? Or maybe it could be something different, like in AES GF(2^8), but on bytes? Then rules of such arithmetic are slighty different, than in GF(2^64), if I understand it right.

0

Reply

garfunkel

9/20/23, 12:19 PM

@poncho: The formulas you gave for threshold implementations hide two implementation issues: The addition is mod $2^{16}+1$, so the additive shares don't fit into 16 bits, and you need a modular reduction that is easily implemented, but you have to consider that the un-reduced values are visible to the attacker (via DPA). But the real problems for a DPA-secure implementation will show up, when you mix the modular multiplication with other operations like xor or addition mod $2^{16}$, as you have to switch between different masking operations, which is non-trivial (and time-consuming).

0

Reply

garfunkel

9/20/23, 12:22 PM

@Tom: You know that the multiplication in the field with $2^{64}$ elements is quite different from multiplication mod $2^{64}$?

0

Reply

Tom

9/20/23, 5:16 PM

@garfunkel yes, this is different, I know. I studied the subject and wrote my own multiplication programs and even a GF(2^128) multiplication function based on GCM mode and pclmulqdq instructions.

0

Reply

garfunkel

9/23/23, 1:09 PM

@Tom: (From your writing I thought you know, and asked just to be sure.) Two additional differences between using multiplication mod $2^{16}+1$ (using all field elements but the 0) and multiplication in the field with $2^{16}$ (using all field elements including the 0) is that the latter has always the 0 as fixed point, and - much more important - that the latter is linear over F2 ("xor"), so misses one essential property of cryptographic S-Box. So like in AES you might have to use inversion over the field with $2^{64}$ elements instead.

0

Reply

Score:0

Crypto

b degnan

9/9/23, 6:35 PM

Encryption, of any type, is costly in hardware area and power. Multipliers are indeed large, and deep and thereby slow compared to the base ALU full adder. I'm going to do a bunch of hand waving, but if you grab a CPU architecture book, you'll find the following: The design of the multipliers are always different, but the MUL unit pipeline for a 64-bit result is 4-deep, and for a 128-bit result is 8-deep on most CPUs. This is because you use chained full adders blocks, and the maximum depth in the 4-deep design is 16. The full-adder depth for an XOR, ROL, or ADD is equivalent to a single full-adder. Super-scaler processors hide much of the delays of multipliers; however, if you are really grinding away on a problem, will find an empty pipeline with a lot of delays.

0 + 0

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Symmetric encryption algorithm based on multiplication

TH: อัลกอริทึมการเข้ารหัสแบบสมมาตรขึ้นอยู่กับการคูณ

RO: Algoritm de criptare simetric bazat pe multiplicare

RU: Алгоритм симметричного шифрования на основе умножения

VI: Thuật toán mã hóa đối xứng dựa trên phép nhân

Symmetric encryption algorithm based on multiplication

Post an answer