Score:2

Multiplication implemented in C++ in constant time

tf flag
Tom

I'm considering some non-cryptographic PRNG which uses multiplication of two 64-bit or 128-bit random numbers at some point.

```cpp
__uint128_t a;
__uint128_t b;

__uint128_t result;

result = a * b;
```

Is this constant time? I don't think so, especially since it takes less time to multiply two small numbers than two large ones. Is there any way to implement this in constant time?

Here someone wrote that multiplication itself is constant-time on most common architectures:

https://stackoverflow.com/questions/17909343/is-multiplication-of-two-numbers-a-constant-time-algorithm

This is not in line with my experiments, in which multiplying two random 128-bit numbers is faster when one of them is smaller than 2^64 than when both are close to 2^128.

Peter Cordes avatar
us flag
If the compiler knows that one or both of the inputs was zero-extended from 64-bit or less, then yes it can make faster asm. But it's not going to branch to check that at run-time, so the speed for all inputs is set in stone at compile time, for any given instance of this code being inlined somewhere. (Assuming constant-time hardware multipliers, which has been the case for mainstream desktop CPUs for a long time.)
kelalaka avatar
in flag
This is a good answer for you to start with [How can I understand whether my C implementation is constant-time or not (i.e. resistant to timing attacks)](https://crypto.stackexchange.com/a/96634/18298)
Score:9
my flag

Is this constant time?

The answer to that is quite compiler and CPU dependent.

Is there any way to implement this in constant time?

Given a reasonable set of constant time operations (such as additions, logical operations, shifts by constant amounts), yes, however such a construction would likely be much more expensive than what the compiler would give you.

On the other hand, why do you care? You explicitly said that these RNGs were not cryptographically secure; that is, we don't care whether an intelligent adversary monitoring the output could predict future outputs (and presumably that would include adversaries that could count cycles).

This is not in line with my experiments

As I said, that's compiler and CPU dependent.

Tom avatar
tf flag
Tom
I care, because I'm wondering whether this scheme can be used to create a CSPRNG.
Alnitak avatar
fr flag
@Tom that contradicts what you said in the first line of your question.
poncho avatar
my flag
@Tom: well, I'm pretty sure it could be part of a PRNG; however there are things to be careful about with that operation. Multiplication modulo a power of 2 doesn't have any right-ward propagation; that is, the higher order bits of the inputs never affect the lower bits of the result. That's something that would need to be worked around in any such PRNG.
Tom avatar
tf flag
Tom
@poncho yes, I'm aware of those problems; multiplication also doesn't give results of equal probability - I know all of that. I'm really considering some backtracking-resistance properties of the non-invertible pseudorandom mappings in that PRNG. Some really specific things - in general, multiplication is not the best choice for a CSPRNG.
Score:3
in flag

A previous answer correctly says it's compiler and CPU dependent. But in reality this is not an operation I would worry about. On a modern computer with a modern compiler, I would be shocked to see a 128 bit multiplication be anything except constant time.

Modern CPUs will implement at least 64-bit multiplication in constant time. Extending to 128 bits is most efficient without a loop. Even when implemented with 32-bit operations, I would expect the best code not to loop and not to terminate early for smaller numbers.

Also, note your code multiplies two 128-bit numbers but puts the result in a 128-bit destination, so the upper half of the full 256-bit product is discarded (the result wraps modulo 2^128), which can be an issue.

A more common approach would be multiplying 64 bit integers and putting results in 128 bit destination.

Here is an example and explanation of what it might look like in assembly: https://stackoverflow.com/questions/33789230/how-does-this-128-bit-integer-multiplication-work-in-assembly-x86-64

You can see it will be constant time, and you can estimate the time by looking at: https://www.agner.org/optimize/instruction_tables.pdf (but due to out-of-order execution, actual runtime is not a simple sum of instruction latencies).

fgrieu avatar
ng flag
See [this](https://bearssl.org/ctmul.html) for how there are problems for at least some modern _embedded_ CPUs. An example is the ARM Cortex M3, whose cycle time for 32x32->64 multiplication depends on how many of the most significant bytes of one of the operands are all-zero. Quoting the [manual](https://developer.arm.com/documentation/ddi0337/h/programmers-model/instruction-set-summary/cortex-m3-instructions?lang=en): "UMULL, SMULL, UMLAL, and SMLAL instructions use early termination depending on the size of the source values". It's plausible code for 128x128 has more such optimizations.
Peter Cordes avatar
us flag
@fgrieu-onstrike: GCC only supports `__uint128_t` on 64-bit CPUs, but yes, in future with C23 `_BitInt(128) a,b` support we could have compilers making code for 128x128 => 128-bit on 32-bit CPUs like M3. Or on low-power RISC-V RV64 CPUs with similar low-performance small-area multipliers.
Score:1
kz flag

There's a way to find out: Measure it. Take x = largest 64 bit number, y = largest 128 bit number, and multiply 0, 1, x and y by 0, 1, x and y one billion times each and measure the time. Obviously making sure that the compiler cannot figure out ahead what the numbers are.

If your processor has a 64 x 64 bit product instruction with 128 bit result (quite typical nowadays), chances are that the compiler's code produces three 64 x 64 multiplications, without checking whether any operand fits into 64 bits. And then it depends on the hardware whether multiplying 0 * x or x * 0 is faster than x * x; very often it is not on a modern processor. That would include 64 bit x86 and ARM processors.

Note: 128 / 128 bit division will be different, because it would be very hard to implement efficiently with one fixed sequence of operations.

poncho avatar
my flag
This would be a relevant test if the OP were worried only about his machine; if he is constructing a general PRNG which may run on a variety of machines, testing only one may have limited value.
prosfilaes avatar
in flag
@poncho It seems impossible to check every case; if the code was running under an emulator with a JIT compiler, no matter what you do, it could compile different cases to different code. Same thing could happen with a sufficiently smart compiler, or in some cases a sufficiently smart (or, from the discussion above, dumb) CPU.