Score:3

No Final subtraction in Word-level Montgomery Multiplication

tr flag

I am trying to make an RSA module in VHDL, which in turn will be deployed to an FPGA. I am trying to implement a full Montgomery algorithm which means that I am working with the Montgomery Exponetiation algorithm, and the Montgomery Multiplication algorithm. Mostly my tests consists of generating random numbers (keys, modulus, r, messages) that I use to perform encryption/decryption. If the original message equals the decrypted message, the test has passed.

First I made some high level code of the bit-level version of Montgomery Multiplication in Python. While trying to get rid of the final subtraction I found this post by Brett on Stack Overflow. He also made a post here about the same thing. This helped me get the bit-level algorithm to work in Python with no final subtraction. Further it seemed to work in VHDL when performing simulations.

Next was to get the word-level algorithm to work. Reading the document c03gfp by Çetin Kaya Koç found here, I managed to get the high level Python code of the word-level algorithm to work, but not as I hoped as I had to keep the final subtraction. Using the same R that was used in the bit-level code did not work for the word-level code. Honestly I am not sure if Brett is asking about the same thing here, but he might be.

I have tried to find documents describing how this should be done mainly by googling, but I have had no luck in finding any. If I did find any that actually could help me get rid of the final subtraction in the word-level algorithm, I did not understand the content of the documents in that case.

I got an idea that I could try to increase the R more than what was done when using the Walter optimization for the bit-level algorithm. Considering that I am working with the word-level algorithm, why not have R be one word wider than the modulus?

For the bit-level algorithm I set R = 2^(2 * NUM_BITS) mod n as suggested by Renaud Pacalet, where NUM_BITS = 256 + 2 in my case as I was working with 256-bit numbers.

For the word-level algorithm on the other hand I tried setting R = 2^(2 * (NUM_BITS + WORD_WIDTH)) mod n. In my case NUM_BITS = 256 and WORD_WIDTH = 32, but WORD_WIDTH could also be 8, 16, 64, or 128 for instance (note that WORD_WIDTH $\leq$ NUM_BITS). This did work, and has worked for all the tests that I have performed in Python with random numbers. I have yet to implement this in VHDL as I would like to verify if this is correct before doing so, or at least have some more basis for doing this.

So my question is if it is correct to set R = 2^(2 * (NUM_BITS + WORD_WIDTH)) mod n for the word-level Montgomery Multiplication as I have done now. Also, is there any documents that describes this?

EDIT 1

It seems that after reformulating my question as I was googling around, I stumbled upon this post by Saf. The answer provided by Thomas Pornin seems to almost answer my question, as my question fundamentally was the same question as the one Saf asked.

Thomas Pornin wrote that if the modulus $n$ has a size of NUM_BITS, then I should have looked for the next multiple of WORD_WIDTH.

Repeating Thomas Pornins example where WORD_WIDTH = w = 32: For a 1024-bit modulus, which is already a multiple of 32, you use $R=2^{1024}$. For a 1025-bit modulus, you would use $R=2^{1056}$.

This does not seem to be the case for me unless I have misunderstood something. Through my testing with NUM_BITS = 256 and WORD_WIDTH = 32 I have found that setting $R=2^{256}$ might fail even though $n$ is a 256 bit number, and even if $n$ is a 255-bit number. On the other hand, the tests are always successful when $R=2^{256 + 32\cdot c}$, where $c$ is a positive integer. I have only tested with $c = [1,2,3]$, so I can not guarantee that all positive numbers will work. One should also consider the quality of my tests, and how many I have performed ($+1000000$).

The follwoing is also from Thomas Pornins answer to Safs question: The case is that Montgomery multiplication makes sense only with word sizes. With $w$ as the word size, then $R=2^{kw}$ for some integer k, which should be choosen as small as possible given that $R\ge N$.

To me it seems like $R > N$ should be the case. Any comments on this?

pe flag
Section 2.4 of [Montgomery Arithmetic from a Software Perspective](https://eprint.iacr.org/2017/1057) has a pretty clear summary of this issue. Basically the requirement is $4\cdot N < R$.
TheJonaMr avatar
tr flag
Thank you, I have read that section and I found that having an R that is one word wider than N is correct. Quote from the section: The condition $4N < R$ results in a Montgomery-radix R which is represented using one additional computer word (compared to the number of words needed to represent the modulus N).
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.