Score:1

Are RSA-KEM key-encapsulation ciphertexts indistinguishable from random noise?


First of all, I know that I should not be using RSA in 2023, and that I'm better off with Elligator2 + ECIES for a variety of reasons.

However, I am thinking about whether RSA-KEM can be used for PURB-like constructions with long-lived public keys. For it to be viable, the RSA ciphertexts should not be distinguishable from random noise. I think it can be formulated like this: given two sets $M_0$ and $M_1$, each containing a large number $i$ of bitstrings of some size $s$, where one set contains only RSA-KEM key-encapsulation ciphertexts and the other contains uniformly random strings, the attacker should not be able to guess which is which with probability non-negligibly greater than $0.5$.
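If I understand correctly, the standard way to make this precise is a distinguishing advantage (my paraphrase of the usual notion, not a formulation taken from a specific paper): require that every efficient adversary $\mathcal{A}$ has negligible advantage

$$\mathrm{Adv}(\mathcal{A}) = \left|\Pr[\mathcal{A}(M) = 1 : M \leftarrow \mathsf{RSA\text{-}KEM}] - \Pr[\mathcal{A}(M) = 1 : M \leftarrow \{0,1\}^s]\right|.$$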

I tried looking into the literature for this, and I've seen Wikipedia (https://en.wikipedia.org/wiki/Ciphertext_indistinguishability) call this property "indistinguishability from random noise", but I've not found any papers on how to prove it.

Any pointers to literature and help would be greatly appreciated.

Score:1

The RSA function is a permutation on $\{0,\dots,N-1\}$ mapping $x$ to $c = x^e \bmod N$. Assuming the plaintext integer $x$ is drawn uniformly from $\{0,\dots,N-1\}$, the output distribution is also uniform, precisely because the map is a permutation.
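For intuition, here's a toy check of the permutation property (tiny, insecure parameters chosen purely for illustration):

# N = 3*11 = 33 and e = 3, with gcd(e, lcm(2, 10)) = 1 as RSA requires
N, e = 33, 3
# x -> x**e % N hits every residue exactly once, i.e. it is a bijection
assert sorted(pow(x, e, N) for x in range(N)) == list(range(N))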

For simple encoding schemes like little-endian or big-endian representations of the integer $c$, you don't automatically get indistinguishability from a random byte string unless the modulus is very slightly less than a power of 256, which most moduli are not. An encoded ciphertext integer $c$ is distinguishable from a random byte string of length $i$ because the encoded integers don't span the full space of all $256^i$ byte strings of that length.
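To see the distinguisher concretely (a sketch I'm adding, assuming the naive encoding packs $c$ big-endian into the minimal number of bytes for $N$):

def looks_like_naive_rsa(s, N):
    # every real ciphertext decodes to an integer below N, while a uniform
    # byte string of the same length exceeds N with probability 1 - N/256**len(s)
    return int.from_bytes(s, "big") < N

Counting how often this test fails over many samples separates the two distributions.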

This can be mitigated by sending $c + rN$ for a random $r$ with $0 \le r < a$, where $aN$ is very close to a power of 256 (a whole number of bytes). At the other side, just reduce modulo $N$ to recover $c$. There's still a little bias, but not enough to be statistically detectable if $a$ is close to or above $2^{64}$.

It's also possible to do rejection sampling, where we generate random values for $x$ until the resulting $c$ fits into a smaller byte string than would be needed to hold the full range of possible $c$ outputs (i.e., generate from a uniform distribution whose range exceeds our output range and rejection-sample until we get a working value). The resulting output is also obviously uniformly distributed.
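A minimal sketch of that rejection-sampling variant (my illustration; `encap_by_rejection` is a hypothetical helper name):

import random

def encap_by_rejection(N, e, k):
    # k must satisfy 256**k <= N, e.g. k = (N.bit_length() - 1)//8,
    # so that every k-byte string corresponds to a valid ciphertext value
    bound = 256**k
    while True:
        x = random.randrange(N)      # candidate KEM secret
        c = pow(x, e, N)
        if c < bound:                # accept only ciphertexts that fit
            return x, c.to_bytes(k, "little")

Each iteration accepts with probability 256**k / N, which is above 1/256 for the largest valid k, so the loop terminates quickly on average.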

Here's some Python 3 code implementing the first approach, which I think is the better one.

import random

# two example 1024-bit primes giving a 2048-bit modulus N
p=174113967842783917835204255301763415428591748591953438692854480763643474182391763686902150854922956351250601961020106231291708310649875297462624643766368088850973264739552404123663483650690489277416561680223694714848159941231207341867561628622020661266316862572154326276326302526944588093717260012371330585681
q=158519437755145821754759779413548208438914193123396319777833933794657109899244315090585980006963132975609760015381737207338401607701588311663394760228367283720829992393451422145333636162426477538687144939925969774712334290868315085048710186340806489746538886556367531706127159071881402542824916421433947174003
N=p*q

byte_len=(N.bit_length()+7)//8 #round up to required size
byte_len+=8 #add 8 more bytes to give <(2**-64) bias
a=(256**byte_len)//N #a*N ~= 256**byte_len

x=random.randrange(N) #uniform plaintext (the KEM secret)
c=pow(x,65537,N) #RSA encryption with e=65537
r=random.randrange(a) #random multiplier for N
encoded=(c+N*r).to_bytes(byte_len,"little")

decoded=int.from_bytes(encoded,"little")
assert c == decoded%N #this is how you recover `c`

Notice that if you set r=a-1, the resulting encoded string has FF...FF as its most significant bytes, indicating that r correctly spreads the input c across the entire output range.
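As a quick check of this claim (reusing N, a, and byte_len from the code above):

worst = (N - 1 + N*(a - 1)).to_bytes(byte_len, "little")
# little-endian, so the most significant bytes are at the end
print(worst[-8:].hex())  # prints 'ffffffffffffffff'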

This approach is essentially the reverse of the one used to generate random integers in the range {0...N-1}, where we generate a large byte string with ~64 more bits than needed and take the remainder mod N. Here we already have the remainder, so we generate a random quotient in the range {0...a-1} and reverse the divmod calculation to get a random byte string.
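For comparison, that usual direction looks like this (a sketch reusing `byte_len` and `N` from above):

u = random.randrange(256**byte_len)  # uniform, ~64 bits more than needed
y = u % N                            # statistically close to uniform on {0,...,N-1}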

Gregory Khvatsky:
Thank you! This makes a lot of sense. In your encoding scheme, are $a$ and $N$ just some random primes generated for this protocol so that $aN$ is close to $256^x$? How do you quantify "close" here?
poncho:
The obvious way to get around the 'unless the modulus is very slightly less than a power of 256' is to select a modulus that is very slightly less than a power of 256. For example, you can select $p$ as normal, and then select a $q$ from the range $[(256^n - 256^{n-32})/p, 256^n/p]$. It is easy to see (assuming you pick a $p$ circa $16^n$) that this yields an RSA modulus that is just as strong as any other modulus of about the same size. The practical disadvantage is that (for example) you may get a $p$ of 2048 bits and a $q$ of 2049 bits, and your RSA implementation might not like that.
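A rough sketch of poncho's suggestion (my illustration, assuming sympy's randprime is available; real key generation needs proper parameter and primality checks):

from sympy import randprime

n = 256                                    # target modulus length in bytes
p = randprime(2**1023, 2**1024)            # ~1024-bit p, circa 16**n
lo = (256**n - 256**(n - 32) + p - 1)//p   # ceiling division
hi = 256**n//p
q = randprime(lo, hi)                      # q may end up 1025 bits, as noted
N = p*q
assert 256**n - 256**(n - 32) <= N < 256**n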
Richard Thiessen:
Yup, FIPS-compliant libraries and hardware are only required to handle $p$ and $q$ of at most $n/2$ bits each. Most software doesn't care, but good luck if you have a smartcard.
Richard Thiessen:
`a` is not necessarily a prime; it's just the maximum number of multiples of `N` we can fit in the byte string. See the included code for more details.
Gregory Khvatsky avatar
ng flag
Thank you, I understand now!