Score:1

# Are RSA-KEM key exchange ciphertexts indistinguishable from random noise?

First of all, I know that I should not be using RSA in 2023, and that I'm better off with Elligator2 + ECIES for a variety of reasons.

However, I am thinking about whether RSA-KEM can be used for PURB-like constructions with long-lived public keys. For this to be viable, the RSA ciphertexts must not be distinguishable from random noise. I think the requirement can be formulated like this: given two sets $$M_0$$ and $$M_1$$, each containing a large number $$i$$ of bitstrings of some size $$s$$, where one set contains only RSA-KEM key material and the other contains uniformly random strings, the attacker should not be able to guess which is which with probability non-negligibly better than $$0.5$$.
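If I understand the standard notation correctly, this can be written as an advantage bound: letting $$b$$ be the secret bit indicating which set holds the RSA-KEM outputs, every efficient distinguisher $$\mathcal{A}$$ should satisfy

$$\mathrm{Adv}(\mathcal{A}) = \left|\Pr[\mathcal{A}(M_0, M_1) = b] - \tfrac{1}{2}\right| \leq \varepsilon$$

for some negligible $$\varepsilon$$.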

I tried looking into the literature for this. I've seen Wikipedia (https://en.wikipedia.org/wiki/Ciphertext_indistinguishability) call this property "indistinguishable from random noise", but I've not seen any papers on how to prove it.

Any pointers to literature and help would be greatly appreciated.

Score:1

The RSA function is a permutation on `{0...N-1}`, mapping `x` to `c=(x^e)%N`. Assuming the plaintext integer `x` is drawn uniformly from that domain, the output distribution is also uniform, simply because a permutation of a uniform distribution is uniform.
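As a quick sanity check, here's a toy illustration (textbook-sized parameters, not a real key) that the RSA map really is a permutation:

``````
# toy check that x -> x**e % N permutes {0...N-1}, and therefore maps the
# uniform distribution on {0...N-1} to itself
p, q, e = 61, 53, 17   # textbook toy parameters, N = 3233
N = p * q
assert {pow(x, e, N) for x in range(N)} == set(range(N))
``````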

For simple encoding schemes, like little-endian or big-endian representations of the integer `c`, you don't automatically get indistinguishability from a random byte string unless the modulus is very slightly less than a power of `256`, which most moduli are not. An encoded ciphertext integer `c` is distinguishable from a random byte string because the encoded integers don't span the full space of all `256^i` byte strings of length `i`.
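Here's a toy demonstration of how easy the distinguishing attack is for a naive encoding (same textbook parameters as above, redefined so the snippet stands alone):

``````
import secrets

# with a fixed-length big-endian encoding, the most significant byte of an
# encoded ciphertext can never exceed the top byte of N
p, q, e = 61, 53, 17                  # toy parameters, N = 3233
N = p * q
byte_len = (N.bit_length() + 7) // 8  # 2 bytes for this N
cap = N >> (8 * (byte_len - 1))       # top byte of N, here 12
for _ in range(1000):
    c = pow(secrets.randbelow(N), e, N)
    assert c.to_bytes(byte_len, "big")[0] <= cap
``````

A uniformly random 2-byte string starts with a byte above 12 roughly 95% of the time, so a single sample usually gives the game away.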

This can be mitigated by sending `c+r*N` for a random `r` with `0≤r<a`, where `a*N` is very close to a power of 256 (i.e. a whole number of bytes). On the receiving side, just reduce modulo `N` to recover `c`. There's still a little bias, but not enough to be statistically detectable if `a` is close to or above `2^64`.

It's also possible to do rejection sampling, where we generate random values for `x` until the resulting `c` fits into a byte string shorter than the one needed to hold the full range of possible `c` outputs (i.e. generate a uniform distribution whose range exceeds our output range and rejection-sample until we get a working value). The resulting output is also obviously uniformly distributed.
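A minimal sketch of that rejection-sampling variant (`encapsulate_rejection` is a hypothetical helper name; pass in the same `N` as below):

``````
import secrets

def encapsulate_rejection(N, e=65537):
    byte_len = (N.bit_length() + 7) // 8  # bytes needed for the full range of c
    bound = 256 ** (byte_len - 1)         # target: one byte shorter
    while True:
        x = secrets.randbelow(N)          # uniform plaintext in {0...N-1}
        c = pow(x, e, N)
        if c < bound:                     # rejection-sample on the output
            return x, c.to_bytes(byte_len - 1, "little")
``````

The acceptance probability is `bound/N`, which for a 2048-bit modulus is somewhere between about `1/64` and `1/256`, so this costs many exponentiations per ciphertext; that's one reason I think the first approach is better.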

Here's some Python 3 code implementing the first approach, which I think is the better one.

``````
import random  # demo only: use the `secrets` module for real key material

p=174113967842783917835204255301763415428591748591953438692854480763643474182391763686902150854922956351250601961020106231291708310649875297462624643766368088850973264739552404123663483650690489277416561680223694714848159941231207341867561628622020661266316862572154326276326302526944588093717260012371330585681
q=158519437755145821754759779413548208438914193123396319777833933794657109899244315090585980006963132975609760015381737207338401607701588311663394760228367283720829992393451422145333636162426477538687144939925969774712334290868315085048710186340806489746538886556367531706127159071881402542824916421433947174003
N=p*q

byte_len=(N.bit_length()+7)//8 #round up to required size
#add 8 more bytes to give <(2**-64) bias
byte_len+=8
a=(256**byte_len)//N #a*N ~= 256**byte_len

x=random.randrange(N)   #uniform plaintext in {0...N-1}
c=pow(x,65537,N)        #RSA encryption with e=65537
r=random.randrange(a)   #random multiple of N to add
encoded=(c+N*r).to_bytes(byte_len,"little")

decoded=int.from_bytes(encoded,"little")
assert c == decoded%N #this is how you recover `c`
``````

Notice that if you set `r=a-1`, the resulting encoded string has `FFF...FFF` as its most significant bytes, showing that `r` correctly spreads the input `c` across the entire output range.

This approach is essentially the opposite of the one used to generate random integers in the range `{0...N-1}`, where we generate a large byte string with ~64 more bits than needed and take the remainder mod `N`. Here we already have the remainder, generate a random quotient `r` in the range `{0...a-1}`, and reverse the divmod calculation to get a random byte string.
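As a quick illustration of that symmetry (this reuses `N`, `byte_len`, `c`, `r` and `encoded` from the code block above, so run it as a continuation of that script):

``````
import secrets

# forward direction: oversized random bytes -> almost-uniform integer mod N
big = int.from_bytes(secrets.token_bytes(byte_len), "little")
sample = big % N  # the usual way to sample {0...N-1} with ~2^-64 bias

# reverse direction: divmod recovers exactly the quotient r and remainder c
# that were combined into the encoded byte string
assert divmod(int.from_bytes(encoded, "little"), N) == (r, c)
``````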

Thank you! This makes a lot of sense. In your encoding scheme, are \$a\$ and \$N\$ just some random primes generated for this protocol so that \$aN\$ is close to \$256^x\$? How do you quantify "close" here?
The obvious way to get around the 'unless the modulus is very slightly less than a power of 256' caveat is to select a modulus that is very slightly less than a power of 256. For example, you can select \$p\$ as normal, and then select a \$q\$ from the range \$[(256^n - 256^{n-32})/p, 256^n/p]\$ (see the sketch after these comments). It is easy to see (assuming you pick a \$p\$ circa \$16^n\$) that this yields an RSA modulus that is just as strong as any other modulus of about the same size. The practical disadvantage is that (for example) you may get a \$p\$ of 2048 bits and a \$q\$ of 2049 bits, and your RSA implementation might not like that.
Yup, FIPS-compliant libraries and hardware are only required to handle \$p\$ and \$q\$ each less than n/2 bits. Most software doesn't care, but good luck if you have a smartcard.
`a` is not necessarily a prime; it is just the maximum number of multiples of `N` we can fit in the byte string. See the included code for more details.
Thank you, I understand now!
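For illustration, here is a hedged sketch of the modulus-selection idea from the comment above, assuming `sympy` for primality testing (`isprime`/`nextprime`); the helper name and the bare `nextprime` prime generation are simplifications, not production RSA key generation:

``````
import secrets
from sympy import isprime, nextprime  # assumed dependency for primality tests

def near_power_of_256_modulus(bits=2048):
    n = bits // 8                          # modulus length in bytes
    target = 256 ** n
    # pick p as a normal random prime of about half the bits
    p = nextprime(secrets.randbits(bits // 2) | (1 << (bits // 2 - 1)))
    lo = (target - 256 ** (n - 32)) // p   # q range from the comment above
    hi = target // p
    while True:                            # search that range for a prime q
        q = lo + secrets.randbelow(hi - lo)
        if isprime(q):
            return p, q, p * q             # N lands just below 256^n
``````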