The RSA function is a permutation from x to c=(x^e)%N. Assuming the plaintext integer x is drawn uniformly from the domain {0...N-1}, the output distribution is also uniform ... because it's a permutation.
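As a quick sanity check, the permutation property can be verified exhaustively on a toy modulus (made-up tiny parameters for illustration only, obviously not secure):

```python
# Toy parameters; a real modulus is ~2048 bits.
p, q = 11, 13
N = p * q                        # 143
e = 7                            # valid since gcd(e, lcm(p-1, q-1)) == 1
outputs = {pow(x, e, N) for x in range(N)}
assert outputs == set(range(N))  # every residue hit exactly once: a permutation
```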
For simple encoding schemes like little-endian or big-endian representations of the integer c, you don't automatically get indistinguishability from a random byte string unless the modulus is very slightly less than a power of 256, which most are not. An encoded ciphertext integer c is distinguishable from a random byte string because the encoded integers won't span the full space of all 256^i byte strings of length i.
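To make the distinguisher concrete, here's a sketch (with a made-up stand-in modulus, since only the byte arithmetic matters): an encoded ciphertext, read back as an integer, is always below N, while a uniform byte string of the same length frequently isn't.

```python
import os

# Hypothetical stand-in modulus: top byte 0x80, nowhere near 0xFF.
N = (1 << 2047) + 12345
byte_len = (N.bit_length() + 7) // 8

def exceeds_modulus(blob):
    # True ciphertext encodings never satisfy this; random strings often do.
    return int.from_bytes(blob, "big") >= N

hits = sum(exceeds_modulus(os.urandom(byte_len)) for _ in range(1000))
# About half of random 256-byte strings decode to an integer >= N here,
# so "always < N" is a strong statistical distinguisher.
```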
This can be mitigated by sending c+r*N for a random r, 0≤r<a, where a*N is very close to a power of 256 (a whole number of bytes). On the other side, just reduce modulo N to get c back. There's still a little bias, but not enough to be statistically detectable if a is close to or above 2^64.
It's also possible to do rejection sampling, where we generate random values for x until the resulting c fits into a smaller byte string than needed to hold the full range of possible c outputs. (I.e., generate a uniform distribution whose range exceeds our output range and rejection-sample until we get a working value.) The resulting output is also obviously uniformly distributed.
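A sketch of that rejection-sampling approach with toy parameters (values made up for illustration; acceptance probability here is 256/3233 ≈ 8%, and is at least 1/256 for any modulus, so the loop terminates quickly at the cost of repeated exponentiations):

```python
import random

p, q = 61, 53                    # toy primes, NOT secure
N, e = p * q, 17                 # N = 3233 needs 2 bytes
short_len = (N.bit_length() + 7) // 8 - 1  # 1 byte: values 0..255

while True:
    x = random.randrange(N)      # retry the plaintext...
    c = pow(x, e, N)
    if c < 256 ** short_len:     # ...until c fits the shorter encoding
        break
encoded = c.to_bytes(short_len, "little")
assert int.from_bytes(encoded, "little") == c  # no mod-N step needed here
```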
Here's some Python 3 code implementing the first approach, which I think is the better one.
import random
p=174113967842783917835204255301763415428591748591953438692854480763643474182391763686902150854922956351250601961020106231291708310649875297462624643766368088850973264739552404123663483650690489277416561680223694714848159941231207341867561628622020661266316862572154326276326302526944588093717260012371330585681
q=158519437755145821754759779413548208438914193123396319777833933794657109899244315090585980006963132975609760015381737207338401607701588311663394760228367283720829992393451422145333636162426477538687144939925969774712334290868315085048710186340806489746538886556367531706127159071881402542824916421433947174003
N=p*q
byte_len=(N.bit_length()+7)//8 #round up to required size
byte_len+=8 #add 8 more bytes to give <(2**-64) bias
a=(256**byte_len)//N #a*N ~= 256**byte_len
x=random.randrange(N) #plaintext
c=pow(x,65537,N) #ciphertext
r=random.randrange(a)
encoded=(c+N*r).to_bytes(byte_len,"little")
decoded=int.from_bytes(encoded,"little")
assert c == decoded%N #this is how you recover `c`
Notice that if you set r=a-1, the resulting encoded string has FFF...FFF as its most significant bytes, indicating that r is correctly spreading the input c across the entire output range.
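You can check that claim directly. A sketch with a stand-in modulus (any odd N of any size shows the same effect; only the byte arithmetic matters):

```python
# Stand-in 255-bit modulus for illustration.
N = (1 << 255) - 19
byte_len = (N.bit_length() + 7) // 8 + 8  # same +8-byte padding as the text
a = 256 ** byte_len // N
c = N - 1                                 # any residue 0 <= c < N works
encoded = (c + N * (a - 1)).to_bytes(byte_len, "little")
assert encoded[-7:] == b"\xff" * 7        # little-endian: top bytes last
```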
This approach is essentially the opposite of the one used to generate random integers in the range {0...N-1}, where we generate a large byte string with ~64 more bits than needed and take the remainder mod N. Here we have the remainder; we generate a random quotient in the range {0...a-1} and reverse the divmod calculation to get a random byte string.
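In divmod terms (toy numbers for illustration), r is the quotient and c the remainder when the encoded integer is divided by N:

```python
N, a = 3233, 20                  # toy modulus and multiplier bound
c, r = 2790, 7                   # some ciphertext and a random 0 <= r < a
encoded_int = c + r * N          # the integer the byte string represents
assert divmod(encoded_int, N) == (r, c)  # decoding reverses the divmod
```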