Score:3

RSA: The algorithm aside, how are we turning a string into an int and vice versa?

cn flag

Let's say that I want to encrypt the file plain.txt. The very first step is actually to turn the contents of that file (lets say it only contains the string "Hello") into an int. I see python codes like this:

from Crypto.Util.number import bytes_to_long
with open('plain.txt', 'rb') as f:
    flag = f.read()

m = bytes_to_long(flag)

However, I don't quite understand what is going on. Furthermore, when the ciphertext has been decoded back into the plaintext but still in number form, I don't see long_to_bytes or anything to convert the number into string. I see

import binascii
binascii.unhexlify('{:x}'.format(m))

Which looks completely different from the other code, but it still works. Can someone explain these processes to me so that I understand the inputs and outputs of an encoding algorithm and not just the algorithm itself.

kelalaka avatar
in flag
Welcome to Cryptography.SE. This is encoding decoding problem and is off-topic here. Besides, you are doing it wrongly. One should decode as the exact opposite of the applied encoding,
fgrieu avatar
ng flag
One common bytes-to-int conversion is [OS2IP in PKCS#1](https://pkcs1.grieu.fr#page=9). Simplifying the description slightly, that considers bytes as digits in base 256 (that is as integers from 0 to 255, much like ordinary digits in base 10 are from 0 to 9), then converts $n$ bytes $X_1,X_2…X_n$ (from first/left to last/right) to integer $\displaystyle\sum_{1≤i≤n} 256^{n-i}X_i$. Notice that seldom is used for _text_ outside of CTF contexts. Instead it's used [hybrid encryption](https://en.wikipedia.org/wiki/Hybrid_cryptosystem).
cn flag
An improvement to the question would be to remove the word RSA from the question. The actual content has nothing to do with it. And in practice, RSA is almost never used directly to encrypt data, but hybrid encryption is used.
Maarten Bodewes avatar
in flag
I've tried to answer how this kind of thing normally works. If you're talking about textbook RSA then generally one or more bytes are turned into an integer between zero and the modulus, and then modular exponentiation is applied. Decoding means splitting the ciphertext into blocks of the same size as the modulus, decrypting and then decoding back. There is too little detail in the question to indicate more than this.
Score:7
in flag

Basically there are two steps to this:

  1. encode the text to bytes - this generally requires a character encoding such as UTF-8 or Latin encoding;
  2. encode the bytes to integer - this is part of the encryption operation in RSA as specified in PKCS#1, and is performed using a function called OS2IP.

In your case the text is obviously already encoded as bytes; files consist of bytes after all, and you are opening the file as a binary file (the b in the rb flag).

OS2IP means octet string to integer primitive. An octet string is nothing more than a byte array. If the bytes are already in the right form then it is just a question of interpreting the bytes as a number, as the computer always handles everything as binary anyway.

In PKCS#1 based RSA OS2IP is not used directly though: first a security relevant padding is applied. This would be either the PKCS#1 v1.5 defined padding or OAEP padding. Adding the padding means a not-insignificant amount of overhead is added before the message is applied; the amount of plaintext is much smaller than the RSA modulus.


This is one reason why files are generally not encrypted using RSA directly. The main other reason is that RSA encryption and especially decryption operations are very inefficient compared to e.g. AES-based encryption. Instead we use a protocol such as PGP which performs hybrid encryption. RSA in a secure mode of operation has a certain overhead and a maximum per operation, so generally a symmetric key is encrypted or derived using RSA instead; this symmetric key is then used to encrypt the data. Symmetric ciphers such as AES directly operate on binary data, so besides handling of the IV and padding, the data can be encrypted directly without conversion.

us flag
Guess, OS2IP assumes length limits for one-by-one conversion of octet-strings, so coming from a byte stream in step 2., OS2IP is preceded by fragmenting to packets, where the packet size depends on key length and padding mode.
Maarten Bodewes avatar
in flag
No, OS2IP is generally not preceded by fragmenting to packets; for RSA you don't use CBC mode or even ECB mode, you just encrypt a message of one size. I've added some details to the answer.
us flag
Understand. That's why certain libraries, e.g. JCE do not even reflect the maximum size you can pass to the encryption method, because you don't need it.
Maarten Bodewes avatar
in flag
Meh, the JCA doesn't return all that much meta information anyway. I'm not sure why that is, it's easy enough to calculate the overhead compared to the modulus, e.g. just 11 bytes (for minimum security) for PKCSv1 1.5 and well, [it differs for OAEP](https://crypto.stackexchange.com/a/42100/1172).
us flag
Oops, see two of my comments censored. Take this as confirmation. Yes, the easiest way to find the maximum payload is to put a modulus-length message and parse the exception raised, correctly according PKCSv1/OS2IP.
cn flag
I would not recommend PKCS#1 v1.5 padding, due to the various Bleichenbacher attacks (the ROBOT attack is the most recent one I am aware of). Version 1.5 was released, and deprecated (by version 2.0) all the same year, in 1998, due to this padding being broken in that same year. Even if it's still in practical use, this padding should usually come with a warning like "never use this, not for encryption, key exchange or signatures".
us flag
@tylo OS2IP does not touch padding. Concerning Bleichenbacher, consider the idea that some bogus padding with constants in the output was set intentionally.
cn flag
@SamGinrich I didn't imply, that it was. The answer just states, that according to PKCS#1 specification, the V1.5 padding is used. And in my opinion, this should always come with a warning like "don't use this". Even if some specific use case, the V1.5 padding would be fine to use - that would have to be proven, requiring in-depth knowledge of the topic and is a potential for human error. The advise with minimal overall risk is to not use / allow that padding at all.
us flag
@tylo Agree with your advice. There is a community concerned with security, where this aspect should be priority 1. Here in Cryptography, I expect a discussion about an incomplete concept of an algorithm and without goodthink. Though, I am confessed with details of certain algorithm variants, I would not expect you to take any advice from me, without delivering a traceable explanation or at least a reference to such.
Score:-2
us flag

EDIT

There are modifications of the RSA algorithm, which are considered suitable for long messages, after analyzing known games and attacks.

How to Strengthen the Security of RSA-OAEP, Boldyreva et al.


Original Answer

Your question addresses the block cipher concept. In the kernel, you have an algorithm, which operates on byte blocks of defined size; according to a defined operation mode, data blocks are preprocessed or post-processed. Easiest mode is the atomic ECB, "Electronic Codebook" mode, the one without ´knowledge´, i.e. you split your input stream 'plain.txt' in packets of defined size, encrypt them sequentially and concatenate the output in an output stream, e.g. 'encrypted.bin'. The transition from text to octets (=bytes) is in your case already anticipated, as you read a binary file. In other operation modes as CBC, CTR input or output from former blocks is an additional input for the block encryption.

These modes cover the structural aspect of fragmentation and memory on base of a byte input stream, which you start with as 'plain.txt'.

Now to the processing of blocks: Usually a padding algorithm is applied to an input block, which adds salt and or mixes input bytes by hash functions, "OAEP" is a common candidate. Text book RSA omits padding.

Next step is transition from the padding output block to the numeric algorithm, the one you "put aside". This is defined PKCS#1 as mentioned, also the mapping back from number space to byte blocks.

Concerning dimensions: You start with a RSA key length and associated modulus, which limits size of equivalent data blocks. Padding usually decreases the maximum size of user data, again. The evaluation of this usable block size is an essential input for the fragmentation.

Finally, decryption needs to know the length of the original stream. This must be encoded, too. E.g. you prefix the input stream with it's length.

Note: It's politically incorrect to apply block cipher modes to RSA, and some people will rather suggest you to limit your text file size. Listen to them, if you think, they solve your problem. :) Anyway, a stream will attract the block cipher concept.

Maarten Bodewes avatar
in flag
You cannot apply block cipher modes other than ECB to RSA *with a secure padding* - at least not without alteration. And it would always lead to overhead otherwise. Obviously it is also much less efficient, especially during decryption. Lack of efficiency and message expansion are objective disadvantages rather than political ones if you ask me.
us flag
I think that most users are here to help people along and I am one of them.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.