Score:2

Long Random Key and XOR - How Secure?

pe flag

I have an application that encrypts files in the following manner (I think I can hear sighs already but bear with me):

  • Start with two byte arrays generated from random strings of lengths l1 and l2 (fwiw, l1 and l2 are primes)
  • Loop through both arrays (nested loop) and generate a third array of length l1 * l2 where each byte is the result of XORing the indexed bytes of the other arrays
  • Accept a password from the user and then successively XOR every byte in the third array, looping through the bytes of the password

The aim by this point is to have a key which is based on the password and longer than the data being encrypted which can then be used to quickly encrypt and decrypt the files.
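
For concreteness, the steps above can be sketched roughly as follows. This is an illustration in Python, not the application's actual code: the function names `build_key` and `xor_file`, the toy lengths 7 and 11, and the sample password are all assumptions.

```python
import secrets

def build_key(a: bytes, b: bytes, password: bytes) -> bytes:
    # Nested loop: one key byte per (i, j) pair, so len(key) == len(a) * len(b).
    third = bytes(x ^ y for x in a for y in b)
    # Fold the password in, cycling through its bytes.
    return bytes(k ^ password[n % len(password)] for n, k in enumerate(third))

def xor_file(data: bytes, key: bytes) -> bytes:
    # The same call encrypts and decrypts, since XOR is its own inverse.
    if len(data) > len(key):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

a = secrets.token_bytes(7)    # l1 = 7 (toy value)
b = secrets.token_bytes(11)   # l2 = 11 (toy value)
key = build_key(a, b, b"hunter2")
ciphertext = xor_file(b"example plaintext", key)
assert xor_file(ciphertext, key) == b"example plaintext"
```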

I could have just used the password itself to XOR the file but I wanted to avoid the appearance of repeating sequences that might occur due to a password that is shorter than the data.

Now I'm not a mathematician or a cryptographer and I know that inventing my own security like this is a fool's game - but it's just for a pet project and is not being used for anything critical.

So the question is, how secure does this all sound? I suppose it'd be easy to get the generated key if you passed a large file of zeros and were able to examine the output, but let's assume a hypothetical attacker only had access to a set of encrypted files (let's say JPEGs).

Apologies if this sort of thing has been asked before (I did try looking) and if I sound a bit naive!

Thanks

Morrolan avatar
ng flag
"lets assume a hypothetical attacker only had access to a set of encrypted files": Note that [ciphertext-only attacks](https://en.wikipedia.org/wiki/Ciphertext-only_attack) are one of the weakest forms of attacks. Typically when judging encryption systems we assume the attacker to be able to get [arbitrary encryptions](https://en.wikipedia.org/wiki/Chosen-plaintext_attack) if not even [arbitrary decryptions](https://en.wikipedia.org/wiki/Chosen-ciphertext_attack). Another thing you might want to clarify: How do you store $l_1$ and $l_2$? Or are they fixed parameters of your system?
mrrrk avatar
pe flag
Yeah, fair point. But in order to arbitrarily execute an operation, the attacker would hopefully need to have supplied the password in order to obtain the full key - in which case, the security hinges on the strength of the password. I suppose my question then is, are there any other glaring holes? :-) Mind, at the risk of answering my own question - if the attacker could see both the unencrypted and encrypted files at the same time, the key could be found... Bugger. (as it happens, the app deletes the original after encryption so all is not necessarily lost)
Score:0
cn flag

Welcome to the site but yes, here it is: "Sigh" :-)

Start with two byte arrays generated from random strings of lengths l1 and l2 (fwiw, l1 and l2 are primes)

Primes aren't that random. There's a list of little ones here, and some bigger ones here. Others are tricky to make - see primality test and think power consumption and generation/test time. Where do yours come from?

Loop through both arrays (nested loop) and generate a third array of length l1 * l2 where each byte is the result of XORing the indexed bytes of the other arrays

To encrypt a plain text, array three has to be at least as long as that text (in your case). That's not so easy to manage given you're making them yourself(?)

Accept a password from the user and then successively XOR every byte in the third array, looping through the bytes of the password

See Maarten's answer re. password repetition and XOR effects.

But as a general comment, why & how is this a one time pad? Are you clear on how those operate? The need for a physical TRNG? And don't forget cipher text malleability and authentication. Sorry: Sigh again...

Maarten Bodewes avatar
in flag
The lengths are random so that the lengths l3 = l1 * l2. The primes themselves are not indicated to be "random". If they are static then of course the third array is also static and the scheme fails. This is also not so logical as l3 should be larger than the message (/image) size. So I understand your confusion.
mrrrk avatar
pe flag
I didn't really think the prime thing was going to be particularly useful. It's just that I could make the lengths prime, so I did. But this is all moot since this is all a terrible idea! Probably good enough for the application in which it's being used - and for a learning experience - but nothing else. Don't do it kids.
mrrrk avatar
pe flag
_I_ didn't say it was a one-time pad! If it's the same 'key' each time it's by definition _not_ one-time :-)
Paul Uszak avatar
cn flag
@mrrrk Err, you've used the OTP tag at the bottom of your answer though...
mrrrk avatar
pe flag
Nope - @Maarten apparently edited my question and added that! - I've now removed it.
Maarten Bodewes avatar
in flag
Yes, as the encryption did very much look like the one-time pad: if the third array were random & secret then I would say it is a one-time pad (but it isn't). It uses the same encryption technique as the one-time pad and it is broken because of the vulnerabilities around (wrong) usage of the OTP. For me, the fact that a question has a certain tag is not a claim by the author, by the way. Tags are used so that people interested in the subject of the tag can find the questions & answers.
Score:0
in flag

Start with two byte arrays generated from random strings of lengths l1 and l2 (fwiw, l1 and l2 are primes)

What do you mean by "byte arrays generated from random strings"? Byte arrays are octet strings. Are the byte arrays themselves fully random if the strings are text-based strings?

The aim by this point is to have a key which is based on the password and longer than the data being encrypted which can then be used to quickly encrypt and decrypt the files.

That would require that l1 * l2 is at least as long as the data, but I don't see any description on how you've made sure of this.

So the question is, how secure does this all sound?

It sounds like a badly constructed one-time-pad scheme. The octet strings are still repeated and XOR'ed together, and I don't think this is secure. If somebody guesses the length of the input they might be able to reverse the operation. But that's not the main issue, so it is kind of irrelevant.
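
One way to see the repetition, as a small sketch in Python: it assumes the third array is built with a plain nested loop as the question describes, and the toy lengths 5 and 7 are made up. Only l1 + l2 random bytes feed all l1 * l2 bytes of the third array, so its rows differ from one another by a single constant byte.

```python
import secrets

l1, l2 = 5, 7                       # toy lengths, assumed for illustration
a = secrets.token_bytes(l1)
b = secrets.token_bytes(l2)
third = bytes(x ^ y for x in a for y in b)

# Split the third array into l1 rows of l2 bytes; row i is a[i] XOR b[j] for each j.
rows = [third[i * l2:(i + 1) * l2] for i in range(l1)]

# XORing any two rows cancels b and leaves one constant byte, repeated l2 times.
diff = bytes(x ^ y for x, y in zip(rows[0], rows[1]))
assert diff == bytes([a[0] ^ a[1]]) * l2
```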

I suppose it'd be easy to get the generated key if you passed a large file of zeros and were able to examine the output but lets assume a hypothetical attacker only had access to a set of encrypted files (let's say JPEGs).

The problem here is what you're going to do with the two random strings, as they are required for decryption. Are you going to store these with each file? If you do, and an adversary finds the random strings, the original file and the encrypted file, then they can directly compute the password. After all, the only thing separating the encrypted file from the repeated password sequence is the XOR of the original image with the third byte array.
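
A minimal sketch of that computation in Python. It assumes the scheme works as described in the question; `third_array`, `encrypt` and `recover_password` are hypothetical helper names, and the strings and password are toy values.

```python
def third_array(a: bytes, b: bytes) -> bytes:
    # Nested loop over both stored strings, as described in the question.
    return bytes(x ^ y for x in a for y in b)

def encrypt(a: bytes, b: bytes, password: bytes, data: bytes) -> bytes:
    # Stand-in for the application: key = third array XOR repeated password.
    third = third_array(a, b)
    return bytes(d ^ third[n] ^ password[n % len(password)]
                 for n, d in enumerate(data))

def recover_password(a: bytes, b: bytes, plaintext: bytes, ciphertext: bytes) -> bytes:
    third = third_array(a, b)
    # plaintext XOR ciphertext XOR third array leaves the password, repeated.
    stream = bytes(p ^ c ^ t for p, c, t in zip(plaintext, ciphertext, third))
    # The shortest period of that stream is the password.
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return stream[:n]
    return stream

a = bytes(range(1, 8))      # toy stored string, l1 = 7
b = bytes(range(20, 31))    # toy stored string, l2 = 11
plaintext = b"a file the attacker also holds in the clear"
ciphertext = encrypt(a, b, b"hunter2", plaintext)
assert recover_password(a, b, plaintext, ciphertext) == b"hunter2"
```

No guessing is involved: with the two strings and one plaintext/ciphertext pair, the password falls out of three XORs.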

Even if the original file and password aren't known, the adversary can still calculate the third array as it was before the password was applied. If the adversary then XORs it with the encrypted file, they retrieve the original file encrypted with the repeated password. That means you've ended up with exactly the many-time pad that you were trying to avoid.
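
To illustrate, here is a sketch in Python under the same assumptions about the scheme. The helper names and the toy "image" bodies are made up; the only real-world fact used is that JPEG files begin with the bytes FF D8 FF.

```python
def third_array(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x in a for y in b)

def encrypt(a: bytes, b: bytes, password: bytes, data: bytes) -> bytes:
    # Stand-in for the application: key = third array XOR repeated password.
    third = third_array(a, b)
    return bytes(d ^ third[n] ^ password[n % len(password)]
                 for n, d in enumerate(data))

def strip_third_array(a: bytes, b: bytes, ciphertext: bytes) -> bytes:
    # Needs only the two stored strings, not the password.
    return bytes(c ^ t for c, t in zip(ciphertext, third_array(a, b)))

JPEG_MAGIC = b"\xff\xd8\xff"          # every JPEG starts with these bytes

a = bytes(range(1, 8))                # toy stored strings
b = bytes(range(20, 31))
password = b"hunter2"
file1 = JPEG_MAGIC + b"\xe0 first toy image body"
file2 = JPEG_MAGIC + b"\xe1 second toy image body"

r1 = strip_third_array(a, b, encrypt(a, b, password, file1))
r2 = strip_third_array(a, b, encrypt(a, b, password, file2))

# Each residue is the file XORed with the repeating password only.
assert r1 == bytes(d ^ password[n % len(password)] for n, d in enumerate(file1))

# Known header bytes leak the start of the password.
leaked = bytes(r ^ m for r, m in zip(r1, JPEG_MAGIC))
assert leaked == password[:3]

# XORing two residues cancels the password and leaves file1 XOR file2.
assert bytes(x ^ y for x, y in zip(r1, r2)) == bytes(x ^ y for x, y in zip(file1, file2))
```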

If you are able to store the random strings securely then you might as well have used a symmetric key and performed symmetric encryption, e.g. AES-CTR or AES-GCM.


If you need password based encryption then I would at least use a standard that describes it: PKCS #5: Password-Based Cryptography Specification Version 2.1. This uses a Password Based Key Derivation Function called PBKDF2 to securely derive a secret, which you can use to encrypt your files. Note that image files are just binary files like any other files; you don't need an image specific algorithm.
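
As a minimal sketch of the key derivation step only, using PBKDF2 from Python's standard library: the iteration count, salt length, hash choice and the name `derive_key` are assumptions for illustration, not values taken from the question or prescribed here. The derived key would then go to an authenticated cipher such as AES-GCM from a vetted library, which is not shown.

```python
import hashlib
import secrets

def derive_key(password: bytes, salt: bytes, iterations: int = 600_000) -> bytes:
    # PBKDF2 with HMAC-SHA-256, producing a 32-byte key (enough for AES-256).
    # The iteration count is an assumed value; tune it for your hardware.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)

# A fresh random salt per file; it is not secret and is stored next to the ciphertext.
salt = secrets.token_bytes(16)
key = derive_key(b"correct horse battery staple", salt)

assert len(key) == 32
# The same password and salt always give the same key, which is what decryption needs.
assert key == derive_key(b"correct horse battery staple", salt)
```

Unlike the random strings in the question, nothing here has to stay hidden in the source code: only the password is secret.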

mrrrk avatar
pe flag
Yeah, I sort of anticipated having my arse handed to me here! The random data came from random.org - but here's the flaw: they're stored in the app source code, so if the attacker has that (which they probably will if they have the data) they are useless. They were supposed to mask the repeating of the password. To be clear, I would NEVER use this 'at work'. We use PBKDF2 to hash any database passwords and AES (Rijndael 128 bit) for any symmetrical encryption.
Score:-1
sv flag

In good cryptosystems, encrypted data looks a lot like random data.

Xoring random data with cleartext does not result in "ciphertext" that looks random enough.

If you encrypt an image with your proposed algorithm, the ciphertext would leak information that could be exploited, such as patterns and shapes.

Maarten Bodewes avatar
in flag
"Xoring random data with cleartext does not result in "ciphertext" that looks random enough." Sorry, but this is akin to saying that OTP schemes are not secure. "the ciphertext would leak information that could be exploited. Like patterns, shapes." That's just a claim at this point; you'd need to argue why this is the case.
Jeremy avatar
sv flag
You're right, I did not explain my hypothesis: I think that there is a statistical relationship between the cleartext and the ciphertext with this proposed method. It lacks the confusion and diffusion properties which would negate some cryptanalysis based on statistics. For example, block ciphers heavily rely on this. Even ChaCha20, a stream cipher, adds diffusion. I believe the theory behind this is from Claude Shannon.
Maarten Bodewes avatar
in flag
You're certainly right about that, but that's kind of negated if an OTP is being used. It isn't so it may have some relevance. If you want to add information to your answer then please adjust it by hitting [edit].