Score:3

If I hash different seeds with same size of hash output and XOR on plaintext, is it secure as One-time pad?

il flag

Let's suppose I take 1MiB of truly random data and split in pieces (seeds) of 32-bytes (256-bits), so I hash each seed with a hash function with the same output digest size (32-bytes/256-bits) and XOR each piece of hashed seeds into a 1MiB plaintext.

The random data is kept secret.

My question is:

Will this scheme have the same security of One-time pad?

canary avatar
ch flag
I think you cannot prove it affirmatively even under the Random Oracle Model! Since you have no way to assume that the hash function is a true random permutation, I think there is no way to compare it with OTP's security.
Score:5
ng flag

Will this scheme have the same security of One-Time Pad?

Not in theory, facing arbitrary powerful adversaries. E.g. it's overwhelmingly probable that such adversaries could distinguish among two ciphertexts which one is for input that consists of zeroes, when the other is for input that consists of ones. They can't for the OTP. Such arbitrary powerful adversary uses that the distribution of the 256-bit hashes is not uniform, and in particular does not reach about 37% of the 256-bit outputs.

But yes in practice for a good unbroken hash like SHA-256 and an actual thus compute-bound adversary.

Score:2
cn flag

Not quite.

Hash functions by definition are surjective. That means their outputs collide given multiple differing inputs. That creates a bias. E.g. So with a 256 bit input, you'll lose 50% of your inputted entropy to achieve an acceptable $2^{-64}$ bias. That's in accordance with the Left Over hash lemma. That's vaguely also where the idea of entropy halving after hashing arises.


I assume that "1MiB of truly random data " means 8 bits of entropy per byte.

fgrieu avatar
ng flag
No, hashing 256 bits of random data is not expected to loose 50% (128 bits) of the initial 256 bits of (Shannon) entropy. It's expected to loose about 0.827 bit of entropy (about 0.32%). See [this](https://crypto.stackexchange.com/a/24672/555). It's very easy to numerically experiment with this on a small scale, e.g. with SHA-256 truncated to 16 bits on input and output. We can hash all 16-bit inputs, compute the exact probability of each 16-bit output, and the exact entropy. It's nowhere near 8 bit, more like 15.2 bit.
Paul Uszak avatar
cn flag
@fgrieu-onstrike You'r probably right, but you'll have to take this up with NIST as their conditioning functions distil 256 bits of entropy down to (their) acceptable 128...
Paul Uszak avatar
cn flag
Aren't we both saying the same thing - no.
fgrieu avatar
ng flag
We agree on "Not quite", and that with near certainty there remains _at least_ 50% of the initial entropy, and a bias _less_ than $2^{-64}$. The part I disagree with is "you'll lose 50% of your inputted entropy".
Tanner Swett avatar
cn flag
I think you want the phrase "not injective" instead of "surjective." "Surjective" means "each valid output is produced by at least one possible input"; "injective" means "each valid output is produced by no more than one possible input" (or, in other words, "no collisions").
Score:1
ph flag
jpa

If I hash different seeds with the same size of hash output and XOR on plaintext, is it secure as a One-time pad?

Ignoring the 256-bit blocks specified in the question body, the answer to the question in the title depends on the hash length.

Consider a block size of 1 byte. A hash function with 8-bit output will map each byte into a random other byte. But some of the input bytes will map into the same output - on average there will be 161 distinct output values for the 256 distinct input values. A one-time pad covering only 63% of byte values is clearly not as secure as a proper one-time pad.

At the other extreme, you can consider a hash input block size of twice the message length. At this point, it is almost certain that the hash output after truncation to message length will cover all values close to uniformly, and is equivalent to a random one-time pad.

Where exactly the threshold lies will depend on message contents and how much other information is available for cryptanalysis. Without knowledge of message structure, a one-time pad is not vulnerable to direct brute forcing. Even a very weak pad couldn't be broken if you don't have a way to verify that the decryption is correct. For highly structured and repetitive content, even 256-bit hashing could have enough of a bias to distinguish the encrypted data.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.