How do Pre-image, Second pre-image and collision resistance actually work? How does this affect data integrity?

Thanos

11/10/23, 12:08 PM

I'm working on this past exam paper and found this question about pre-image resistance and its relation to data integrity:

Displaying the hash of a file on a website in order to provide data integrity relies only on the preimage-resistance property of the underlying hash function. Is this true or false?

I answered false due to the fact that:

Pre-image only makes it infeasible to find an input that hashes to that output. It's a one way function.
This does not give collision resistance, which is the separate property that makes it infeasible to find any two inputs that hash to the same output.

Yet when I researched the matter further, some sources gave other, conflicting arguments, such as:

False because why are other cryptographic algorithms with a key not used?
True since pre-image resistance is enough as it's a one way function; this therefore makes it impossible to crack. All the system does is compare if the hashes are the same to prove integrity.

And now I'm confused, because I'm finding conflicting information that claims to prove the statement either true or false. I feel like I am interpreting this question wrong and would appreciate an explanation of what the actual answer to this question would be.


preimage-resistance

collision-resistance

Score:3

Crypto

fgrieu

11/10/23, 4:06 PM

Displaying the hash of a file on a website in order to provide data integrity relies only on the preimage-resistance property of the underlying hash function.

Is this true or false?

False, for several reasons:

1. The practice of "displaying the hash of a file on a website in order to provide data integrity" for that file relies, among other things, on the assumption that the displayed hash itself has integrity. Whether that assumption holds is unrelated to the "preimage-resistance property of the underlying hash function".
2. If (on top of the assumption in 1) we assume that the file was prepared randomly and is known to the attacker, then the safety of said practice coincides with (a non-quantitative definition of) second-preimage-resistance. But the usual meaning of "preimage-resistance" is first-preimage-resistance, which makes a different assumption†, not met in the use case at hand. And first-preimage-resistance does not imply second-preimage-resistance. Thus, even under hypotheses that make second-preimage-resistance the relevant property, (first-)preimage-resistance of the hash is not sufficient.
3. Further, the assumption (of 2) that the file was prepared randomly is practically unwarranted. The safer thing to assume is that the file may have been intentionally prepared to allow undetected substitution without changing the hash. Under that sometimes realistic assumption, the safety of said practice coincides with (a non-quantitative definition of) collision-resistance. And preimage-resistance (of whichever kind) does not imply collision-resistance (in non-quantitative definitions, and in quantitative definitions for the same fixed security level).
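To make the gap between points 2 and 3 concrete, here is a minimal Python sketch. It assumes a deliberately weak toy hash (the first 16 bits of SHA-256) so that both searches finish quickly; `toy_hash`, `find_collision`, `find_second_preimage` and the constant `BITS` are hypothetical names introduced only for this illustration. The attacker who chooses both messages (collision) needs on the order of 2^(n/2) hash evaluations, while the attacker who must match one fixed message (second preimage) needs on the order of 2^n.

```python
import hashlib
from itertools import count

BITS = 16  # toy output size (assumption): small enough for brute force to finish fast


def toy_hash(data: bytes) -> int:
    """Hypothetical weak hash: the first BITS bits of SHA-256, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)


def find_collision():
    """Attacker chooses BOTH messages (point 3): birthday search."""
    seen = {}  # maps toy hash value -> first message that produced it
    for i in count():
        msg = b"c%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # two distinct messages, hashes computed
        seen[h] = msg


def find_second_preimage(original: bytes):
    """Attacker must match a FIXED, known message (point 2): exhaustive search."""
    target = toy_hash(original)
    for i in count():
        msg = b"s%d" % i
        if msg != original and toy_hash(msg) == target:
            return msg, i + 1  # substitute message, hashes computed


if __name__ == "__main__":
    a, b, collision_tries = find_collision()
    print("collision:", a, b, "after", collision_tries, "hashes")
    substitute, second_tries = find_second_preimage(b"the published file")
    print("second preimage:", substitute, "after", second_tries, "hashes")
```

With a real 160-bit or 256-bit hash the same asymmetry holds, only with both costs far out of reach unless the function is broken; a hash can therefore lose collision-resistance while second preimages remain infeasible.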

SHA-1 is a practical example of a (first and second) preimage-resistant hash that's unsafe for the practice considered, under the assumption in 3. It's even unsafe under an assumption in between those of 2 and 3: the file is bound to be in a prescribed and common format (e.g. PDF) and is prepared by a non-malicious actor (e.g. the one who also non-maliciously and correctly computes the hash), but the preparation uses a maliciously crafted computer tool (e.g. without the knowledge of said non-malicious actor). See the 2017 SHAttered attack for illustration.
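For reference, the practice under discussion can be sketched in a few lines of Python. This is a minimal sketch, assuming SHA-256 as a hash with no publicly known practical collision attack, and assuming the published digest was obtained over a trusted channel (the assumption in 1); the function names `file_digest_hex` and `matches_published` are hypothetical.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so that large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex):
    """Compare the local digest with the hex digest shown on the website.

    The comparison says nothing about whether the website's value itself
    is authentic; that is the separate assumption in 1.
    """
    local = file_digest_hex(path)
    return hmac.compare_digest(local, published_hex.strip().lower())
```

A call such as `matches_published("download.bin", digest_copied_from_site)` (both arguments hypothetical) returns `True` only when the digests agree. Passing `algorithm="sha1"` to `file_digest_hex` would still run, but would inherit the collision weakness described above.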

For simple (non-quantitative) definitions of (first-)preimage-resistance, second-preimage-resistance, and collision-resistance, refer to this.
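As a rough summary, here is one common non-quantitative phrasing of the three properties for a hash function $H$. It is not necessarily the wording of the reference above, and "infeasible" is left informal. In each case it must be infeasible for the attacker to achieve the stated goal:

```latex
\begin{align*}
\text{(first-)preimage:} \quad & \text{given } y = H(x) \text{ for a random unknown } x,
  \text{ find any } x' \text{ with } H(x') = y \\
\text{second-preimage:} \quad & \text{given a random } x,
  \text{ find } x' \neq x \text{ with } H(x') = H(x) \\
\text{collision:} \quad & \text{find any pair } x \neq x' \text{ with } H(x) = H(x')
\end{align*}
```

The attacker's freedom grows from top to bottom: the target is imposed in the first two games, while in the third the attacker chooses both inputs.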

The following would-be answer reaches the correct conclusion, but uses the incorrect argument of proposing an unrelated method to reach the goal considered:

False because why are other cryptographic algorithms with a key not used?

For several reasons, this other would-be answer is wrong:

True since pre-image resistance is enough as it's a one way function; this therefore makes it impossible to crack. All the system does is compare if the hashes are the same to prove integrity.

"Impossible to crack" refers to (first) preimage-resistance and the One-Way function property (which are synonymous, for non-quantitative definitions at least). As developed in 2, that's not sufficient in the use case, even with strong assumptions.
The definition of (first) preimage-resistance and of a One-Way function is not in terms of the equality of two hashes, contrary to what the second part of that argument suggests.

As an aside, answering with "pre-image" when the problem statement uses "preimage" is bad from the standpoint of maximizing the odds of succeeding at exams. Towards this goal, it's best to use the problem statement as the reference, unless it's indisputably wrong, in which case clearly pointing out why can sometimes be a reasonable course of action. On the other hand, it's often better (especially in MCQ) to correct the problem statement. From this standpoint, my argument 1, though formally correct, is perhaps best omitted, for my guess is that the intended question was:

The hash of a file is displayed on a website in order to provide data integrity. Does the hash function require only the preimage-resistance property?

† Depending on the author, the (non-quantitative) definition of first-preimage-resistance assumes that the target hash is random, or that it is the hash of an unknown random secret. Whichever definition is used, that assumption does not match the use case.


