Using hash of data as proof of integrity and preventing collision

Jaime

11/26/22, 7:53 AM

Rather than storing user data when a user interacts with an app, I store the SHA3-256 hash of the data. This is because data storage in this particular environment is very limited.

The data can be several variables, e.g., a, b, and c, but instead of saving them individually, I save the hash of their concatenation: SHA3(a,b,c).

When the user wants to interact with the system, they send the variables again and the system compares the hash of the submitted variables with the stored hash. If they match, the system assumes that the variables are the same as those submitted initially. So, I end up saving only one value instead of three in this example.
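The store-then-compare flow described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function names `commit` and `verify` are hypothetical, and the variables are assumed to arrive as byte strings that are simply concatenated.

```python
import hashlib


def commit(a: bytes, b: bytes, c: bytes) -> bytes:
    """Return the SHA3-256 digest of the plain concatenation a || b || c."""
    return hashlib.sha3_256(a + b + c).digest()


def verify(stored: bytes, a: bytes, b: bytes, c: bytes) -> bool:
    """Recompute the digest from the resubmitted values and compare it."""
    return commit(a, b, c) == stored


# First interaction: only the 32-byte digest is kept.
stored = commit(b"alice", b"42", b"100")

# Later interaction: the user resubmits the values.
print(verify(stored, b"alice", b"42", b"100"))  # same data
print(verify(stored, b"alice", b"42", b"101"))  # modified data
```

Only the 32-byte digest needs to be stored, regardless of how many variables go into it.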

The data is public, so it is known to everyone. My question is whether hashing the concatenation of the variables creates a security issue (does it make it easier for users to find a collision?). Is there any difference if I instead hash every variable and then hash the concatenation of the hashes?

To elaborate on the above, the reason I think this could make it easy for users to find collisions is that a, b and c are of different types and have min and max values; for instance, if c can only have values between 10 and 1000, the users can just try all the possible values of c to test for a lucky collision without much effort. Is this a reasonable concern?
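To put a number on this concern, the sketch below enumerates every allowed value of c and checks which ones reproduce the stored digest. The encoding is an assumption made for illustration: a and b are fixed-length byte strings and c is a 2-byte big-endian integer. Trying all 991 values gives 991 attempts against a 256-bit output, so the chance that any value other than the original matches is roughly 991 / 2^256, or about 2^-246.

```python
import hashlib


def digest(a: bytes, b: bytes, c: int) -> bytes:
    # Assumed encoding: c is serialized as a fixed 2-byte big-endian integer.
    return hashlib.sha3_256(a + b + c.to_bytes(2, "big")).digest()


a = bytes(32)  # placeholder 32-byte value
b = bytes(64)  # placeholder 64-byte value
stored = digest(a, b, 500)

# Try every allowed value of c and keep the ones that reproduce the digest.
matches = [c for c in range(10, 1001) if digest(a, b, c) == stored]
print(matches)  # only the original value is expected to match
```

A small input range makes the original value easy to recover, but the data is public anyway; it does not make a second, different input with the same digest any easier to find.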

Note that it is not important who submits the data, just that the submitted data is exactly the same as what was submitted previously.

hash

collision-resistance

sha-3

SEJPM

11/26/22, 1:54 PM

The classic attack on this is submitting a as one zero byte and b as two zero bytes, and then later swapping the association of these data pieces (a as two zero bytes and b as one). This will result in identical hashes for the concatenation case but different ones for the individually pre-hashed case.
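A short sketch of that ambiguity, using hypothetical helper names `concat_hash` and `nested_hash` for the two constructions being compared:

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def concat_hash(a: bytes, b: bytes) -> bytes:
    """Hash of the plain concatenation a || b."""
    return h(a + b)


def nested_hash(a: bytes, b: bytes) -> bytes:
    """Hash of the concatenation of the individual hashes."""
    return h(h(a) + h(b))


# Original submission, then the same bytes split at a different boundary.
a1, b1 = b"\x00", b"\x00\x00"
a2, b2 = b"\x00\x00", b"\x00"

print(concat_hash(a1, b1) == concat_hash(a2, b2))  # both hash three zero bytes
print(nested_hash(a1, b1) == nested_hash(a2, b2))  # inner hashes differ
```

Both submissions concatenate to the same three zero bytes, so the plain construction cannot tell them apart, while pre-hashing fixes each field to 32 bytes and removes the ambiguity.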

Jaime

11/27/22, 8:35 AM

Thanks for the comment. The parameters are types with predetermined lengths; for instance, "a" could be 32 bytes, "b" 64 bytes and "c" 128 bytes. In this case the attack is not applicable, right? In general, if the length of the variable is fixed, let's say to 32 bytes, the attack that you describe cannot be done, right?

SEJPM

11/27/22, 1:45 PM

Yes, if you can _guarantee_ that the lengths of at least all but one of the inputs are always fixed, then you usually get the full security properties of integrity-type constructions.
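One way to make that guarantee explicit is to reject any input whose length is not the expected one before hashing. The sketch below assumes the 32/64/128-byte lengths mentioned in the previous comment; the function name `fixed_hash` is hypothetical.

```python
import hashlib

# Byte lengths of a, b and c, taken from the example in the comment above.
LENGTHS = (32, 64, 128)


def fixed_hash(a: bytes, b: bytes, c: bytes) -> bytes:
    """Hash a || b || c only if every field has its expected fixed length."""
    for name, value, expected in zip("abc", (a, b, c), LENGTHS):
        if len(value) != expected:
            raise ValueError(f"{name} must be {expected} bytes, got {len(value)}")
    return hashlib.sha3_256(a + b + c).digest()


digest = fixed_hash(bytes(32), bytes(64), bytes(128))
print(digest.hex())
```

With the field boundaries pinned down this way, every concatenated input can be split back into a, b and c in exactly one way.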

Score:1

Crypto

Maarten Bodewes

11/27/22, 9:46 PM

As long as the securely stored hash is computed over a canonical representation of the data (i.e. the variables and their encodings cannot overlap), storing the hash is secure. It would offer similar security to having a signature over the data, but signatures are larger. One thing that I would make sure of is that you store the hash using a transaction; otherwise the user may interrupt the operation and have you write none or only part of the hash.
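If the fields are not fixed-length, one common way to get a canonical representation is to prefix each field with its length. The following is a minimal sketch under that assumption; the helper names `encode` and `canonical_hash` and the 4-byte big-endian length prefix are illustrative choices, not something prescribed here.

```python
import hashlib


def encode(*fields: bytes) -> bytes:
    """Length-prefix each field so that field boundaries are unambiguous."""
    out = bytearray()
    for field in fields:
        out += len(field).to_bytes(4, "big")  # assumed 4-byte length prefix
        out += field
    return bytes(out)


def canonical_hash(*fields: bytes) -> bytes:
    return hashlib.sha3_256(encode(*fields)).digest()


# The two submissions that collide under plain concatenation now differ.
print(canonical_hash(b"\x00", b"\x00\x00") == canonical_hash(b"\x00\x00", b"\x00"))
```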

If you want to make it even harder to attack or want to use less storage, you may consider storing a keyed hash instead; for SHA-3 that would be KMAC. In that case you could e.g. use 128 bits of the output instead of all 256 bits while maintaining 128 bits of security, but with the disadvantage that if you leak the key, collision attacks on the output would leave you only 64 bits of security.
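A rough sketch of the truncated keyed-hash idea follows. Python's standard library has no KMAC, so this uses HMAC with SHA3-256 as a stand-in; that is a substitution for illustration, not KMAC itself. The function names, the 32-byte key and the 16-byte tag length are assumptions.

```python
import hashlib
import hmac
import secrets


def keyed_tag(key: bytes, data: bytes, tag_len: int = 16) -> bytes:
    """HMAC-SHA3-256 over data, truncated to tag_len bytes (stand-in for KMAC)."""
    return hmac.new(key, data, hashlib.sha3_256).digest()[:tag_len]


def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(keyed_tag(key, data, len(tag)), tag)


key = secrets.token_bytes(32)  # must stay secret on the system side
tag = keyed_tag(key, b"canonical encoding of a, b, c")

print(len(tag))  # 16 bytes stored instead of 32
print(verify_tag(key, b"canonical encoding of a, b, c", tag))
```

The storage saving only holds while the key stays secret, and the key itself also has to be kept somewhere the user cannot read.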

Score:0

Crypto

MtOne

1/23/23, 8:01 PM

I think you lack an understanding of how hash functions work. I advise reading some introductory books as a start. The strict avalanche criterion makes sure that even if one bit is changed anywhere in the representation, it is practically infeasible to find collisions.

Your setup is definitely not secure against replay attacks. You should use some randomisation or salting technique.
