Score:12

Hash paradox in an image file that contains hash text?

cn flag

Is it possible to include a hash digest visibly in an image, such that the hash of the image itself is that same digest?

When we draw the text of the hash in the image, we will of course change the hash of the image at the same time, because as we know, small changes to the input of a hash function produce significant changes in the output.

I am also aware that hash functions are irreversible.

I thought about this while learning image processing. I want to put text with the value of the image hash itself near the top of the image.

Is there a solution to achieve my purpose somehow?

xk flag
From your comments on the existing answer: "actually my purpose is not for security reason, but to convince people who guess what this image is talking about. If I tell the answer `broken heart` after they surrender, ofcourse they dont believe me, so that's why I put hash inside the image." If the goal is simply to have a commitment (to the text `broken heart`), why isn't including a hash of the thing you committed to in the picture enough? Why does *the image itself* also have to hash to the same thing as what you committed to?
cn flag
All modern hashing algorithms are intentionally designed to mitigate this possibility. What you're asking for is to find a hashing algorithm that does not have [preimage attack](https://en.wikipedia.org/wiki/Preimage_attack) resistance. This is why, for example, a PGP-signed message includes all of the text in the email *except* for the signature. It would be impossible to calculate the signature that includes itself in the text, at least as far as we're aware.
Joseph Sible-Reinstate Monica
vn flag
https://news.ycombinator.com/item?id=13823704
Score:13
in flag

is there a solution how to achieve my purpose?

We can restate your question in a general form:

Find a text that contains its own hash: $$\text{digest-value} = \operatorname{Hash}(\text{some part} \,\|\, \text{digest-value} \,\|\, \text{some other part})$$

When you change the text, the hash value changes too. With a secure cryptographic hash function, finding such a self-referential text is infeasible, since the required search is far beyond any realistic computing power.
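To illustrate how strongly the digest reacts to a tiny change, here is a minimal sketch that flips one bit of an input and counts how many of the 256 output bits of SHA-256 differ. The byte string and the helper name `bit_difference` are made up for this example.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Stand-in for the image bytes (an assumption for illustration only).
original = b"image bytes without any text drawn on them"

# Flip a single bit, as drawing even one pixel of text would do at minimum.
modified = bytearray(original)
modified[0] ^= 0x01

print(bit_difference(original, bytes(modified)), "of 256 bits differ")
```

For a well-behaved hash, roughly half of the output bits are expected to change, so writing the digest into the image invalidates the digest you just wrote.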

What you can do:

  • If the displayed hash really is 16 hex digits (64 bits), you can try all $2^{64}$ possible values and look for a match. We may assume that you use SHA-256 and truncate the result. The drawback is that we don't know whether truncated SHA-256 attains all values.

    If one hashes $2^k$ values and truncates the output to $k$ bits, collisions are expected, as with the birthday paradox, so we don't expect all values to occur.

    So a solution may not exist for every image, and the search still requires a lot of work.

    If the displayed hash is the full-length digest, the above is not feasible.

  • Hash only part of the image. This is similar to paper PUFs: the surface of the paper is extracted with a fuzzy extractor, and the signature is then printed on a part of the paper where no extraction is performed.

    Use one part of the image to calculate the hash value, then print the hash value on the part that is not included in the hash calculation.

  • Use layered images. One layer contains the image, all of which can be hashed, and the upper layer contains the hash value.

    It might be hard for a user to process the layers. Instead, you can use programming tricks as on web pages, where text can be placed over an image with HTML and CSS.
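The exhaustive search from the first option can be sketched at toy scale. The code below assumes SHA-256 truncated to 4 hex digits (16 bits) instead of 16 hex digits, so that all $2^{16}$ candidates can be tried in well under a second. The data layout (`prefix`, displayed digest, a numeric `salt` suffix) and all function names are assumptions made for this illustration. Because not every input has a solution, the sketch varies the salt until one appears.

```python
import hashlib

def truncated_hash(data: bytes, hex_digits: int) -> str:
    """SHA-256 truncated to the first hex_digits hex characters."""
    return hashlib.sha256(data).hexdigest()[:hex_digits]

def find_fixed_points(prefix: bytes, suffix: bytes, hex_digits: int = 4):
    """Try every possible displayed value and keep those that hash to themselves."""
    found = []
    for value in range(16 ** hex_digits):
        candidate = format(value, "0%dx" % hex_digits)
        data = prefix + candidate.encode() + suffix
        if truncated_hash(data, hex_digits) == candidate:
            found.append(candidate)
    return found

def first_self_describing(prefix: bytes, hex_digits: int = 4, max_salts: int = 50):
    """Vary a salt in the suffix until some displayed value matches its own hash."""
    for salt in range(max_salts):
        suffix = b"|salt=%d" % salt
        hits = find_fixed_points(prefix, suffix, hex_digits)
        if hits:
            return prefix + hits[0].encode() + suffix, hits[0]
    return None  # no solution within max_salts (unlikely but possible)

result = first_self_describing(b"toy image data, hash=")
if result is not None:
    data, digest = result
    print(data, "has truncated hash", digest)
```

The same loop with 16 hex digits would need about $2^{64}$ hash evaluations per salt, which is why this only works as a toy.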

My purpose was to make some people easier that they don't need to hash it manually.

If they don't need to hash it themselves, what is the use of the hash? An attacker could simply send a different image that displays the hash of their own file, and the user would just believe it. This is not good security practice at all. Let the users calculate the hash themselves, or better, verify a digital signature made by you.

Maarten Bodewes
in flag
Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/133668/discussion-on-answer-by-kelalaka-hash-paradox-in-an-image-file-that-contain-hash).
Score:9
in flag

There have been at least two successful attempts to create GIF images that display their own MD5 hashes:

Hashquine by spq

Hashquine by Copyheart Rogdham

You can download both files and verify that the md5sum hashes equal the hashes displayed in the images.

These rely on the fact that MD5 collisions are easy to produce nowadays, and the fact that the GIF format is a sequence of frames. Effectively, the GIFs consist of 32 chunks of animation data. Each chunk is calculated as a 16-way MD5 multicollision, i.e. there are 16 different chunks which produce the same hash but show different hexadecimal digits. So, the GIF is built by computing and concatenating all 32 16-way collisions, computing the resulting hash of the file, and then selecting the chunks which produce the desired output. In other words, the flexibility of the GIF format, and the weakness of MD5, allows the displayed hash to be chosen one digit at a time without affecting the hash of the image file.
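The selection step can be sketched with a toy model. The code below does not use MD5 or GIF. It assumes a deliberately weak iterated hash with a 12-bit state (built here from truncated SHA-256 purely for convenience), so that 16-way collisions can be found by brute force in a fraction of a second. Every name, the 8-byte block size, and the "first byte of each slot is the displayed digit" convention are inventions for this illustration. What carries over is the structure: once all 16 candidate blocks in a slot lead to the same internal state, the choice of block no longer affects the final hash.

```python
import hashlib
from itertools import count

HEX = "0123456789abcdef"
BLOCK = 8    # toy block size in bytes
DIGITS = 3   # toy digest length: 3 hex digits, i.e. a 12-bit state

def compress(state: int, block: bytes) -> int:
    """Toy compression function: 12 bits taken from SHA-256(state || block)."""
    h = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(h[:2], "big") & 0xFFF

def toy_hash(data: bytes) -> str:
    """Iterate the compression function over fixed-size blocks."""
    assert len(data) % BLOCK == 0
    state = 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return format(state, "03x")

def colliding_slot(state: int):
    """Find 16 blocks, one per hex digit, that all lead to the same next state."""
    blocks, target = {}, None
    for digit in HEX:
        for nonce in count():
            block = digit.encode() + nonce.to_bytes(BLOCK - 1, "big")
            nxt = compress(state, block)
            if target is None:
                target = nxt          # the first block fixes the common target
            if nxt == target:
                blocks[digit] = block
                break
    return blocks, target

def build_hashquine(header: bytes, trailer: bytes) -> bytes:
    state = 0
    for i in range(0, len(header), BLOCK):
        state = compress(state, header[i:i + BLOCK])
    slots = []
    for _ in range(DIGITS):
        blocks, state = colliding_slot(state)
        slots.append(blocks)
    # Any choice of blocks gives the same digest, so compute it with a draft.
    draft = header + b"".join(s["0"] for s in slots) + trailer
    digest = toy_hash(draft)
    # Now pick, slot by slot, the block that "displays" the right digit.
    return header + b"".join(s[d] for s, d in zip(slots, digest)) + trailer

def displayed_digits(data: bytes, header_len: int) -> str:
    """Read back the digit shown by each slot (its first byte)."""
    return "".join(chr(data[header_len + k * BLOCK]) for k in range(DIGITS))

quine = build_hashquine(b"TOYIMG01", b"THE-END!")
print(displayed_digits(quine, 8), toy_hash(quine))
```

In the real hashquines the brute-force search is replaced by MD5 collision-finding techniques, and each colliding block corresponds to animation data that draws one digit.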

In principle, a similar result could be achieved for any other hash function, so long as it is easy to produce collisions. For example, it would be easy to do this with the CRC family of hashes, since it is easy to collide those (just solve a linear equation). However, for hash functions that are currently collision-resistant, such as SHA-256, it is computationally infeasible.
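For CRC-32 this can be made concrete. The sketch below writes a freely chosen CRC value as text into a byte string and then appends four patch bytes that force `zlib.crc32` of the whole string to equal that value. It uses a common table-walking trick as one way of solving the linear relation; it is not necessarily the method the answer had in mind. The function names, the `crc32=` text layout and the value `0xDEADBEEF` are assumptions for illustration.

```python
import zlib

MASK = 0xFFFFFFFF
POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib

def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = _make_table()
# The 256 table entries have distinct top bytes, so a top byte identifies its entry.
TOP_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}

def patch_bytes(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes p such that zlib.crc32(data + p) == target_crc."""
    state = zlib.crc32(data) ^ MASK   # internal register after processing data
    want = target_crc ^ MASK          # internal register that must be reached
    # Walk backwards from the wanted register to recover the 4 table indices.
    indices = []
    x = want
    for _ in range(4):
        i = TOP_TO_INDEX[x >> 24]
        indices.append(i)
        x = ((x ^ TABLE[i]) << 8) & MASK
    # Walk forwards from the real register, choosing bytes that hit those indices.
    out = bytearray()
    for i in reversed(indices):
        out.append((state ^ i) & 0xFF)
        state = TABLE[i] ^ (state >> 8)
    return bytes(out)

def self_describing_blob(body: bytes, target_crc: int) -> bytes:
    """Build data that contains target_crc as text and also has that CRC-32."""
    text = body + b"crc32=%08x\n" % target_crc
    return text + patch_bytes(text, target_crc)

blob = self_describing_blob(b"some file content\n", 0xDEADBEEF)
print(blob, format(zlib.crc32(blob), "08x"))
```

Unlike the multicollision approach, the displayed value can be chosen in advance here, because four free bytes are enough to steer a 32-bit linear checksum to any target.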

Other file formats can be attacked in this way too: for example, issue 14 of the PoC||GTFO journal shows its own MD5 hash on the cover page of the PDF: https://www.alchemistowl.org/pocorgtfo/pocorgtfo14.pdf. PostScript, and even the NES ROM format can similarly be attacked thanks to file format tricks; read the journal for more technical details.

kelalaka
in flag
Yes, but this requires GIF; it is not applicable to standard image files.
in flag
I'm pretty sure that the GIF format is a "standard image file"; it's widely accepted in browsers and image tools, and is very common on the Internet. Yes, this isn't generalizable to all formats - BMP for example would very likely not work - but I would not be surprised if you could make similar techniques work for more than just GIF. (There are already hashquines for PDF, PS, etc.)
kelalaka
in flag
I should have said non-animated; that is the key to the attack. I thought about it, however, and couldn't find a way to apply this attack otherwise. We should forget about MD5 and SHA-1. And, of course, welcome to [cryptography.se]
in flag
PDF and PS are non-animated, but the attack is still applicable. I would not be surprised if there's a clever way to apply this to PNG or JPEG with some very crafty abuse of the format.
kelalaka
in flag
The only drawback, actually, is that the OP was requesting a specific hash, not a random hash produced with collisions. You can see this in the comments moved to chat under my answer, or in the quoted part at the end of my answer.
kelalaka
in flag
Whenever the data format allows enough freedom, this attack will work.
ph flag
jpa
This is probably possible in any bitmap format this way: generate two chunks of 32x RGB32 pixels, one for the foreground color and one for the background color. Randomize the least sensitive bits of each pixel until you find an MD5 collision between foreground and background (specific algorithms exist to speed this up). Assemble the chunks back-to-back to get the image you want. This results in blocky, low-resolution text.
in flag
@jpa: That's a good idea. I think if you have a *lot* of patience you could use a chosen-prefix collision to achieve this in a slightly different way. The MD5 chosen-prefix collision attack lets you take two arbitrary, different prefixes and compute two corresponding blocks of random data (in approximately 1 day) that can be appended to make the MD5 hashes equal. This would correspond to having a row or two of garbage between chunks, unfortunately, but there might be ways to hide/reduce the visual disturbance, e.g. with palette tricks.
Score:4
cn flag

There was a time when this was possible. Internet Explorer (horrors) accepted WMF files as image files. Thanks to something that was by design in the file format, this was never safe. The image could run arbitrary code as part of image rendering. Thus, the solution is for the file to compute its own hash at render time and layer it on top of the rest of the image.

https://en.wikipedia.org/wiki/Windows_Metafile_vulnerability

I know this isn't the answer you're looking for, but that's what it is. A true answer for a crypto hash is going to be about this difficult. The crypto hash algorithms are designed so that you can't do this, and that's the point. If you could do this with static data the hashes couldn't do what they were designed to do.

On consideration, with enough Quine-like work you might well be able to get a PostScript or PDF file that does this. It would be rather disturbing to the security community until they figured out the trick.

Score:3
cn flag

This is one of those cases where the math answer and CS answer are rather different. Given a cryptographically secure hash:

From a mathematical perspective, given a file f and a fixed way of combining files and hashes, we should expect a chance of roughly 1 − 1/e (about 63%) that there is at least one hash h such that the hash of h combined with f is h; the chance that no such h exists is about 1/e. If we vary the ways of combining (put it in the absolute top left corner, put it in the top right corner, put it in the top left corner but moved one pixel to the right, etc.), the probability quickly comes close to 1.
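As a worked version of this estimate, model the hash as a random function with $N$ possible outputs (an idealisation, not a proven property of any real hash function). Each candidate $h$ then satisfies $\operatorname{Hash}(\operatorname{combine}(f, h)) = h$ independently with probability $1/N$, and there are $N$ candidates, so

$$\Pr[\text{no such } h] = \left(1 - \frac{1}{N}\right)^{N} \approx e^{-1} \approx 0.37, \qquad \Pr[\text{at least one such } h] \approx 1 - e^{-1} \approx 0.63.$$

With $m$ different ways of combining, treated as independent, the chance that none of them admits a solution is about $e^{-m}$, so

$$\Pr[\text{at least one solution}] \approx 1 - e^{-m},$$

which for $m = 5$ is already about $0.993$.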

From a CS point of view, the expected amount of calculation is proportional to the number of possible hash outputs. For any decently sized hash, that is an infeasibly large amount of calculation. The amount of calculation needed would be comparable to the amount needed to get a file with a hash equal to a pre-selected hash. If you were able to do this, people would immediately start worrying whether you've broken the hash somehow.

ru flag
A good way to kill a hash function you don't like - produce an image file with hash part of the image.
Score:1
us flag

Use the filename

Since the hash in the image is actually intended as a tip, you can simply name the file 3e2c5b56e34f1979.jpg.
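A minimal sketch of this approach, assuming the name is the SHA-256 digest of the file contents truncated to 16 hex digits (the answer does not say which hash produced `3e2c5b56e34f1979`, so both choices are assumptions). The function names are made up for this example.

```python
import hashlib
from pathlib import Path

def rename_to_digest(path, hex_digits: int = 16) -> Path:
    """Rename a file to the truncated SHA-256 digest of its contents."""
    p = Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:hex_digits]
    new_path = p.with_name(digest + p.suffix)  # keep the directory and extension
    p.rename(new_path)
    return new_path

def name_matches_content(path) -> bool:
    """Check that the file name (without extension) is a prefix of the file's digest."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest().startswith(p.stem)
```

Renaming does not change the file contents, so the digest stays valid, which is exactly what drawing the text into the pixels cannot offer.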
