Score:2

$2^{64}$ versions of the same message

cn flag

I am reading a textbook and in there they explain the property of hash functions. In particular, they give an example of how unlikely it would be to find a second input value that would match the hash output of the original input. Here's the example:

We show now how Oscar could turn his ability to find collisions (modifying two messages) into an attack. He starts with two messages, for instance: $x_1 = \text{ Transfer \\\$10 into Oscar’s account } \\ x_2 = \text{ Transfer \\\$10,000 into Oscar’s account} $

He now alters $x_1$ and $x_2$ at "nonvisible" locations, e,g., he replaces spaces by tabs, adds spaces, etc. The meaning of the messages is the same (e.g. for a bank) but the hash changes.

Oscar tries until the condition $h(x_1)=h(x_2)$. Note that if an attacker has e.g., $64$ locations that he can alter or not, this yields $2^{64}$ versions of the same message with $2^{64}$ different hash values.

Could somebody please explain what do they mean by $2^{64}$ versions of the same message? This completely flew over my head. I know that a hash function (for example the SHA-256) produces a 64 output, so that for example:

SHA256(Transfer $10 into Oscar’s account)=250e62ddffbdf20a0ea40d69287327e8aff58b6ad49c03dab3f714b596804dc1

I understand that Oscar wants to modify Transfer $10,000 into Oscar’s account so that the output, when plugged into the SHA-256 function, yields the same output as the above. But what do they mean that the attacker can "alter or not" the 64 locations, and how does this "altering or not" yield $2^{64}$ versions of the same message?

kelalaka avatar
in flag
`Empty spaces what are we living for`, apart from fun, isn't it clear? add 64 white space or tab this makes 64 possible positions to vary. Then you have data amount similar to the birthday attack that has 1/2 success probability.
Slim Shady avatar
cn flag
@kelalaka sorry, I'm quite new to the space so it's not clear for me.
kelalaka avatar
in flag
Press `space bar` on your keyboard, what do you see?
Slim Shady avatar
cn flag
@kelalaka change in output of the hash function. This is clear for me. I know that we can in theory arrange $x_2$ so that its hash output will be the same as $x_1$. This is all I know
kelalaka avatar
in flag
By the way, what is the name of the book?
Score:3
in flag

The quoted text seems to talk about finding a collision of a 128-bit hash function with the Birthday attack. In a birthday attack, one creates around $\sqrt{2^{128}} = 2^{64}$ messages so that they expect to find a colliding pair with 1/2 probability.

In the described attack, Oscar wants to create two specific messages that have the same hash value.

$x_1$= Transfer \$10 into Oscar’s account
$x_2$= Transfer \$10,000 into Oscar’s account

In order to create $2^{64}$ messages, one can use invisible characters like the space and tab. If you append 64 characters to $x_1$ or $x_2$ those are either tab or space then you can get 64 locations. This makes $2^{64}$ messages that have the same meaning with high probably different hashes.

This invisible modification applies both $x_1$ and $x_2$.

Creat $2^{64}$ different strings for $x_1$ and $x_2$ and combine them in a set. In this set, we expect a collision. Keep in mind that, in this way, we may have a collision within the variant of $x_1$ (or $x_2$).

Now, Oscar seeks a way to deceive you. Oscar sends you the message $x_1$ with hash and sign paradigm and you verify it. Later oscar claims that they sent you $x_2$. They show you that the signatures are the same as the previous and here we have the conflict to resolve.

For other examples of using hash collision in realistic attacks see this question;


Collision attack vs second pre-image attack

In the collision attack we are looking two messages $m_1$ and $m_2$ with $m_1 \neq m_2$ such $h(m_1) = h(m_2)$. In a collision attack the attacker has free of choosing the hash value, they only seek two messages that have the same hash value. This freeness reduces the attack cost. The generic cost of collision is $\mathcal{O}(\sqrt{2^{n/2}})$-time for $n$-bit output hash function.

In some other scenarios, the attacker needs second pre-image attack; given a message $m$ and it's hash value $x=h(m)$, find another message $m' \neq m$ such that $h(m)=h(m')$. This is the scenario where the attacker creates a forgery of a digital signature ( hash and sign). Given the signature, they try to find another message $m'$ such that the signature is the same as the given.

Two generic cost of secondary pre-image attack is $\mathcal{O}(\sqrt{2^n})$-time for $n$-bit hash function.

Formal definitions can be found in

jjj avatar
cn flag
jjj
I don't think they are talking about a birthday attack. One message seems to be fixed, otherwise one could modify both at 64 positions, making it 2^64 different versions each. Also the attack only makes sense, when x1 has been authorized somehow (for example, hash has been signed).
kelalaka avatar
in flag
@jjj the texts say (modifying two messages), and I made a simple argument to combine all of the variants of $x_1$ and $x_2$ into a set and look for collision as birthday bound. If one is fixed then it is a secondary pre-image attack. Didn't I say hash and sign?
jjj avatar
cn flag
jjj
Oh, missed that. I still think it is worth mentioning that for this example one normally would have a given hash and therefore one fixted message.
kelalaka avatar
in flag
@jjj better now?
Score:2
in flag

To: ["The",""] First International Bank ["Panama",""] Subject: ["Money",""] ["Wire","Transfer"] ["Order",""] from my account ["Num","#"]1234[".",""]

I ["hear by","would like to"] Instruct ["you ",""] to ["transfer","wire","move"] ["The sum","an amount"] of ["one million","1,000,000"] ["USD","$"] to ["The following",""] account ["num","#"] 3456.

etc. This above has "$2^{14}*3$" combinations for only a begining of a relevant message. Without playing with whitespaces.

kelalaka avatar
in flag
Nice, a language solution.
dave_thompson_085 avatar
cn flag
The word normally used there in English is not 'hear by' but 'hereby' (also not 'here by'). But these would (both!) be plausible mistakes you could use to generate equivalent-seeming texts.
Score:0
kz flag

If I take any message, I can append a space character or tab character to get two different messages. Each of those two, I can append another space or tab to get four possible messages. A third character gives 8 possible messages, then 16 and so on.

If I take "send 10,000" and append 64 characters, and each of them is either a space or tab character, then I get exactly 2^64 different messages. And with a 64 bit hash, there's a good chance that one of these messages has the same hashcode as the "send 10" message.

The reason to add space or tab, and not just any character, is that the reader will not see these characters. If I received "Send 10,000 xlfe13^" I would be suspicious.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.