Score:0

Concatenation in Merkle Trees

in flag

Consider a simple Merkle Tree with leaves alice +100 and bob +50. Using the SHA256 hash algorithm, the digest of the respective strings are:

# alice +100
dc2cac4a8aaeccc0199eeb77df68b22eaa6e319d3f2b425d078dbd73419e28ac

# bob +50
7e15e5bc1b84f458db7ced4df762ba70204f19e3a613738756f9b00653f0aee1

Being a hash function, SHA-256 is deterministic, so it doesn't matter which programming language we're implementing this in. For completeness sake, this can be done through the crypto-js library:

const hash256 = (string) => {
  const val = 
  require('crypto').createHash('sha256').update(string).digest('hex');
  return val;
}

# apply
hash256('alice +100')
# results:
dc2cac4a8aaeccc0199eeb77df68b22eaa6e319d3f2b425d078dbd73419e28ac

When one refers to the Merkle Tree implementation, one see the following description:

A hash tree is a tree of hashes in which the leaves are hashes of data blocks in, for instance, a file or set of files. Nodes farther up in the tree are the hashes of their respective children. For example, in the above picture hash 0 is the result of hashing the concatenation of hash 0-0 and hash 0-1. That is, hash 0 = hash( hash(0-0) + hash(0-1) ) where + denotes concatenation.

enter image description here

I've seen higher level library abstractions that implement this concatenation in the case of Buffer.concat(), but I'd like to know how exactly would one implement this from a purely mathematical point of view.

One would assume (incorrectly):

alice_hash = hash256('alice +100')
bob_hash = hash256('bob +50')
# wrong
hash256(alice_hash + bob_hash)
# wrong also: adding the 0x-prefix
hash256(
  '0xdc2cac4a8aaeccc0199eeb77df68b22eaa6e319d3f2b425d078dbd73419e28ac'
  + 
  '0x7e15e5bc1b84f458db7ced4df762ba70204f19e3a613738756f9b00653f0aee1' 
)

So without any abstractions, how would one concatenate the two hashes to obtain the resulting parent node?

For anyone trying to help, the correct value of hash(hash(alice) + hash(bob)) should be edf9a9a0e56b58fc9caccb97d85c628d5b9dc50cb94dfc41e83026d37704400f. I tried adding / removing the 0x prefix, adding one space character between them, and none of these attempts were fruitful. I also read papers I can get my hands on, but never got any farther than "concatenate them to obtain the value for the parent node" with very little implementation reference.

kelalaka avatar
in flag
Both the answer and you and your source forgot to use the Merkle Tree properly 0s 1s. [See in this answer](https://crypto.stackexchange.com/a/71313/18298)
Morrolan avatar
ng flag
To elaborate on what @kelalaka meant to convey - proper implementations of Merkle trees will usually add an identifier of *where in the tree* the hashed element was to the data being hashed. This helps prevent certain types of attack. There's various ways to do this. One is seen in the linked answer, another in [RFC 6962](https://www.rfc-editor.org/rfc/rfc6962.html#section-2.1) where they simply differentiate between leaf and non-leaf nodes.
onlyphantom avatar
in flag
I understand, and truly appreciate the feedback & links here. I left out the other details to focus on the "concatenation" part. But after your answer @Morrolan I was able to figure it out, and implement it correctly in python. The takeaway was to see the hexadecimal string as just a representation. Perform the + on the binary string instead. Thanks for the answer!
Score:2
so flag

Just FYI, the result edf9a9a0e56b58fc9caccb97d85c628d5b9dc50cb94dfc41e83026d37704400f is calculated with big-endian rule. In bitcoin's case, you will need to reverse bytes array before and after sha256 hash. So, the result should be edf9a9a0e56b58fc9caccb97d85c628d5b9dc50cb94dfc41e83026d37704400f

onlyphantom avatar
in flag
this is a helpful remark, thank you!
Score:1
ng flag

The issue at hand stems from mixing up the hexadecimal string representation ('hexdigest') of a hash value with its actual binary value.

Recall first that a hash function takes an arbitrary-length (for practical purposes) bit string as input, and outputs a fixed-length bit string. That is, it operates on binary values.

As such, the actual hash value of the (ASCII-encoded) input string alice +100 is a sequence of 32 bytes. dc2cac4a8aaeccc0199eeb77df68b22eaa6e319d3f2b425d078dbd73419e28ac is that byte sequence represented as the 'hexdigest', where each two hexadecimal characters encode one byte.

For the concatenation operation you have to concatenate the byte sequences, rather than their hexdigests.

One example in Ruby:

require 'digest'

m1 = 'alice +100'
m2 = 'bob +50'

d1 = Digest::SHA256.new
d2 = Digest::SHA256.new

d1 << m1
d2 << m2

puts "Alice hexdigest: #{ d1.hexdigest }"
puts "Bob hexdigest: #{ d2.hexdigest }"

d3 = Digest::SHA256.new
d3 << d1.digest
d3 << d2.digest                                                                                                                                                

puts "Concatenation hexdigest: #{ d3.hexdigest }"

Would yield, when executed:

Alice hexdigest: dc2cac4a8aaeccc0199eeb77df68b22eaa6e319d3f2b425d078dbd73419e28ac
Bob hexdigest: 7e15e5bc1b84f458db7ced4df762ba70204f19e3a613738756f9b00653f0aee1
Concatenation hexdigest: edf9a9a0e56b58fc9caccb97d85c628d5b9dc50cb94dfc41e83026d37704400f

In your example, when you calculate the hash of the concatenation of the two hexdigest strings, your programming language concatenates these two strings, encodes them into a binary sequence (using whatever character encoding it defaults to), and hashes that input sequence.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.