Score:0

Is there a way to short circuit (speed up) hashing a large but sparse array?


Imagine a large array (megabytes) that is almost entirely empty, i.e. contains 0 in almost all locations. Now imagine that 1000 pseudorandom locations each contain a pseudorandom byte, with no correlation between the bytes or between the locations. On average, the non-zero bytes are therefore more than 1000 bytes apart. The entire array is then hashed with a Merkle–Damgård type hash.

Is there a way to speed up the overall hashing by exploiting the knowledge that the upcoming values are likely to be 0? Most of the time the next input block will be all 0, often many times in a row. Can this statistical predictability be leveraged for speed?

fgrieu
Is "[Merkle–Damgård](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction)" a hard requirement? Because if instead we use an adequate hash tree, then serious savings are possible (including when keeping a compression function like that of SHA-256 or SHA-512).
Paul Uszak
@fgrieu I was considering SHA-256 actually...
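
The hash-tree remark above can be illustrated with a small sketch. Note that this does not compute SHA-256 of the array; it is a different hash (a binary Merkle tree that uses SHA-256 for leaves and inner nodes). The leaf size, the 0x00/0x01 domain-separation tags, the helper names (`H`, `zero_roots`, `sparse_root`, `full_root`) and the assumption that the sorted positions of the non-zero bytes are known in advance are all choices made for illustration. The saving comes from computing the root of an all-zero subtree once per height and reusing it wherever a subtree contains no non-zero byte.

```python
import hashlib
from bisect import bisect_left

LEAF = 64  # leaf size in bytes (an assumption; any fixed size works)

def H(tag, payload):
    # Tagged SHA-256 so that leaves and inner nodes cannot be confused.
    return hashlib.sha256(tag + payload).digest()

def zero_roots(height):
    """Roots of all-zero subtrees of heights 0..height, each computed once."""
    roots = [H(b"\x00", bytes(LEAF))]
    for _ in range(height):
        roots.append(H(b"\x01", roots[-1] + roots[-1]))
    return roots

def sparse_root(data, nonzero, height, zr, offset=0):
    """Root of the subtree covering LEAF * 2**height bytes starting at offset.

    nonzero is the sorted list of indices where data is non-zero
    (assumed to be known, e.g. recorded when the array was filled).
    """
    span = LEAF << height
    i = bisect_left(nonzero, offset)
    if i == len(nonzero) or nonzero[i] >= offset + span:
        return zr[height]  # whole subtree is zero: reuse the cached root
    if height == 0:
        return H(b"\x00", data[offset:offset + LEAF])
    left = sparse_root(data, nonzero, height - 1, zr, offset)
    right = sparse_root(data, nonzero, height - 1, zr, offset + span // 2)
    return H(b"\x01", left + right)

def full_root(data, height, offset=0):
    """Same tree, hashing every leaf and node; used only as a cross-check."""
    if height == 0:
        return H(b"\x00", data[offset:offset + LEAF])
    half = (LEAF << height) // 2
    left = full_root(data, height - 1, offset)
    right = full_root(data, height - 1, offset + half)
    return H(b"\x01", left + right)

height = 10                       # 1024 leaves of 64 bytes = 64 KiB
data = bytearray(LEAF << height)  # all zeros
data[1234] = 0xAB
data[40000] = 0x5C
nonzero = [1234, 40000]
zr = zero_roots(height)
assert sparse_root(data, nonzero, height, zr) == full_root(data, height)
```

With k non-zero bytes, `sparse_root` hashes on the order of k times the tree height nodes instead of every leaf. A real design would also need to bind the array length into the root and handle sizes that are not a power of two.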
Score:1

> Can this statistical predictability be leveraged for speed?

Well, both SHA-256 and SHA-512 apply 'message scheduling' to expand the 64/128 [1] byte message block into a 256/640 byte sequence (that is, 64 32-bit words or 80 64-bit words). This message scheduling isn't that expensive, but it isn't free. If you were to check the next block and see that all its bytes are 0, you could either use a fixed, precomputed 256/640 byte sequence (what the all-0 block expands to), or use an alternative implementation of the compression function that skips stirring in the message-schedule words altogether. The latter works because, in both cases, the message schedule only combines words using rotations, shifts, XORs and modular additions, so an all-0 message block expands to an all-0 word sequence. The rest of the compression function still has to run for every block, since the chaining value keeps changing.
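
To make the all-0 claim concrete, here is a minimal Python sketch of the SHA-256 message schedule as defined in FIPS 180-4. The function names (`rotr`, `small_sigma0`, `small_sigma1`, `message_schedule`) are mine, chosen for illustration. SHA-512 has the same structure with 64-bit words, different rotation amounts and 80 words.

```python
MASK = 0xFFFFFFFF  # SHA-256 works on 32-bit words

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def message_schedule(block):
    """Expand one 64-byte block into the 64 schedule words W[0..63]."""
    assert len(block) == 64
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w

zero_schedule = message_schedule(bytes(64))
print(all(word == 0 for word in zero_schedule))  # prints True
```

Every operation in the expansion maps zeros to zeros, so nothing non-zero can ever appear in the schedule of an all-0 block.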

I suspect that testing the bytes and doing a conditional jump on the result would, on average, be measurably faster (assuming that most of the message blocks do consist of all 0s).
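
A rough sketch of the test-and-branch idea follows, for SHA-256 only. It is pure Python, so it shows the structure rather than any real speed-up; whether the branch pays off in an optimized C or assembly implementation would have to be measured. The names `sha256_sparse`, `compress`, `rounds`, `expand`, `icbrt` and `first_primes` are hypothetical, and the constants are derived from the primes (which is how FIPS 180-4 defines them) instead of being typed in. The result is checked against `hashlib`.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF
ZERO_BLOCK = bytes(64)

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

PRIMES = first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# Initial state: first 32 fractional bits of the square roots of 8 primes.
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]

def expand(block):
    """Message schedule: 16 input words expanded to 64."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def rounds(state, kw):
    """Run the 64 rounds; kw[t] must equal (K[t] + W[t]) mod 2**32."""
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = h + big_s1 + ch + kw[t]
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = big_s0 + maj
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def compress(state, block):
    if block == ZERO_BLOCK:
        # Fast path: W[t] == 0 for all t, so K[t] + W[t] is just K[t].
        return rounds(state, K), True
    w = expand(block)
    return rounds(state, [(k + x) & MASK for k, x in zip(K, w)]), False

def sha256_sparse(data):
    """Return (digest, number of blocks that took the fast path)."""
    data = bytes(data)
    # Standard padding: 0x80, zero fill, then the 64-bit big-endian bit length.
    padded = (data + b"\x80" + bytes((55 - len(data)) % 64)
              + (8 * len(data)).to_bytes(8, "big"))
    state, fast_blocks = H0, 0
    for i in range(0, len(padded), 64):
        state, was_fast = compress(state, padded[i:i + 64])
        fast_blocks += was_fast
    return b"".join(x.to_bytes(4, "big") for x in state), fast_blocks

data = bytearray(1 << 16)  # 64 KiB of zeros with two non-zero bytes
data[1234] = 0xAB
data[40000] = 0x5C
digest, fast_blocks = sha256_sparse(data)
assert digest == hashlib.sha256(data).digest()
print(fast_blocks, "blocks took the fast path")
```

The fast path saves only the schedule expansion and one addition per round. The final padded block is never all 0 because it carries the 0x80 marker and the message length.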


[1]: By 64/128, I mean "64 bytes in the case of SHA-256 and 128 bytes in the case of SHA-512"

Paul Uszak
Yes, statistically most of the message blocks do consist of 'all 0s'.