Score:0

When inputting a stream of raw binary values to a SHA256 hash and a value of 10000000 is encountered in the input stream, does the hash end?


I am acquiring binary data from a sensor and storing it in a file. As each byte of data is read from the sensor, it goes into a SHA256 hash. The length of the acquired data stream varies from one session to another. It is possible that an individual sample having a binary value of 10000000 could be encountered in the input stream prior to the end of the session's stream. How does the SHA256 hash algorithm treat that 10000000 value since that is also the marker used at the end of the hash input?

Andrew Barrett
Thanks to those who replied to my query. I have just now encountered the responses. In the interim, I was able to suss out usage of the SHA-256 algorithm and answered my question.
Maarten Bodewes
Don't forget to mark an answer correct if it answers your question. I've added another one that takes a different tack that I hope is easier to understand.
Score:1

How does the SHA-256 hash algorithm treat that 10000000 value since that is also the marker used at the end of the hash input?

SHA-256 does not special-case any pattern in its input. And when using SHA-256 to hash non-secret data, there's no need to worry about this internal detail. I'll add one reason to those already given: since hashing is not encryption, there is no decryption process for a hash, and thus no need to recognize an end pattern, as happens when decrypting ciphertext for variable-size plaintext in e.g. CBC mode.

There are reasons to care about the end pattern when dealing with secret data: the SHA-256 length extension property. For any Merkle-Damgård hash $H$ and any length $\ell$ (up to some huge limit), there is a short bitstring $b_\ell$ (including the end pattern) and an efficiently computable function $H_\ell$ such that for any bitstring $m$ of $\ell$ bits and any bitstring $m'$ (up to some huge size), it holds that $$H(m\mathbin\|b_\ell\mathbin\|m')=H_\ell(H(m)\mathbin\|m')$$

For any secret $m$ with known length $\ell$ and known hash $H(m)$, this property allows computing $H(m\mathbin\|m')$ for any known $m'$ starting with $b_\ell$. That is also enough to show that $m\mapsto H(k\mathbin\|m)$ is not a secure Message Authentication Code with key $k$.
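The length extension property can be checked experimentally. Below is a minimal sketch, assuming byte-aligned messages. The helper names `compress`, `glue_padding` and `extend` are hypothetical (they are not part of any library), and the round constants are derived from cube roots of primes, as the SHA-256 specification defines them, rather than typed in. Here `glue_padding(len(m))` plays the role of $b_\ell$ and `extend` the role of $H_\ell$.

```python
import hashlib
import struct

def _primes(n):
    # First n primes by trial division (n is tiny here).
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out

def _icbrt(n):
    # Exact integer cube root (floor) by binary search.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & 0xFFFFFFFF for p in _primes(64)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def compress(state, block):
    # One application of the SHA-256 compression function to a 64-byte block.
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & 0xFFFFFFFF)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & 0xFFFFFFFF
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & 0xFFFFFFFF
        h, g, f, e, d, c, b, a = (g, f, e, (d + t1) & 0xFFFFFFFF,
                                  c, b, a, (t1 + t2) & 0xFFFFFFFF)
    return [(x + y) & 0xFFFFFFFF for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def glue_padding(msg_len):
    # The padding SHA-256 appends to a message of msg_len bytes:
    # 0x80, zero bytes, then the 64-bit big-endian bit length.
    zeros = (55 - msg_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", msg_len * 8)

def extend(digest, msg_len, suffix):
    # Resume hashing from the published digest, without knowing the message.
    state = list(struct.unpack(">8I", digest))
    pad = glue_padding(msg_len)
    total_len = msg_len + len(pad) + len(suffix)
    tail = suffix + glue_padding(total_len)
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return pad, struct.pack(">8I", *state)

# The "attacker" only sees the digest and the length of the secret message.
secret_message = b"secret sensor session data"
known_digest = hashlib.sha256(secret_message).digest()

pad, forged = extend(known_digest, len(secret_message), b"appended bytes")
expected = hashlib.sha256(secret_message + pad + b"appended bytes").digest()
assert forged == expected
```

The final assertion compares the forged digest against what `hashlib` computes over the full extended message, so the sketch checks itself against the standard library.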

Score:1

A cryptographically secure hash function can handle any arbitrary sequence of bytes, regardless of what those bytes are or what pattern they may have, up to a specified maximum size (which, for SHA-256, is just under $2^{64}$ bits).

It is true that SHA-256 uses a single one bit followed by zero bits as part of its padding (the Merkle-Damgård scheme). However, that pattern may occur in the input stream without a problem, and because the last block contains the input length, we can distinguish between that pattern in the input and that pattern as part of the padding scheme. So there isn't any specific in-band pattern that will cause the hash to end abruptly.
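This is easy to check with Python's standard `hashlib` module. A small sketch, with made-up sample bytes: a byte of value 10000000 (0x80) in the middle of the input is hashed like any other byte, and whatever follows it still affects the digest.

```python
import hashlib

prefix = bytes([0x01, 0x02])
with_marker = prefix + bytes([0b10000000])      # input now ends in 0x80
with_more = with_marker + bytes([0x03, 0x04])   # data continues after 0x80

digests = {
    hashlib.sha256(data).hexdigest()
    for data in (prefix, with_marker, with_more)
}

# Three different inputs give three different digests: the 0x80 byte
# neither terminates the hash nor is confused with the padding.
assert len(digests) == 3
```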

Note that other hash algorithms, such as SHA-3 and BLAKE2, use different padding schemes, and they can also handle arbitrary input patterns without a problem. These padding schemes, while different from the Merkle-Damgård construction, are also believed to be secure, and may actually be preferable for other reasons.

Typically, when we write an API to hash bytes, we end up with three functions: an initialization function, which sets up the algorithm with the proper parameters; an update function, which takes input to hash; and a finalization function, which performs padding, finishes the hashing, and returns the hash result. As such, we always explicitly indicate that the hashing is to end without regard to the input data.
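Python's `hashlib` follows this three-step shape. A minimal sketch for the sensor use case, where `hash_stream` and the sample values are hypothetical and stand in for the real acquisition loop:

```python
import hashlib

def hash_stream(chunks):
    h = hashlib.sha256()       # initialization
    for chunk in chunks:
        h.update(chunk)        # update: feed data as it arrives
    return h.hexdigest()       # finalization: padding is applied here

# Samples as they might arrive from the sensor, including 0x80 values.
samples = [bytes([0x12]), bytes([0x80]), bytes([0x34, 0x80, 0x00])]

streamed = hash_stream(samples)
one_shot = hashlib.sha256(b"".join(samples)).hexdigest()

# Feeding the data byte by byte gives the same result as hashing it at once.
assert streamed == one_shot
```

The hash only ends when the caller asks for the digest, so the contents of the samples never decide where the message stops.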

Score:0

TL;DR The padding is always applied. The final part of the message will not even be inspected; it will just be put in the hash function as-is, making the input blocks unique if the message is unique.


SHA-256 processes the padded input message block by block. This fact is usually hidden by higher-level APIs, however, which simply buffer bytes until the block size is reached and then execute the inner block-hash function, updating the internal state.

Just like e.g. PKCS#7 padding for block ciphers in CBC mode, the padding of SHA-256 is always applied, regardless of the input. That means that, when needed, the hash function may have to process an additional block consisting just of padding. It will always perform bit padding even if the message already ends with a "correct" padding value. This way the input represented by the 512-bit input blocks - including the padding - will always be unique as long as the input to the hash function is unique.

In addition, the bit padding scheme - a one bit followed by as many zero bits as needed to fill the block up to the length field - is not the only padding that is applied by SHA-1 and SHA-2 hash functions such as SHA-256. The padding also includes the length of the input message in bits, which for SHA-256 occupies the final 64 bits of the last block. The reason for that is to make sure that you cannot easily create two different messages that hash to the same value.
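The padding rule can be written out in a few lines. This is a sketch for byte-aligned messages only, and `sha256_pad` is a hypothetical helper that merely builds the padded input; it does not compute the hash.

```python
import struct

BLOCK = 64  # SHA-256 block size in bytes (512 bits)

def sha256_pad(message: bytes) -> bytes:
    bit_length = len(message) * 8
    # Zero bytes needed so that message + 0x80 + zeros + 8-byte length
    # fills a whole number of blocks.
    zeros = (BLOCK - 9 - len(message)) % BLOCK
    return message + b"\x80" + b"\x00" * zeros + struct.pack(">Q", bit_length)

# Number of blocks after padding, for a few message lengths.
for n in (0, 55, 56, 64):
    print(n, len(sha256_pad(b"a" * n)) // BLOCK)

# A message that already ends in 0x80 still gets its own padding,
# and its length field differs from that of the shorter message.
plain = sha256_pad(b"abc")
ends_in_marker = sha256_pad(b"abc\x80")
assert plain != ends_in_marker
```

A 55-byte message still fits in one block, while a 56-byte message needs a second block that holds the rest of the padding and the length.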

Maarten Bodewes
Since the other answers are a bit complex, I've added this easier one.