Score:1

Is it possible to generate a key from a hash function, being the key larger than the hash function digest output without spliting the seed?

pf flag

Let's suppose I have a 2MB photo and I want to generate a key from it, and the key should have 2048-bits of size.

A hash function such as SHA3-512 would not deliver a key with with the desired strength.

Is it possible to generate such a key with a hash function having the hash function a maximum digest output/internal state smaller that it, without having to split the file in many parts?

I have been thinking on hashing the file using chaining, as described here: https://crypto.stackexchange.com/a/99579/89581

But I don't know if it would serve for this purpose.

Score:3
ng flag

If $F$ is the file, and $H$ the 512-bit hash, three good options to build the 2048-bit key are

  1. $key = H(\mathtt{00}\mathbin\|F)\mathbin\|H(\mathtt{01}\mathbin\|F)\mathbin\|H(\mathtt{02}\mathbin\|F)\mathbin\|H(\mathtt{03}\mathbin\|F)$

  2. $key = H(F\mathbin\|\mathtt{00})\mathbin\|H(F\mathbin\|\mathtt{01})\mathbin\|H(F\mathbin\|\mathtt{02})\mathbin\|H(F\mathbin\|\mathtt{03})$

  3. $key = H(H(F)\mathbin\|\mathtt{00})\mathbin\|H(H(F)\mathbin\|\mathtt{01})\mathbin\|H(H(F)\mathbin\|\mathtt{02})\mathbin\|H(H(F)\mathbin\|\mathtt{03})$

    where $\mathtt{00}$, $\mathtt{01}$, $\mathtt{02}$, $\mathtt{03}$ are one-byte bytestrings.

In all the options, the whole entropy of $F$ goes into each of the 4 output segments, and one can't find a portion of the output from the other outputs with partial knowledge of $F$ (or of it's hash). In the first option, entering the 4 bytes as prefixes maximizes the diffusion. The second and third options allow faster implementation for large $F$, and are safe enough.

The practitioner will use the SHA-3 Extendable-Output Function $\operatorname{SHAKE256}(F,2048)$, which is a pre-engineered method to do the job.


As noted in comment: If we are low on entropy in $F$, then we want a purposely slow entropy/key-streching hash function, as used to turn passwords into key. That could be Argon2. Like SHAKE256 it has a tagLength parameters that controls the output size (but it's in bytes, thus we want to set it to 256 for 2048-bit output).

For methods 2, 3, SHAKE256 and Argon2, the actual entropy in the output has a limit sizably lower than 2048 bit (more like 512 bit), even if there is ample entropy in $F$. This is not a practical security issue.

kelalaka avatar
in flag
Maybe Argon2 is a better candidate due to Key stretching though it is [limited to 512](https://crypto.stackexchange.com/a/98514/18298), too
phantomcraft avatar
pf flag
@kelalaka This is the exact reason why I wouldn't use Argon2 to produce a key larger than its internal state.
phantomcraft avatar
pf flag
@fgrieu I got, but if it's only adding 01, 02, 03 and so on, chaining would work well, am I right? crypto.stackexchange.com/a/99579/89581
fgrieu avatar
ng flag
@phantomcraft : yes. We can replace 00 with a fixed string (including empty), and the other 01 02 03 with the previous 512-bit output segment. The risk of entering a short cycle remains small.
phantomcraft avatar
pf flag
@fgrieu Can 00, 02, 03 be bigger values such as 777, 999, 1000?
fgrieu avatar
ng flag
@phantomcraft: 00, 01, 02, 03 can be replaced by any distinct byte strings. 777 will do if it's the bytestring with 3 bytes 37 37 37.
Score:1
cn flag

The concept you're looking for is a key derivation function. A key derivation function takes one or more inputs and produces a stream of bytes such that:

  • The output is pseudorandom: given part of the output, it's infeasible to guess the rest, unless you manage to find the inputs.
  • Given the output, it's infeasible to guess the input except by trying candidates and comparing the given output with output from the candidates.
  • Outputs for different inputs are independent: knowing the output for some inputs doesn't help to guess the output for a different input.

Some key derivation functions have a limited output size, while others can deliver practically unlimited output.

It's possible to use a hash function to construct a key derivation function. In fact, that's a very popular design. The basic idea is to concatenate the inputs together to form a string S, and emit an output stream of the form H(S || 0) || H(S || 1) || H(S || 2) || … where || is string concatenation and 0, 1, etc. are string encodings of the integer. As always with cryptography, you need to get the details right, and I won't go into all the details here. (With this basic idea, the main pitfall is that you must make the concatenations unambiguous.)

HKDF is a popular hash-based key derivation function which is based on this principle, with extra scaffolding to protect against some potential weaknesses of the underlying hash, and with an interface tailored for one secret input and two non-secret inputs (that's a pretty common interface). Beware that HKDF does have one pitfall: it applies some preprocessing to the salt that maps different salt values to the same intermediate value; to avoid this, always use salt of the same size for a given secret.

Do not use hash chaining: that's a bad way of constructing a key derivation function from a hash. If the output is H(S) || H(S||H(S)) || H(S||H(S||H(S))) || …, then it's possible to reconstruct the whole output from the first n bytes where n is the length of the hash. How bad this is depends on how you're using the output material, but even if it's not completely broken, it's less secure than it could be with the same level of complexity and performance.

If you have access to SHA3, you probably have access to SHAKE, which is a family of two extendable output functions (SHAKE128 and SHAKE256). An extendable output function can be used as a key derivation function whose input is a single string.


Note that a photo is rarely a good input for a key derivation. Trivial changes to the photo, such as a reencoding, compressions, changing metadata, etc. will make it completely different. So you need to save the exact photo file. And since you need to have this exact file, it doesn't really matter that it represents a photo. You might as well save the key. The only advantage of a photo is that it's more discreet than a random-looking file if you're under casual surveillance. (But as soon as someone takes interest in your device, they'll likely make a copy of all your files including that photo, so you should assume they have the key anyway.)

To protect against an adversary who obtains your photo library but doesn't know which photo you're using as a key, you can use key stretching. A key stretching function is a key derivation function that is constructed to be slow. The idea is that the slowness hurts someone trying all candidate inputs by brute force more than it hurts someone who knows the correct input. Some key stretching functions offer functionally unlimited output; many don't. If you pick one that doesn't, just use its output as the input to an ordinary key derivation function.

Key stretching is only a limited protection. In this scenario, it won't help much. For example, if it's acceptable for you that reconstructing the key takes 1s, and you want the adversary to spend a day of computer time (assuming their computer isn't more powerful than yours), then your library must have at least 86400 photos (so 173 GB of photos at 2 MB each).


As a final note, 2048 bits of key material is a weird size. Symmetric cryptography doesn't need keys that big. Some asymmetric cryptography uses larger private keys, but then the key is either even larger than 2048 bits or else it has some mathematical structure. For example, you don't get a 2048-bit RSA key by just taking a random 2048-bit string: you need a complex process that can consume much more than that from a pseudorandom generator. Of course you can use the 2048 bits as a seed for the pseudorandom generator, but then we're back to my earlier consideration about symmetric cryptography: a 256-bit seed would be plenty enough.

phantomcraft avatar
pf flag
You can think I'm crazy, but I really need a huge key size (in my case Kuznyechik with secret S-boxes), 2048-bits was just an example. But if the key size is bigger that the internal state or digest output size of underlying hash function?. Can HKDF using HMAC-SHA256 emit a 2048-bit key if the size of internal state of SHA-256 is only 256-bits?
phantomcraft avatar
pf flag
Just for curiosity, you wrote H(S || 0) || H(S || 1) || H(S || 2) || … Can S be the same for all hashings?
Gilles 'SO- stop being evil' avatar
cn flag
@phantomcraft The internal state can be exponentially shorter than the output. That's the whole point of symmetric cryptography: you go from protecting a small key to a very large amount of data. Yes, S is the same each time.
phantomcraft avatar
pf flag
Ah, I got. So, this user that replied one of my question about internal state of Argon2 is wrong? ==> https://crypto.stackexchange.com/a/96583/89581
Gilles 'SO- stop being evil' avatar
cn flag
@phantomcraft No, it's talking about different things. $n$-bit security means that it would take $2^n$ attempts to crack something by brute force. Once $2^n$ becomes larger than humanly available computing power, the notion becomes purely theoretical. 128 bits of security protects against an NSA-level adversary willing to spend about as long as the universe has existed running current-era computers. 256-bit security protects against quantum computers, in case they turn out to be possible. A SHA256-based KDF on a seed with 256 bits of entropy gives you 256 bits of security for any output size.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.