The concept you're looking for is a key derivation function. A key derivation function takes one or more inputs and produces a stream of bytes such that:
- The output is pseudorandom: given part of the output, it's infeasible to guess the rest, unless you manage to find the inputs.
- Given the output, it's infeasible to guess the input except by trying candidates and comparing the given output with output from the candidates.
- Outputs for different inputs are independent: knowing the output for some inputs doesn't help to guess the output for a different input.
Some key derivation functions have a limited output size, while others can deliver practically unlimited output.
It's possible to use a hash function to construct a key derivation function. In fact, that's a very popular design. The basic idea is to concatenate the inputs together to form a string S, and emit an output stream of the form H(S || 0) || H(S || 1) || H(S || 2) || … where || is string concatenation and 0, 1, etc. are string encodings of the integers. As always with cryptography, you need to get the details right, and I won't go into all the details here. (With this basic idea, the main pitfall is that you must make the concatenations unambiguous.)
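To make the counter construction concrete, here is a minimal sketch in Python. The choice of SHA-256, the 8-byte length prefix and the 4-byte big-endian counter are my assumptions for illustration, not part of any standard; the function names `encode_inputs` and `counter_kdf` are made up. Treat it as a demonstration of the idea rather than something to deploy.

```python
import hashlib


def encode_inputs(*inputs: bytes) -> bytes:
    """Build S by prefixing each input with its length (8 bytes, big-endian).

    The length prefix makes the concatenation unambiguous:
    (b"ab", b"c") and (b"a", b"bc") encode to different strings.
    """
    return b"".join(len(x).to_bytes(8, "big") + x for x in inputs)


def counter_kdf(inputs: list[bytes], length: int) -> bytes:
    """Emit H(S || 0) || H(S || 1) || ... truncated to `length` bytes."""
    s = encode_inputs(*inputs)
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Fixed-width counter encoding (an illustrative choice).
        out += hashlib.sha256(s + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


# 256 bytes = 2048 bits of output from two inputs.
material = counter_kdf([b"secret input", b"context label"], 256)
```

Without the length prefix, the input pairs ("ab", "c") and ("a", "bc") would produce the same S and therefore the same output, which is exactly the ambiguity pitfall.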
HKDF is a popular hash-based key derivation function which is based on this principle, with extra scaffolding to protect against some potential weaknesses of the underlying hash, and with an interface tailored for one secret input and two non-secret inputs, usually called the salt and the info (that's a pretty common interface). Beware that HKDF does have one pitfall: the salt is used as an HMAC key, and HMAC preprocesses its key (zero-padding short keys and hashing keys longer than the hash's block size), so different salt values can map to the same intermediate value. To avoid this, always use salts of the same size for a given secret.
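For reference, HKDF is small enough to sketch with only Python's standard library. The version below follows the extract-then-expand structure of RFC 5869 with SHA-256 as the assumed hash; the function names are mine, and in real code you would normally use a vetted library implementation rather than this sketch.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, secret: bytes) -> bytes:
    """Extract step: PRK = HMAC(salt, secret). An empty salt means all zeros."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, secret, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand step: T(i) = HMAC(PRK, T(i-1) || info || i), with i = 1, 2, ..."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output is limited to 255 hash blocks")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(secret: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    return hkdf_expand(hkdf_extract(salt, secret), info, length)


# The salt pitfall: HMAC zero-pads short keys, so these two salts collide.
same = hkdf(b"secret", b"salt", b"ctx", 32) == hkdf(b"secret", b"salt\x00", b"ctx", 32)
```

Here `same` evaluates to `True`, which illustrates the salt pitfall mentioned above: two salts of different lengths yield identical output. Note also that HKDF is one of the functions with a limited output size (255 hash blocks).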
Do not use hash chaining: that's a bad way of constructing a key derivation function from a hash. If the output is H(S) || H(S||H(S)) || H(S||H(S||H(S))) || …, then it's possible to reconstruct the whole output from the first n bytes where n is the length of the hash. How bad this is depends on how you're using the output material, but even if it's not completely broken, it's less secure than it could be with the same level of complexity and performance.
If you have access to SHA3, you probably have access to SHAKE, which is a family of two extendable output functions (SHAKE128 and SHAKE256). An extendable output function can be used as a key derivation function whose input is a single string.
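In Python, for instance, SHAKE is exposed through `hashlib`, so deriving 2048 bits (256 bytes) from a single input string takes one call. The wrapper name `shake_kdf` and the sample input are placeholders of mine.

```python
import hashlib


def shake_kdf(seed: bytes, length: int) -> bytes:
    """Use SHAKE256 as a single-input KDF: `length` bytes of output from `seed`."""
    return hashlib.shake_256(seed).digest(length)


# 256 bytes = 2048 bits of key material.
key_material = shake_kdf(b"example input string", 256)
```

Since there is only one input, if you have several inputs you still need to combine them into one string unambiguously before calling it.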
Note that a photo is rarely a good input for key derivation. Trivial changes to the photo, such as re-encoding, recompression or a metadata change, will make the file completely different. So you need to save the exact photo file. And since you need to have this exact file, it doesn't really matter that it represents a photo. You might as well save the key. The only advantage of a photo is that it's more discreet than a random-looking file if you're under casual surveillance. (But as soon as someone takes interest in your device, they'll likely make a copy of all your files including that photo, so you should assume they have the key anyway.)
To protect against an adversary who obtains your photo library but doesn't know which photo you're using as a key, you can use key stretching. A key stretching function is a key derivation function that is constructed to be slow. The idea is that the slowness hurts someone trying all candidate inputs by brute force more than it hurts someone who knows the correct input. Some key stretching functions offer functionally unlimited output; many don't. If you pick one that doesn't, just use its output as the input to an ordinary key derivation function.
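As a rough sketch of "stretch, then expand", the following uses PBKDF2-HMAC-SHA256 from Python's standard library as the slow function and SHAKE256 to turn its fixed-size result into as many bytes as needed. PBKDF2 is picked here only because it ships with `hashlib`; memory-hard functions such as scrypt or Argon2 are generally preferred for new designs. The iteration count, the salt and the function names are illustrative assumptions that you would tune for your own hardware.

```python
import hashlib


def stretch(photo_bytes: bytes, salt: bytes, iterations: int = 600_000) -> bytes:
    """Slow step: 32 bytes from PBKDF2. The cost grows linearly with `iterations`."""
    return hashlib.pbkdf2_hmac("sha256", photo_bytes, salt, iterations, dklen=32)


def stretch_then_expand(photo_bytes: bytes, salt: bytes, length: int,
                        iterations: int = 600_000) -> bytes:
    """Feed the stretched value into an ordinary KDF (here SHAKE256) for more output."""
    stretched = stretch(photo_bytes, salt, iterations)
    return hashlib.shake_256(stretched).digest(length)


# Small iteration count so the demonstration runs quickly; use far more in practice.
demo = stretch_then_expand(b"<exact bytes of the photo file>", b"fixed-public-salt", 256,
                           iterations=1_000)
```

The slow step runs once for the legitimate user but once per candidate photo for the attacker, which is the whole point of the construction.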
Key stretching offers only limited protection, and in this scenario it won't help much. For example, if it's acceptable for you that reconstructing the key takes 1 second, and you want the adversary to spend a day of computer time (assuming their computer isn't more powerful than yours), then your library must have at least 86400 photos (so 173 GB of photos at 2 MB each).
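The arithmetic behind that estimate is simple enough to play with. This sketch uses the same assumptions as the example (1 second per derivation, 2 MB per photo, attacker hardware equal to yours) and counts the time to try every photo in the library; the function name is mine.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def photos_needed(attacker_budget_s: float, cost_per_guess_s: float) -> int:
    """Number of candidate photos needed so that trying all of them costs the budget."""
    return math.ceil(attacker_budget_s / cost_per_guess_s)


photos = photos_needed(SECONDS_PER_DAY, 1.0)  # one day of work at 1 s per guess
library_gb = photos * 2 / 1000                # 2 MB per photo, 1000 MB per GB
```

With these numbers `photos` is 86400 and `library_gb` is about 172.8. Making each derivation ten times slower only shrinks the required library tenfold, so the library stays impractically large for any delay you would tolerate yourself.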
As a final note, 2048 bits of key material is a weird size. Symmetric cryptography doesn't need keys that big. Some asymmetric cryptography uses larger private keys, but then the key is either even larger than 2048 bits or else it has some mathematical structure. For example, you don't get a 2048-bit RSA key by just taking a random 2048-bit string: you need a complex process that can consume much more than that from a pseudorandom generator. Of course you can use the 2048 bits as a seed for the pseudorandom generator, but then we're back to my earlier consideration about symmetric cryptography: a 256-bit seed would be plenty.