How to require possession of arbitrary data to decrypt a file?

825480793

6/22/24, 10:02 PM

I have pairs of files A and B both of arbitrary size and contents (for any pair of files, A could be significantly larger than B or vice versa). I want to be able to 'encrypt' A in such a way that I can distribute it openly, but so that only people already in possession of the entirety of B can access any part of A regardless of the contents or size of either file (disregarding things like compression or cases where B is trivial enough to be guessed). I don't mind distributing additional data C in case it's necessary for a scheme to function, but I cannot affect the contents of B. It must not be possible to recover any part of B from the encrypted A.

I have a strong feeling this is not possible, but I thought I'd ask before giving up. I have only extremely rudimentary knowledge of cryptography and entropy, so I apologize if this is a very silly question. One alternative I'm aware of is moving verification to distribution itself where Proof of Possession of B can be required before allowing access to A, but this wouldn't be optimal for my purposes.


Score:0

Crypto

aiootp

6/23/24, 3:04 AM

Using the hash of the file $B$ to create an encryption key is a simple way to prove knowledge of its contents before providing decryption access to the file $A$, but there are some important details to consider.

Issues:

Say someone has access to a large portion of $B$ but not the whole thing. The entirety of $B$ may be unguessable, but if some of $B$ is known, the unknown portion may be short or predictable enough to be guessed. If so, a correct guess could be confirmed by checking whether a key derived from the known part of $B$, call it $B_{known}$, and a guess $G_i$ for the remaining part succeeds in decrypting the file $A$. It may be very efficient to test candidate keys $K_i = H(B_{known} \space || \space G_i)$ until one leads to a sensible decryption result. This would violate your requirement that no part of $B$ can be recovered from the encrypted $A$.
The use of a plain hash function, instead of a construction like HMAC, would leave the system open to known vulnerabilities. So let's assume from this point forward we'll use something like HMAC. Relying on the unguessability of $B$ for the confidentiality of $A$ seems dangerous, since there's no guarantee that an arbitrary file will by itself contain an adequate amount of entropy to derive key material.
There should be some context information, and preprocessing of inputs, used along with a hash function to prevent issues where derived keys are reusable in illegitimate contexts.
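The first issue can be illustrated with a small sketch. It assumes the attacker knows a prefix of $B$ and that the missing part is a 4-digit number; the names `derive_key`, `tag_under_key`, and `recover_missing_suffix` are hypothetical, and an HMAC tag stands in for "the decryption of $A$ succeeds", since any authenticated ciphertext lets a candidate key be tested the same way.

```python
import hashlib
import hmac
from itertools import product


def derive_key(b: bytes) -> bytes:
    # Key derived from the whole of B with a plain hash, as in the naive scheme.
    return hashlib.sha256(b).digest()


def tag_under_key(key: bytes, message: bytes) -> bytes:
    # Stand-in for a successful decryption check: anyone holding the public
    # data can test whether a candidate key reproduces this value.
    return hmac.new(key, message, hashlib.sha256).digest()


def recover_missing_suffix(known_prefix, published_tag, message, alphabet, length):
    # Try every possible suffix and confirm each guess against the public tag.
    for candidate in product(alphabet, repeat=length):
        guess = bytes(candidate)
        key = derive_key(known_prefix + guess)
        if hmac.compare_digest(tag_under_key(key, message), published_tag):
            return guess
    return None


# Illustrative data only: B is a known prefix followed by a short secret part.
known_prefix = b"quarterly-report-draft-pin:"
secret_b = known_prefix + b"4821"
public_message = b"stand-in for the encrypted A"
published_tag = tag_under_key(derive_key(secret_b), public_message)

recovered = recover_missing_suffix(
    known_prefix, published_tag, public_message, b"0123456789", 4
)
```

At most 10,000 guesses are needed here, so the published data acts as an oracle that reveals the unknown part of $B$.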

Potential Solutions:

Require someone in possession of $B$ (a prover) to prove complete knowledge of $B$ before you (a verifier) give access to information $C$ which is used to derive the encryption key $K_A$ for file $A$. Below is one possible way to do this:

$K_A = HMAC('ContextForEncryptionKeyForFileA' \space || \space B, \space key=C)$

$W_{prover} = HMAC('ContextForProverChallengeBA' \space || \space T_{prover} \space || \space I_{prover} \space || \space B, \space key=R_{prover})$

The prover sends the verifier a timestamp $T_{prover}$ with adequate granularity, a random 32-byte token $R_{prover}$, optionally some identity information $I_{prover}$ like a username or even a public key, and the hash result $W_{prover}$.

The verifier can then check that the timestamp is valid, $T_{verifier} > T_{prover}$, and that it was created recently enough, $T_{verifier} - T_{prover} < Threshold$. The verifier may then also be able to ensure the prover has adequate permission to request decryption access to $A$ using the identity information $I_{prover}$. If everything checks out, the verifier computes $W_{verifier}$ with the same formula, using the values it received and its own copy of $B$, and checks whether $W_{prover} = W_{verifier}$ using a timing-safe comparison.

If the verifier's results are the same as the prover's, then the prover is given the key $C$, which could be 32 random bytes. If not, then the connection is terminated, and the verifier could keep track of how many failed attempts for decryption access to file $A$ were initiated by the prover associated with identity $I_{prover}$.
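A minimal sketch of this exchange, assuming HMAC-SHA256, integer timestamps in seconds, and a 30-second threshold (none of which are fixed by the description above). The function names are hypothetical, the transport is omitted, and the inputs are concatenated exactly as in the formulas, so input encoding and channel security are not handled here.

```python
import hashlib
import hmac
import secrets
import time

KEY_CONTEXT = b"ContextForEncryptionKeyForFileA"
CHALLENGE_CONTEXT = b"ContextForProverChallengeBA"
MAX_AGE_SECONDS = 30  # assumed threshold


def derive_file_key(b: bytes, c: bytes) -> bytes:
    # K_A = HMAC(context || B, key=C)
    return hmac.new(c, KEY_CONTEXT + b, hashlib.sha256).digest()


def compute_w(b: bytes, timestamp: int, identity: bytes, token: bytes) -> bytes:
    # W = HMAC(context || T || I || B, key=R)
    message = CHALLENGE_CONTEXT + timestamp.to_bytes(8, "big") + identity + b
    return hmac.new(token, message, hashlib.sha256).digest()


def prover_response(b: bytes, identity: bytes, now=None):
    # The prover picks a fresh random token R and sends (T, R, I, W).
    timestamp = int(time.time()) if now is None else now
    token = secrets.token_bytes(32)
    return timestamp, token, identity, compute_w(b, timestamp, identity, token)


def verifier_check(b, c, timestamp, token, identity, w_prover, now=None):
    # Returns C on success and None on failure.
    t_verifier = int(time.time()) if now is None else now
    age = t_verifier - timestamp
    # Non-strict lower bound so a request made within the same second passes
    # (an assumption; the description uses a strict inequality).
    if not 0 <= age < MAX_AGE_SECONDS:
        return None
    w_verifier = compute_w(b, timestamp, identity, token)
    if not hmac.compare_digest(w_prover, w_verifier):  # timing-safe comparison
        return None
    return c


# Illustrative run with fixed clock values.
file_b = b"example contents of B"
secret_c = secrets.token_bytes(32)
t, r, i, w = prover_response(file_b, b"alice", now=1000)
released = verifier_check(file_b, secret_c, t, r, i, w, now=1005)
key_a = derive_file_key(file_b, released)
```

A prover who receives $C$ can then compute $K_A$ locally, because it already holds $B$.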

Final Comments:

This is an incomplete overview of a potential protocol which you could use to solve your problem.

Very importantly, it's missing the specifics for how the communication channels between prover and verifier are secured and authenticated. Similarly, if $I_{prover}$ is used to implement access controls, it would be important to also verify the prover is the legitimate entity associated with that identity.

Regarding the use of hash functions and $HMAC$: it's prudent not to simply concatenate inputs together, but to encode them prior to hashing so as to mitigate canonicalization attacks. One simple way to do this is to count the number of inputs, measure the length of each input in order, then concatenate the inputs and prepend the metadata you just calculated. This helps ensure that different sets of inputs can never produce the same byte string, so no potentially exploitable or error-inducing unintended significance can be attributed to the inputs.
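One way the encoding described above could look, as a sketch: the helper name `encode_inputs` and the 8-byte big-endian fields are assumptions chosen for illustration, not a standard format.

```python
import hashlib
import hmac


def encode_inputs(*inputs: bytes) -> bytes:
    # Prepend the number of inputs and the length of each input, in order,
    # so that different splits of the same bytes encode differently.
    header = len(inputs).to_bytes(8, "big")
    header += b"".join(len(item).to_bytes(8, "big") for item in inputs)
    return header + b"".join(inputs)


def mac_inputs(key: bytes, *inputs: bytes) -> bytes:
    # HMAC over the unambiguous encoding rather than the raw concatenation.
    return hmac.new(key, encode_inputs(*inputs), hashlib.sha256).digest()


# Plain concatenation cannot tell these two input lists apart.
ambiguous = (b"ab" + b"c") == (b"a" + b"bc")
distinct = encode_inputs(b"ab", b"c") != encode_inputs(b"a", b"bc")
```

With plain concatenation, `(b"ab", b"c")` and `(b"a", b"bc")` hash to the same value; with the length-prefixed encoding they do not.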


Score:0

Crypto

poncho

6/22/24, 10:17 PM

Easy enough:

Take B and hash it with a cryptographically secure hash function; for example, K := SHA256( B )

Background: a hash function takes an arbitrary length string, such as your file B, and converts it into a short (256 bit) string in such a way that someone who does not know B cannot guess what that string is [1]. There are a number of secure hash functions; SHA-256 is a popular one.

Take A and encrypt it with the value K; for example, Ciphertext := AES-GCM(K, A)

Background: encryption converts a file (such as A) into a form such that, without knowledge of the key (K), no one can get any information about A (other than its size; you could disguise that by, say, padding all files to the maximum length, but that's generally costly enough that we don't bother). There are a number of secure encryption algorithms; AES-GCM is a popular one.

Most crypto packages should give you access to both hashing and encryption based on a key.
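As a rough sketch of the two steps, here is one way this could look in Python. It assumes the third-party `cryptography` package for AES-GCM; the helper names, the random 12-byte nonce stored in front of the ciphertext, and the in-memory example data are illustrative choices, not part of the answer above.

```python
import hashlib
import io
import secrets

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def key_from_stream(stream, chunk_size: int = 1 << 20) -> bytes:
    # K := SHA256(B), read in chunks so B can be of arbitrary size.
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.digest()


def encrypt_a(key: bytes, plaintext: bytes) -> bytes:
    # AES-GCM needs a nonce that is never reused with the same key; a random
    # 12-byte nonce is generated here and stored in front of the ciphertext.
    nonce = secrets.token_bytes(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_a(key: bytes, blob: bytes) -> bytes:
    # Raises InvalidTag if the key (and therefore B) is wrong.
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)


# Illustrative data standing in for the real files.
file_b = b"example contents of B"
file_a = b"example contents of A"

blob = encrypt_a(key_from_stream(io.BytesIO(file_b)), file_a)
recovered_a = decrypt_a(key_from_stream(io.BytesIO(file_b)), blob)
```

For real files, `key_from_stream` can be given a file opened in binary mode, and `blob` is what would be distributed openly.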

[1]: Actually, that's not how a cryptographer would explain it; it has a number of security properties beyond that; however that's what you need.

