Score:7

Is there a form of cryptography where the key is derived from the plaintext

dz flag

Imagine you are building a shared remote storage system where everyone's files are sent to central storage, but you want to de-dup files across users so that the same file is never stored more than once. At the same time, you want the data encrypted so that the service provider, or anyone else who doesn't have the file, can't decrypt it.

In this case you can imagine deriving a symmetric encryption key from the contents of the file (say by taking some cryptographic hash of the file) such that everyone who owns the file can compute the key easily, but people without access to the file can't discover it.

Then users encrypt the file with this key and send us the encrypted blob. We can internally check against a dictionary of hashes of encrypted blobs to determine whether we already have the blob, and either store it or drop it.
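
The server-side check described above might look roughly like the following. This is only a minimal sketch: the `BlobStore` class and its method names are hypothetical, the store is in memory, and SHA-256 is assumed as the hash over the encrypted blob.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory store keyed by the hash of each encrypted blob."""

    def __init__(self):
        # Maps hex SHA-256 digest of the ciphertext -> ciphertext bytes.
        self._blobs = {}

    def put(self, blob: bytes):
        """Store the blob unless it is already present.

        Returns (blob_id, was_new).
        """
        blob_id = hashlib.sha256(blob).hexdigest()
        if blob_id in self._blobs:
            return blob_id, False  # duplicate: drop the upload
        self._blobs[blob_id] = blob
        return blob_id, True

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

    def __len__(self):
        return len(self._blobs)
```

The server only ever sees ciphertext and its hash, so de-duplication works only if every owner of a file produces the same ciphertext.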

I imagine that deriving a key from the plaintext is a terrible idea for most encryption schemes, but I'm wondering if there is a scheme out there, or if there is a known technique for doing this.

caveman avatar
in flag
If the goal is simply to de-dup, why not design the protocol so that a file's identifier is its hash, and keep that entirely separate from how users encrypt it? E.g., let users choose whatever passwords they like, but expect them to supply the hash of the plaintext along with the ciphertext.
dz flag
Sure, but I need everyone's copy of the file encrypted with the same key, without my actually knowing the key. That's the part I'm interested in. How can my users share an encryption key for this blob? My idea is that the thing they all know is the contents of the file; if they already know the contents, then it's safe for them to decrypt the file. If they all choose different passwords, I would have to store a copy for each of them, since the ciphertexts will all be different.
SAI Peregrinus avatar
si flag
"I need to encrypt everyones file with the same key," Why? That's a very weird requirement.
dz flag
If I use two keys, wouldn't I have to store the encrypted blob twice? I don't want to do that. To be clear, I mean the same file: if Alice has foo.mp3 and Bob has foo.mp3, I want both of them to upload the same encrypted blob so I can tell they are the same and store only one copy, but without the central service being able to know what is in the file.
caveman avatar
in flag
Do you want users to give you ability to decrypt their files which you host for them?
id flag
If one chooses a hash such that no two files will have the same hash, and no two hashes would hash to the same hash, one could encrypt each file using a single hash of the contents, and then identify the file on the storage medium using a hash of the hash. Each user would need to have a copy of the single hash that's encrypted using their individual private key. Someone without a copy of the hash would be unable to decrypt the file without being able to reverse the second hash, which if the hash is good should be completely intractable except by guessing the complete file contents.
id flag
As others have noted, this involves giving up some kinds of privacy guarantees, since it would be possible to tell whether a user had a file with particular contents; there may be ways to somewhat obscure such information, but if the act of a user uploading a 20-megabyte file doesn't reduce available disk space by 20 megabytes, that would tend to suggest that the file already existed.
cn flag
@Matt, I don't see why the method you suggested would not work. Perhaps the key would be SHA256(file), along with a salt, which may be SHA256(SHA256(file)). They could then store the file with you and send the URL to the file along with the SHA256 of the file. Note that you could generate different URLs for different senders so that they can "delete" the file from the server. You'd just delete their URL but keep the underlying file as long as there are URLs still pointing to it. If a 256-bit hash is not long enough, you could concatenate other hashes and salts. Probably not needed.
cn flag
The tricky part, of course, is to convince the file "owner" that the storage provider (you) really is not able to decrypt the file, even though you're providing the algorithm and probably the library and UX to encrypt! It takes trust, or some technical authority figure to bless your method!
Score:16
us flag

Encrypting $M$ using $H(M)$ as the key is a natural and well-studied approach to deduplication. It is known in the literature as convergent encryption or message-locked encryption.
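
To make the idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: $H$ is taken to be SHA-256, the function names are hypothetical, and the cipher is a toy XOR against a SHAKE-256 keystream standing in for a real deterministic cipher (for example AES in a fixed-IV or SIV mode). It is not a vetted construction.

```python
import hashlib


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # TOY cipher (assumption): XOR the data with a SHAKE-256 keystream
    # derived from the key. A real system would use a vetted cipher.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def derive_key(message: bytes) -> bytes:
    # K = H(M), with H assumed to be SHA-256.
    return hashlib.sha256(message).digest()


def encrypt(message: bytes):
    # Deterministic: the same message always yields the same (key, ciphertext).
    key = derive_key(message)
    return key, _keystream_xor(key, message)


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream inverts the encryption.
    return _keystream_xor(key, ciphertext)


def blob_id(ciphertext: bytes) -> str:
    # Identifier the server can use for de-duplication.
    return hashlib.sha256(ciphertext).hexdigest()
```

Two users holding the same file compute the same key and the same ciphertext without coordinating, so the server sees identical blobs and can keep one copy.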

The natural problem with this approach is that it cannot achieve the standard notions of security for encryption (IND-CPA, IND-CCA, etc). Indeed, anyone who knows $M$ will be able to verify whether a candidate ciphertext decrypts to $M$. The question then becomes: what is the "best possible" level of security that one could hope for, and can we achieve it?
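
The verification described above can be sketched as follows. As before, this is an illustration under assumptions: SHA-256 plays the role of $H$, the cipher is a toy SHAKE-256 keystream XOR rather than a real one, and the function names are hypothetical.

```python
import hashlib


def mle_encrypt(message: bytes) -> bytes:
    # Toy message-locked encryption (assumption): key = SHA-256(M),
    # ciphertext = M XOR SHAKE-256(key) keystream.
    key = hashlib.sha256(message).digest()
    stream = hashlib.shake_256(key).digest(len(message))
    return bytes(a ^ b for a, b in zip(message, stream))


def confirms_guess(candidate: bytes, ciphertext: bytes) -> bool:
    # Anyone can re-encrypt a candidate plaintext and compare the result
    # with the stored ciphertext. No secret is needed.
    return mle_encrypt(candidate) == ciphertext


def guess_from_candidates(candidates, ciphertext):
    # If the plaintext comes from a small, predictable set, it can be
    # recovered by trying every candidate.
    for candidate in candidates:
        if confirms_guess(candidate, ciphertext):
            return candidate
    return None
```

This is why secrecy can only be hoped for when the message is unpredictable to the attacker.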

The most thorough analysis of message-locked encryption that I know of is:

Mihir Bellare, Sriram Keelveedhi, Thomas Ristenpart: Message-Locked Encryption and Secure Deduplication, Eurocrypt 2013.

They define relevant security notions, and give corresponding constructions & proofs. They also consider some other security goals closely related to the deduplication application, separate from standard secrecy and integrity.

You can search ePrint and find many other papers with "message-locked" or "deduplication" (for other approaches) in the title. Note that message-locked encryption is non-interactive, but the problem of deduplication becomes a little easier when you allow interaction. Hence, many of the techniques that you find may be interactive.

caveman avatar
in flag
After removing duplications of different encryptions of the same cleartext, will distinct users be able to decrypt the unique deduplicated copy by using their own password? As far as I understood, with OP's scenario, different users upload copies of the same file, except that each is encrypted by user's own password.
us flag
If $M$ is encrypted with $H(M)$ then it is not encrypted with the user's password. If you want a user to be able to decrypt, then you must store $H(M)$ encrypted under that user's password. Note that this is a very short ciphertext.
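
A rough sketch of that per-user step, with several assumptions: the key-encryption key is derived from the password with PBKDF2-HMAC-SHA256, the function names and iteration count are illustrative, and the wrapping itself is a plain XOR with no integrity check. A real system would use an authenticated cipher or a dedicated key-wrap mode instead.

```python
import hashlib
import os

ITERATIONS = 100_000  # illustrative value, not a recommendation


def _pad(password: str, salt: bytes, length: int) -> bytes:
    # Derive a password-dependent pad of the requested length.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=length
    )


def wrap_key(password: str, file_key: bytes):
    # A fresh random salt per wrap means the pad is never reused.
    salt = os.urandom(16)
    pad = _pad(password, salt, len(file_key))
    wrapped = bytes(a ^ b for a, b in zip(file_key, pad))
    return salt, wrapped


def unwrap_key(password: str, salt: bytes, wrapped: bytes) -> bytes:
    pad = _pad(password, salt, len(wrapped))
    return bytes(a ^ b for a, b in zip(wrapped, pad))
```

With SHA-256 as $H$, each user stores only a 16-byte salt and a 32-byte wrapped key per file, while the large encrypted blob is stored once.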