I was doing some reading today about a major tech company planning to implement a new system for automatically detecting and reporting CSAM in users' photos. Overall, the system as described in their 12-page technical summary seems quite well designed, and may be as close as you can get to true privacy while still allowing for content surveillance.
That being said, the hacker in me can't help but feel a little alarmed when I hear about exceptions to what could otherwise be end-to-end encryption. (Not that their photo storage is advertised as end-to-end encrypted to begin with; however, their technical overview does say that all of the photos are encrypted with a threshold-breakable key randomly generated by the user's device.) So I came here to outline what I see as the most realistic attack on the cryptographic strength/privacy guarantees of this system, and to (hopefully) learn why I am wrong or what I have overlooked.
Let's say that this company ever suffers a data breach: an unlikely situation to begin with, but not unheard of. As a result of this data breach, many users' photos (in encrypted format) are leaked. If true end-to-end encryption were in place, this would not be a major privacy concern, as all photos would be encrypted with a key known only to the end users' devices, and therefore would not be realistically decryptable by anyone on the internet.
In this new system, however, it is my understanding that photos, or at least their visual derivatives (for which I could not find a definition, though I assume they are similar to thumbnails), are encrypted twice, with the outer layer encrypted by a key derived from the NeuralHash of the photo.
NeuralHash is described as a hashing algorithm capable of providing the same hash for the same image, even after that image has undergone cropping, resizing, color adjustments, compression, etc.
To quote part of the Technical Summary:
The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash.
This is great in theory, because it means that all (presumably unique) photos taken by users will be encrypted with strong, unique secrets, keeping them private and secure.
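To make the outer layer concrete, here is a minimal sketch of that property. Everything in it is an assumption for illustration: `perceptual_hash` is a stand-in for NeuralHash (a plain SHA-256, so it only matches byte-identical images, whereas the real NeuralHash also matches cropped or resized copies), and the KDF step is a guess, since the summary only says that the outer key is derived from the NeuralHash.

```python
import hashlib

# Stand-in for NeuralHash (assumption): a real perceptual hash maps
# visually similar images to the same digest; SHA-256 only matches
# byte-identical inputs, but the key-derivation idea is the same.
def perceptual_hash(image_bytes: bytes) -> bytes:
    return hashlib.sha256(image_bytes).digest()

# Hypothetical KDF: the paper says the outer-layer key is derived
# from the NeuralHash, but does not specify how.
def derive_outer_key(image_bytes: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", perceptual_hash(image_bytes),
                               b"outer-layer", 1000)

# Two users storing the same image end up with the same outer key,
# since the key depends only on the image's hash.
alice_copy = b"...same meme bytes..."
bob_copy = b"...same meme bytes..."
assert derive_outer_key(alice_copy) == derive_outer_key(bob_copy)
```

The important point is that the key is a pure function of the image's hash: anyone who can produce a hash-identical image can produce the key.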
But what happens when a user stores a photo that isn't unique? For example, a screenshot from a popular website, a meme circulating the internet, etc.? What's to stop an attacker from generating a NeuralHash of popular memes, deriving a key, then bruteforcing the leaked data until it successfully decrypts an entry, thus verifying the contents of a specific user's cloud photo library and degrading their level of privacy?
Or, for another example, let's say the attacker loves apples, and really, really wants to find photos of apples. What's to stop them from having an AI generate a few million photos of an apple, hashing them, deriving keys, and then bruteforcing the presumably large leak until it finds a match? There can't be that many permutations of an apple, can there? Like sure, you're not going to find all of the apple photos, but I would think that you'd be able to at least get some decryptable matches.
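Both of these are really the same dictionary attack. A toy sketch, under the same assumptions as before (SHA-256 standing in for NeuralHash, and a keyed digest standing in for the outer-layer cipher, since all the attacker needs is some way to verify that a trial key "works"):

```python
import hashlib, hmac

def derive_key(img: bytes) -> bytes:
    # Stand-in for the NeuralHash -> key-derivation chain (assumption).
    return hashlib.sha256(b"kdf|" + hashlib.sha256(img).digest()).digest()

def outer_encrypt(img: bytes) -> bytes:
    # Toy stand-in for the outer layer: in the real system this would be
    # authenticated encryption under derive_key(img). A keyed digest
    # suffices here because trial decryption is verifiable either way.
    return hmac.new(derive_key(img), img, "sha256").digest()

# Hypothetical leaked ciphertexts: one common meme, one private photo.
leak = [outer_encrypt(b"meme-001"), outer_encrypt(b"private-holiday-photo")]

# The attacker's dictionary: popular memes, AI-generated apples, etc.
candidates = [b"meme-%03d" % i for i in range(500)]

# Brute force: hash each candidate, derive its key, try it on the leak.
matches = [(img, blob) for img in candidates for blob in leak
           if hmac.compare_digest(outer_encrypt(img), blob)]
# The common meme is identified; the genuinely unique photo is not.
```

The cost is only (candidates × leaked blobs) trial decryptions, with no key search at all: the "secret" is derived entirely from guessable content.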
This company itself even reveals in one of its papers that there is a non-zero chance of false positives when it comes to matches, and that they have therefore introduced threshold secret sharing (i.e. multiple matches against their "known-bad" database are needed before the inner level of encryption can be broken... more on that next) to reduce the chance of false positives down to one in a trillion. But that one-in-a-trillion figure is the post-threshold, per-account rate; the per-photo false positive rate must be significantly higher, and that sounds within bruteforceable range to me, especially if you already know what type of photo you're looking for.
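Some back-of-the-envelope arithmetic on why per-photo rates matter at breach scale. Every figure below is an assumption chosen purely for illustration, not taken from any of the company's papers:

```python
# All figures are assumed for illustration; none come from the paper.
p = 1e-9                     # assumed per-trial chance that a candidate
                             # image's derived key decrypts a leaked blob
n_candidates = 10_000_000    # images the attacker hashes (e.g. generated apples)
n_leaked = 100_000_000       # encrypted photos in the hypothetical breach

trials = n_candidates * n_leaked
expected_matches = trials * p   # linearity of expectation

# ~1,000,000 expected hits: even a one-in-a-billion per-photo rate is not
# reassuring at breach scale. The post-threshold one-in-a-trillion figure
# protects against accidental account flagging, not against this attack.
```

The takeaway is that the account-level guarantee and the per-photo guarantee are different quantities, and only the latter constrains a brute-force attacker.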
On a final note, there is an inner layer of threshold encryption which basically requires that the outer layers of multiple photos be decrypted before the key to decrypt the inner layer can be constructed. But once again, depending on the threshold size (which must be quite low, as it needs to be less than a realistic amount of CSAM that someone could have), it doesn't seem like a large obstacle: you just need to find a user who has, say, ten common memes stored in their entire cloud photo storage library, and you've now constructed that key. According to the paper, that same key is used across all of a user's photos for that first layer of encryption.
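The inner layer behaves like a (t, n) threshold scheme: each photo carries a share of the per-account inner key, and any t recovered shares reconstruct it. A minimal Shamir-style sketch (the actual construction in the paper differs; t = 10 here matches the "say, ten memes" example above):

```python
import random

P = 2**127 - 1  # a Mersenne prime; a toy field for the shares

def make_shares(secret: int, t: int, n: int):
    # Random degree-(t-1) polynomial with the secret as constant term;
    # each photo's outer layer would carry one (x, y) share.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

inner_key = random.randrange(P)
shares = make_shares(inner_key, t=10, n=50)   # 50 photos, threshold 10
# Decrypting any 10 outer layers (e.g. 10 common memes) rebuilds the key:
assert reconstruct(random.sample(shares, 10)) == inner_key
```

With t - 1 or fewer shares the secret is information-theoretically hidden, so the scheme itself is sound; the concern above is only that t common, guessable photos may be enough to cross the threshold.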
At the end of the day, I see the security and privacy guarantees of this system in the event of a data breach hinging on one main thing: the NeuralHash.
If the NeuralHash has a high enough false positive rate, and can be reverse engineered, gets leaked, or is made public (if it isn't already), then can this major tech company truly guarantee its users that their private photos will unconditionally remain private, as long as they're not CSAM? What cryptographic protections have I overlooked that make an attack like the one I described above impossible? What am I missing? Do you see any other potential flaws?
Update: I was not sure whether it was considered acceptable to specifically name the company, so I decided to err on the side of caution and not do so. That being said, I did see a few comments asking for the source, so here it is. I hope this helps!
Moderator addition (2021-08-19): There are technical details in Abhishek Bhowmick, Dan Boneh, Steve Myers: Apple PSI System - Security Protocol and Analysis. It's one of several documents now linked at the bottom of this page.