Score:17

How Unique is a "NeuralHash"?

gn flag

I was doing some reading today about a major tech company planning to implement a new system for automatically detecting and reporting CSAM in users' photos. Overall, the system as described in their 12-page technical summary seems to be designed quite well, and may be as close as you can get to true privacy, while still allowing for content surveillance.

That being said, the hacker in me can't help but feel a little alarmed when it hears about exceptions to what could otherwise be end-to-end encryption (not that their photo storage is advertised as end-to-end encrypted to begin with; their technical overview does, however, say that all of the photos are encrypted with a threshold-breakable key randomly generated by the user's device). So I came here to outline what I see as the most realistic attack on the cryptographic strength and privacy guarantees of this system, and to (hopefully) learn why I am wrong, or what I have overlooked.


Let's say that this company one day suffers a data breach: an unlikely situation to begin with, but not unheard of. As a result of this data breach, many users' photos (in encrypted format) are leaked. If true end-to-end encryption were in place, this would not be a major privacy concern, as all photos would be encrypted with a key known only to the end users' devices, and therefore would not be realistically decryptable by anyone on the internet.

In this new system, however, it is my understanding that photos, or at least their visual derivatives (which I could not find a definition for, though I assume they are similar to thumbnails), are encrypted twice, with the outer layer being encrypted by a key derived from the NeuralHash of the photo.

NeuralHash is described as a hashing algorithm capable of providing the same hash for the same image, even after that image has undergone cropping, resizing, color adjustments, compression, etc.

To quote part of the Technical Summary:

The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash.

This is great in theory, because it means that all (presumably unique) photos taken by users will be encrypted with strong, unique secrets, keeping them private and secure.

But what happens when a user stores a photo that isn't unique? For example, a screenshot from a popular website, a meme circulating the internet, etc. What's to stop an attacker from generating NeuralHashes of popular memes, deriving keys, then bruteforcing the leaked data until they successfully decrypt an entry, thus verifying the contents of a specific user's cloud photo library and degrading their level of privacy?

Or, for another example, let's say the attacker loves apples, and really, really wants to find photos of apples. What's to stop them from having an AI generate a few million photos of an apple, hashing them, deriving keys, and then bruteforcing the presumably large leak until it finds a match? There can't be that many permutations of an apple, can there? Like sure, you're not going to find all of the apple photos, but I would think that you'd be able to at least get some decryptable matches.
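
To make the hypothesized attack concrete, here is a minimal, self-contained Python sketch of it. It assumes this question's (simplified) premise that each photo's outer layer is encrypted under a key derived from its perceptual hash; the perceptual hash, key derivation, and cipher below are toy stand-ins invented purely for illustration, not NeuralHash or the real scheme.

```python
# Toy sketch of the hypothesized attack: encrypt-under-perceptual-hash,
# then brute-force with guessed images. Everything here is a stand-in.
import hashlib
import secrets

def perceptual_hash(pixels):
    """Stand-in for NeuralHash: quantize an 'image' (list of 0-255 ints)
    so that small perturbations map to the same digest."""
    coarse = bytes(p // 64 for p in pixels)        # heavy quantization
    return hashlib.sha256(coarse).digest()

def derive_key(phash):
    """Stand-in KDF: key = SHA-256(perceptual hash)."""
    return hashlib.sha256(phash).digest()

def xor_cipher(key, data):
    """Toy stream cipher (keystream from SHA-256 in counter mode); NOT secure."""
    out = bytearray()
    for i, b in enumerate(data):
        ks = hashlib.sha256(key + i.to_bytes(8, "big")).digest()[0]
        out.append(b ^ ks)
    return bytes(out)

# A "popular meme" the attacker can also obtain, and a victim's slightly
# perturbed copy of it (e.g. recompressed); the perturbation preserves the
# coarse quantization, so both copies share the same perceptual hash.
meme = [secrets.randbelow(256) for _ in range(64)]
victims_copy = [p | 1 for p in meme]

# Victim side: outer layer encrypted under a key derived from the photo's hash.
leaked_ciphertext = xor_cipher(derive_key(perceptual_hash(victims_copy)),
                               bytes(victims_copy))

# Attacker side: hash candidate images, derive keys, try them on leaked data.
for candidate in [meme]:                    # in practice: many generated candidates
    key = derive_key(perceptual_hash(candidate))
    plaintext = xor_cipher(key, leaked_ciphertext)
    if perceptual_hash(list(plaintext)) == perceptual_hash(candidate):
        print("decrypted a leaked photo")
```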

This company itself even reveals in one of its papers that there is a non-zero chance of false positives when it comes to matches, and that they've therefore introduced threshold secret sharing (i.e. needing multiple matches against their "known-bad" database before the inner level of encryption can be broken... more on that next) to reduce the chance of false positives down to one in a trillion per account. But if the one-in-a-trillion figure is only reached after thresholding, then the chance of a false positive match on any given photo must be significantly higher than one in a trillion, which sounds within bruteforceable range to me, especially if you already know what type of photo you're looking for.

On a final note, there is an inner layer of threshold encryption which basically requires that the outer layers of multiple photos be decrypted before the key to decrypt the inner layer can be constructed. But once again, depending on the threshold size (which must be quite low, as it needs to be less than a realistic amount of CSAM that someone could have), it doesn't seem like a large obstacle: you just need to find a user who has, say, ten common memes stored in their entire cloud photo storage library, and you've now constructed that key. According to the paper, that same key is used across all of a user's photos for that first layer of encryption.
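
For intuition about how such a threshold works, here is a minimal Shamir-style threshold secret-sharing sketch: the per-account key can be reconstructed once any t shares (say, one per matched photo) are recovered, and not before. The threshold, share count, and field size below are my own illustrative choices, not the system's actual parameters.

```python
# Toy Shamir secret sharing: split a key into n shares, any t of which suffice.
import secrets

PRIME = 2**127 - 1   # prime field for the arithmetic (illustrative)

def make_shares(secret: int, t: int, n: int):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

account_key = secrets.randbelow(PRIME)
shares = make_shares(account_key, t=10, n=30)    # e.g. one share per uploaded photo
print(reconstruct(shares[:10]) == account_key)   # any 10 shares suffice -> True
print(reconstruct(shares[:9]) == account_key)    # 9 shares do not -> False (w.h.p.)
```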

At the end of the day, I see the security and privacy guarantees of this system, in the event of a data breach, hinging on one main thing: the NeuralHash.

If the NeuralHash has a high enough false positive rate, and can be reverse engineered, gets leaked, or is made public (if it isn't already), then can this major tech company truly guarantee its users that their private photos will unconditionally remain private, as long as they're not CSAM? What cryptographic protections have I overlooked that make an attack like the one I described above impossible? What am I missing? Do you see any other potential flaws?


Update: I was not sure if it were considered acceptable or not to specifically name the company, so I decided to err on the side of caution and not do so. That being said, I did see a few comments asking for the source, so here it is. I hope this helps!


Moderator addition (2021-08-19): There are technical details in Abhishek Bhowmick, Dan Boneh, Steve Myers: Apple PSI System - Security Protocol and Analysis. It's one of several documents now linked at the bottom of this page.

Noah avatar
gn flag
@fgrieu The fact that they're not perfectly achieved is the flaw that I'm trying to highlight, and the reason why I raise the concern of the encryption of users' photos being potentially bruteforceable. In regards to the "same hash" not being an equal hash, wouldn't it have to be an equal hash? If the company is to be able to successfully decrypt CSAM images for manual review, they need to have the equal hash to derive the decryption key from, right? Also, on page 5 they show two example images which are different, yet similar in appearance, and are labeled with the same NeuralHash value.
fgrieu avatar
ng flag
I don't see that (inevitable) small imperfections of NeuralHash would invalidate the claims of privacy, if the (plausible) claims made about the low rate of false positives hold. It seems such imperfections would mostly lead to increased probability of non-detection. Obviously, altering the software doing the analysis, or the database of blacklisted hashes, can lead to privacy breach. I reserve my judgement about what else can.
Noah avatar
gn flag
@fgrieu The company says that "the threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account." Let's say that the threshold is 9; that would mean there's around a 1 in 100 billion chance that a random photo uploaded by a user would be incorrectly flagged as CSAM (a false positive occurring). It seems to me that with a large enough collection of leaked data (or access to the data by a government, perhaps), and using AI to generate a whole bunch of *targeted* photos, *unknown* users' photos could be realistically bruteforced.
th flag
When I take photos, particularly of people, I tend to take a burst of photos and later pick the best of them. I'm guessing that the NeuralHash of those photos would be the same (if the hash is stable against cropping, this situation would be similar). I wonder whether repeated occurrences of the same NeuralHash each contribute to the threshold decryption scheme.
Tom K. avatar
cn flag
Could you explain why the quoted part of the technical summary implies anything about the secrets that are used? I have read this part and the following one several times now, and I am really not sure how it is supposed to work. When you say "What's to stop...?", could you elaborate on the procedure an attacker would have to go through?
Noah avatar
gn flag
@johnflan That's a really good question! I'm *hoping* that the company would take that into account, otherwise a single false positive in the form of a burst photo could in and of itself get accounts potentially banned... depending on the procedure and accuracy around manual reviews.
Noah avatar
gn flag
@TomK. Very oversimplified, photos are encrypted with their NeuralHash. Therefore if an attacker can guess what the photo should look like, they can generate a whole bunch of photos until they eventually generate one that's similar enough to produce the same NeuralHash. They can then use that same NeuralHash to decrypt the original photo. So for example if someone wants to scan leaked data for nudes, they can use AI to generate thousands of fake nude photos, then try to use those hashes to decrypt the leaked data until they can decrypt someone's actual nude. Did that answer your question?
G. Maxwell avatar
fr flag
They aren't encrypted with the NeuralHash, the NeuralHash is used in a private set intersection. See my answer below!
Tom K. avatar
cn flag
You should really emphasize this part. Pictures are not encrypted with their hash as a key. That would be news to me.
lindsaymacvean avatar
is flag
This ycombinator news link https://news.ycombinator.com/item?id=28105849 leads to a great bit of code showing a proof of concept brute force attack for algorithms similar to NeuralHash https://gist.github.com/unrealwill/c480371c3a4bf3abb29856c29197c0be
Score:9
fr flag

Private set intersection protocols are interactive. You compare against a specific database, and after the protocol is completed the only thing learned is the intersection. If the intersection is empty, then nothing is learned. Assuming the protocol is properly constructed, a saved transcript wouldn't allow anything to be learned later.

Your repetition of Apple's claim that it protects privacy, however, seems to be misguided. The cryptographic protocol here is absolutely not serving the purpose of the user's privacy. The user's privacy would be better protected by the obvious naive scheme of Apple sending the user the list of disallowed hashes and the client software telling on the user if there is a match: the system already depends critically on the user's device acting against its owner's interest (otherwise the device could just submit random dummy hashes for everything), so Apple might as well just send the list. Sending the list would also be massively more bandwidth- and CPU-efficient.

The purpose of the private set intersection, then, isn't to protect the user's privacy: it's to shield Apple from accountability, by making it impossible for users to learn or prove anything about the content of the database. It protects Apple's privacy against you and the public at large.

Might the database contain popular images, as you suggest? Or images identifying you as a member of a particular religion, ethnicity, or political ideology-- ready to be targeted for genocide? As someone sharing political cartoons that criticize a thin-skinned dictator? Might the list be secretly different for different users or over time? Might hackers or state actors that compromise Apple's infrastructure, or the list originators themselves, secretly be altering the lists? Absolutely, and the cryptography is in place to make it information-theoretically impossible (not just computationally intractable) for you to discover or prove those kinds of abuses.

Or at least the most obvious and easily constructed ECC PSI schemes have information-theoretic security; they don't give enough technical detail to know whether theirs does.

The typical construction of a private set intersection would have you construct a polynomial with a root at your image hash. Apple would send you a list of EC-ElGamal encryptions of the database hashes. The encryption is additively homomorphic: you can add one ciphertext to another and the result is a valid encryption of the sum of the plaintexts. You can also multiply a ciphertext by whatever value you want and the result is a valid encryption of the plaintext multiplied by that number.

You would evaluate your image polynomial at each database point and then multiply the result by a new random number (different for each evaluation)-- which you can do using the above-mentioned addition and multiplication properties. If there is a match, your evaluation results in an encryption of zero, and your multiplication preserves the encryption of zero. When you send the results back, they 'decrypt' and get either zeros (where there are matches) or random numbers. It's pretty straightforward to extend this sort of scheme to attach auxiliary data that Apple only learns when there are matches-- which is presumably how they get the decryption data across.
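
To make that construction concrete, here is a toy, self-contained Python sketch of the match-or-random step described above, using exponential ElGamal over a large prime (rather than elliptic curves) so it runs with only the standard library. It illustrates the data flow under my own assumptions and parameters, not Apple's actual protocol.

```python
# Toy polynomial/additively-homomorphic PSI step with exponential ElGamal.
import secrets

P = 2**127 - 1          # Mersenne prime (illustrative; real schemes use elliptic curves)
G = 3                   # fixed base
ORDER = P - 1           # exponents are reduced modulo the group order

def keygen():
    x = secrets.randbelow(ORDER - 1) + 1
    return x, pow(G, x, P)                         # (secret key, public key)

def encrypt(pk, m):
    """Exponential ElGamal: Enc(m) = (g^r, g^m * pk^r)."""
    r = secrets.randbelow(ORDER - 1) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P

def sub_plain(ct, m):
    """Homomorphically subtract a known plaintext m: Enc(d) -> Enc(d - m)."""
    c1, c2 = ct
    return c1, (c2 * pow(pow(G, m, P), P - 2, P)) % P   # multiply by g^{-m}

def scalar_mul(ct, k):
    """Multiply the hidden plaintext by k: Enc(v) -> Enc(k*v)."""
    c1, c2 = ct
    return pow(c1, k, P), pow(c2, k, P)

def decrypt_is_zero(sk, ct):
    """The key holder learns only whether the hidden plaintext is zero."""
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, P), P - 2, P)) % P == 1

# "Server" (database holder): encrypts its blacklisted hashes.
sk, pk = keygen()
blacklist = [111111, 222222, 333333]                  # stand-ins for blacklisted hashes
encrypted_db = [encrypt(pk, d) for d in blacklist]

# "Client": evaluates the polynomial p(x) = x - image_hash at each encrypted
# database point, then blinds each result with a fresh random scalar.
image_hash = 222222
responses = [scalar_mul(sub_plain(ct, image_hash), secrets.randbelow(ORDER - 1) + 1)
             for ct in encrypted_db]

# Server decrypts: zero means a match, anything else is random noise.
print([decrypt_is_zero(sk, ct) for ct in responses])  # [False, True, False]
```

(A real protocol would also hide which database entry matched, for example by shuffling the responses; this sketch only shows the match-becomes-zero, non-match-becomes-random mechanism.)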

In any case this kind of cheating would have to be done in real-time-- not after the fact on logged data, at least if the private set intersection is correctly constructed.

Noah avatar
gn flag
Thank you so much! This is an extremely well-written answer, and I am now inspired to go and learn more about Private Set Intersections! I'll leave the question open a bit longer, but expect I'll be accepting this answer soon. Thanks again! :)
G. Maxwell avatar
fr flag
FWIW, I constructed a number of visually attractive arbitrary preimages for neuralhash: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-903094036
Score:5
ng flag

The question's quote about NeuralHash states:

The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash.

This is a lie-to-children. It's impossible to achieve what's stated as the purpose. An essential assumption is left unstated: that the images are not crafted to defeat the purpose. And even then there will be exceptions.

More precisely, counterexamples of both kinds can be exhibited or crafted:

  1. False negatives: Unavoidably, it's easy to exhibit visually similar images with different NeuralHashes. Proof: given any two visually different images $A$ and $B$, the technique known as morphing creates a list of $n$ images $C_i$, with $C_1=A$ and $C_n=B$, such that for all $i$ in $[1,n)$ the images $C_i$ and $C_{i+1}$ are visually similar, to an arbitrary degree. If every pair of visually similar images had the same NeuralHash, then $A$ and $B$ would have the same NeuralHash (because equality is transitive), contradicting a stated goal. Thus the equality of NeuralHash for $C_i$ and $C_{i+1}$ is broken for some $i$.

    Reportedly, there's even an artifact¹ making NeuralHash sometimes yield different hashes for the exact same input on different machines!

  2. False positives: Reportedly, it's feasible to exhibit two visually different images with exactly the same NeuralHash. Worse, starting from a cat image, and the NeuralHash of a dog image, it's reported feasible to craft an image that has the NeuralHash of the dog image, but to a human viewer is clearly the cat image (OK, displayed with the wrong video drivers, but as the saying attributed to the NSA goes: “Attacks always get better; they never get worse.”); and importantly, the dog image itself is not necessary.

    This is the perceptual hash equivalent of a first and second preimage attack, thus also a collision. Its input consists of a NeuralHash and an unrelated target image to perceptually mimic. The feasibility of such an attack makes it necessary to keep the list of blacklisted NeuralHashes secret: there's no other technical barrier preventing the many opponents of the system from overwhelming it with false positives, without needing to manipulate prohibited images.


It would seem plausible to mostly repair 1 (and the reported implementation artifact) by replacing “same” with “similar within a tolerance”, along the lines of: differing by at most $b$ bits. However, I don't think that's done, because the Threshold PSI-AD and Fuzzy Threshold PSI-AD protocols envisioned perform exact hash matches. I believe this exact-match constraint will make it impossible to repair NeuralHash false positives under adversarial assumptions, while maintaining that non-adversarial minor visual alterations seldom change the NeuralHash.
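
For what such a tolerance-based match could look like, here is a minimal sketch: treat two 96-bit hash values as matching if they differ in at most $b$ bits. The hash values and the tolerance $b$ below are made-up placeholders, and (as noted above) the exact-match PSI protocols do not actually allow this.

```python
# Fuzzy matching of two hash values by Hamming distance (illustrative only).
def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(h1 ^ h2).count("1")

def fuzzy_match(h1: int, h2: int, b: int = 4) -> bool:
    return hamming_distance(h1, h2) <= b

stored_hash   = 0x2A3F_91C4_77D0_BEEF_1234_5678   # placeholder 96-bit value
observed_hash = stored_hash ^ 0b1011               # same value with 3 bits flipped
print(fuzzy_match(stored_hash, observed_hash))     # True: within tolerance
```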

Technically circumventing the system using NeuralHash would be easy: I've not seen it claimed, but I would not be surprised if it was feasible to change an image, with hardly any visual alteration, into one with a NeuralHash differing by a few (random) bits, thus avoiding detection. I conjecture this end result can also be reached by sticking prohibited images side by side into a larger one. And it's certainly reached by application-level encryption and/or steganography.


Conclusion

NeuralHash does not work as advertised under standard assumptions in cryptography and IT security: that it's public, and under attack. Which is likely.

Thus the automated censorship-at-home system using NeuralHash can't be robust. It is easily circumvented, and it is vulnerable to some degree of denial-of-service with false positives if the blacklisted NeuralHashes leak. It could nevertheless be successful at reporting the presence of some prohibited material, and at shielding its promoters from legal responsibility.


¹ It's conjectured this is due to details of floating point. The one example reported has 9 bits out of 96 that differ, with no obvious relation between their positions. That looks like a lot of discrepancy to me. Perhaps the report is wrong, or for a pre-release version of NeuralHash, or for some pathological input. Or the algorithm is numerically unstable, a known issue with deep neural network algorithms obtained by training: it's hard (and usually not attempted) to analyze how they actually work. If they happen to have reached their goal thru a function with much input/output gain, or depending on some underflow or truncation, the algorithm can be degraded when some FPU detail changes, much like a function iterating the logistic map can be.
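
As a small illustration of that last point (a toy of my own, unrelated to NeuralHash itself), iterating the chaotic logistic map with results rounded to 32-bit precision quickly diverges from the same iteration carried out in full 64-bit precision:

```python
# Precision-dependent divergence of the logistic map x <- r*x*(1-x).
import struct

def round_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

r, x32, x64 = 3.9, 0.5, 0.5
for _ in range(60):
    x32 = round_f32(r * x32 * (1.0 - x32))   # reduced-precision trajectory
    x64 = r * x64 * (1.0 - x64)              # full-precision trajectory
print(x32, x64)   # after a few dozen steps the two trajectories no longer agree
```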

Score:2
fgrieu avatar
ng flag
It's not only a collision. It's a targeted preimage. See 2 in [my answer](https://crypto.stackexchange.com/a/93631/555).