Score:2

Does encryption also hide file hash?

jp flag

I have files on my computer that are being uploaded to a cloud storage provider (Sync) that states that files are encrypted on the client side and then uploaded to their servers, and that only I have the key to decrypt them. So the files can only be decrypted on my computer.

Now my question is:

Does encryption hide file names and hashes?
Can such cloud providers identify the files by their names or hashes if they are encrypted?

I know identifying files like this is a common process to avoid file duplication (Dropbox does this for example), so I am wondering if by encrypting files they lose this ability.

I'm not planning on having anything illegal (on purpose) in those folders (i.e. music, movies, software), so I'm not worried in that sense.

BUT my sister's Dropbox account was just deactivated for "breach of terms", and she didn't have anything illegal there either, "that she knows of". I say this because she had WhatsApp backing up images to a folder that was being synchronised to the cloud, as well as family photos of toddler nephews (sometimes innocent photos of them having a bath). So I'm not sure whether some meme or joke image from a group chat was flagged, or maybe some random file containing a virus (their terms also list this as a possible breach)... Dropbox doesn't specify, and they don't comment on why they deactivated your account "for legal reasons". WTF! So you're left wondering what you did wrong... if anything at all.

So I decided to leave Dropbox too (especially because I had offline files there that were not being synced to my computer, so I now have to buy an extra HDD to fix that!), and I am trying to understand how safe my files would be with the new provider I intend to use (Sync), as they say files are encrypted on my end and uploaded to their servers encrypted.

Don't know... I think maybe encrypting files on my end with another piece of software is overkill in this situation; I just wanted to know to what extent my files are private. Maybe I'm getting a little bit paranoid here, or maybe not. I'm no CIA agent, but you know, here in Australia things are going crazy. People have been taken by the police from their own homes for things they have posted on social media that go against the current official narrative (virus related), so... who knows!

Anyway, I know my question is a bit long but just wanted to add some context.

Thanks for your comments!

Score:1
es flag

If you are concerned about file hashes leaking, then even if filenames are encrypted you should probably also be concerned about file sizes and directory structures leaking. For example, if you have a directory containing a dozen MP3s, it is highly likely that the cloud storage provider can still easily identify the album you have encrypted based only on the set of file sizes in that directory.
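As a rough illustration of that size-based identification, here is a minimal Python sketch. The catalogue, the album names, the byte counts, and the fixed 28-byte per-file encryption overhead are all made up for the example; a real provider would work from a much larger reference set.

```python
# Hypothetical catalogue: album name -> track sizes in bytes (made-up numbers).
KNOWN_ALBUMS = {
    "Album A": [4_812_339, 5_120_004, 3_998_712],
    "Album B": [6_001_250, 4_444_101, 5_730_998],
}


def size_fingerprint(sizes):
    """Order-independent fingerprint of a directory: its sorted file sizes."""
    return tuple(sorted(sizes))


def identify_directory(encrypted_sizes, overhead=0):
    """Return catalogue entries whose set of sizes matches the observed one.

    `overhead` is the assumed constant number of bytes that encryption adds
    to every file (for example a nonce plus an authentication tag).
    """
    observed = size_fingerprint(s - overhead for s in encrypted_sizes)
    return [
        name
        for name, sizes in KNOWN_ALBUMS.items()
        if size_fingerprint(sizes) == observed
    ]


# Sizes as seen by the provider: encrypted, renamed, and in a different order.
uploaded = [3_998_712 + 28, 4_812_339 + 28, 5_120_004 + 28]
print(identify_directory(uploaded, overhead=28))  # -> ['Album A']
```

No file name or content is needed for the match, only the list of sizes.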

The easiest practical solution therefore is to store your files in a mounted encrypted volume, and to only sync the encrypted volume file to cloud storage.

This solution still isn't perfect, however. If you modify the encrypted volume by adding a single file to it, then the cloud provider will know that only a certain range of bytes within the encrypted volume has been modified. Therefore the sizes of individual files may still leak to the cloud provider if you add them individually to the encrypted volume after the volume has first been uploaded.
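To make the leak concrete, here is a small Python sketch of what a provider could do with two successive uploads of the same volume file. The 4096-byte block size and the simulated volume are assumptions for the example, not properties of any particular product.

```python
import os

BLOCK = 4096  # assumed comparison granularity, in bytes


def changed_range(old: bytes, new: bytes, block: int = BLOCK):
    """Return (start, end) byte offsets spanning every differing block, or None."""
    n = max(len(old), len(new))
    changed = [
        i for i in range(0, n, block) if old[i:i + block] != new[i:i + block]
    ]
    if not changed:
        return None
    return changed[0], min(changed[-1] + block, n)


# Simulated encrypted volume: to an observer it looks like random bytes.
volume_v1 = os.urandom(64 * BLOCK)

# Simulate adding one file that ends up occupying blocks 10, 11 and 12.
new_blocks = os.urandom(3 * BLOCK)
volume_v2 = volume_v1[:10 * BLOCK] + new_blocks + volume_v1[13 * BLOCK:]

start, end = changed_range(volume_v1, volume_v2)
print(start, end, end - start)  # the last number approximates the new file's size
```

The observer learns nothing about the content, but the length of the modified range is a good estimate of the size of the file that was added.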

Ozzie Nano avatar
jp flag
Thank you for your response @knaccc. Your comment shed some light. Thank you :)
Ozzie Nano avatar
jp flag
Just edited my question adding some more context. Letting you know just in case you're curious about my situation :)
Score:1
in flag

Does encryption hide file name and hash?

Encryption is performed on the contents of the file, not on the name of the file. After encryption, you need to save the ciphertext in a file, and you can choose whatever name is appropriate. If you want to keep the original filename secret, prepend it to the plaintext with a proper delimiter so that you can recover the file name correctly during decryption.

Or simply create a Zip archive with a different name, or better, use a small VeraCrypt volume.

As for the hash: I assume you mean that you have also created a hash of the file. The hash is also a file, so the same applies as above. Append the hash to the end of the plaintext:

plaintext =  file name | delimiter | contents of the file | delimiter | hash of file
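A minimal Python sketch of building and parsing such a plaintext before encryption. Note one deliberate swap: instead of a delimiter, it uses a 4-byte length prefix for the file name and a fixed-length SHA-256 digest at the end, because a delimiter byte sequence could also occur inside the file contents. The function names `pack` and `unpack` are hypothetical.

```python
import hashlib
import struct

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def pack(name: str, contents: bytes) -> bytes:
    """Build: name length (4 bytes) | file name | contents | SHA-256 of contents."""
    name_bytes = name.encode("utf-8")
    digest = hashlib.sha256(contents).digest()
    return struct.pack(">I", len(name_bytes)) + name_bytes + contents + digest


def unpack(plaintext: bytes):
    """Recover (name, contents) and check the embedded hash."""
    (name_len,) = struct.unpack(">I", plaintext[:4])
    name = plaintext[4:4 + name_len].decode("utf-8")
    contents = plaintext[4 + name_len:-DIGEST_LEN]
    digest = plaintext[-DIGEST_LEN:]
    if hashlib.sha256(contents).digest() != digest:
        raise ValueError("embedded hash does not match the contents")
    return name, contents


blob = pack("holiday.jpg", b"\xff\xd8 example image bytes")
print(unpack(blob))
```

The packed blob is what you would then encrypt and store under a name of your choosing.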

Your biggest problem with this is how the user will match the ciphertext files to the original files. Do they need to decrypt and check every file? The VeraCrypt solution can be better, especially since the files are small.

Does encryption also hide file hash?

If you encrypt a file, then one can no longer find the hash of the unencrypted file.
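A small Python demonstration of this point. The cipher here is a toy stream cipher built from SHAKE-256, used only so the example runs with the standard library; it is an assumption of this sketch, it is not authenticated, and it is not a substitute for a real mode such as AES-GCM.

```python
import hashlib
import secrets


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized encryption: fresh nonce, keystream from SHAKE-256, XOR."""
    nonce = secrets.token_bytes(16)
    keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, keystream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of toy_encrypt: split off the nonce and XOR the keystream again."""
    nonce, body = blob[:16], blob[16:]
    keystream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ k for c, k in zip(body, keystream))


key = secrets.token_bytes(32)
data = b"the same file contents, uploaded twice"

first = toy_encrypt(key, data)
second = toy_encrypt(key, data)

plain_hash = hashlib.sha256(data).hexdigest()
print(plain_hash)
print(hashlib.sha256(first).hexdigest())   # unrelated to plain_hash
print(hashlib.sha256(second).hexdigest())  # differs again: new nonce each time
```

The provider can still hash what it stores, but that is the hash of the ciphertext. Because of the fresh nonce, even two uploads of the same file produce different ciphertext hashes, which is also why hash-based deduplication stops working.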

Can such cloud providers identify the files by their names or hash if they are encrypted?

This assumes that the hash is not encrypted and is stored together with the encrypted file.

Of course, they can identify some files, especially if those files are also in the public domain. Microsoft built a system (PhotoDNA) that computes a hash of every image on a system, compares it against their database, and alerts the authorities if a sensitive image is found. It is a service that works. A hash leaks information about the file!

As you can see, the check depends on the adversary already having a copy of the file.

The filename may reveal some information about the contents of the files. Leave as few traces as possible!

If you encrypt the files and their hashes with probabilistic encryption and choose a new key for every update, they will have no idea about the contents, assuming the file names are encrypted as above. Still, the VeraCrypt volume is the better option.

Your choices for encryption range from AES-CBC and AES-CTR, which have IND-CPA security, to AES-GCM, AES-GCM-SIV, and XChaCha20-Poly1305, i.e. our modern authenticated encryption modes.

Ozzie Nano avatar
jp flag
Wow! Thank you very much for your comprehensive response @kelalaka :) Your input is very useful!
Ozzie Nano avatar
jp flag
Just edited my question adding some more context. You have given me a lot of info to think about so no need to comment on my edit, but just in case you were interested in knowing some more about my situation :)
Ozzie Nano avatar
jp flag
Quick question: I just logged in to Sync via the web, navigated to folders that contain images, and I see that thumbnails are displayed there for JPGs, PNGs, etc. I wonder how they can create thumbnails for these kinds of files if the files are automatically encrypted on the client side, as they say, and they have no access to the contents on their servers. What am I missing?
kelalaka avatar
in flag
Are you sure they create thumbs of the encrypted files? Who encrypts them? Who has access to the keys?
Ozzie Nano avatar
jp flag
The cloud provider I am using is called "Sync". When using their web interface to navigate my files, if I go to a folder that contains images, and change the view from list to icon, the images are displayed. According to their whitepaper, they use Zero-knowledge, end-to-end encryption. https://www.sync.com/pdf/sync-privacy-whitepaper.pdf
kelalaka avatar
in flag
I'm in no position to review your problem. Use Wireshark, together with file access tools, to investigate some of this.
Ozzie Nano avatar
jp flag
That's OK. I was just curious about the technology used and how it works. I will check out Wireshark, as I had never heard of it before. Thank you for your time :)
kelalaka avatar
in flag
You should look carefully at who accesses the files and when (with a local file access tool), and at what is sent, when, by whom, on which port, and to where (with Wireshark). Now at least you can upvote...
Score:0
in flag

Different services behave differently. If you back up files one by one, you will, for example, preserve file sizes. When the original file is compressed, the file sizes are highly unique.

Take several files that typically appear together: the collection of their file sizes would be sufficient to identify this collection of files, even if they are encrypted under different names.

We can mitigate this by combining several files together before encryption and/or padding the length significantly. For snapshots this is fairly easy: put everything in one big file, or break it up into fixed-size pieces. But with incremental backups this is harder.
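A minimal Python sketch of the padding idea, applied before encryption. The 1 MiB default bucket and the 8-byte length prefix are arbitrary choices made for this example; larger buckets hide more but waste more storage.

```python
def padded_size(size: int, bucket: int = 1 << 20) -> int:
    """Round `size` up to the next multiple of `bucket` (at least one bucket)."""
    return max(1, -(-size // bucket)) * bucket


def pad(data: bytes, bucket: int = 1 << 20) -> bytes:
    """Prefix the true length (8 bytes), then zero-pad to a bucket boundary."""
    body = len(data).to_bytes(8, "big") + data
    return body + b"\x00" * (padded_size(len(body), bucket) - len(body))


def unpad(padded: bytes) -> bytes:
    """Read the 8-byte length prefix and return the original data."""
    n = int.from_bytes(padded[:8], "big")
    return padded[8:8 + n]


small = pad(b"a" * 10, bucket=1024)
large = pad(b"b" * 1000, bucket=1024)
print(len(small), len(large))  # both 1024: the sizes are no longer distinguishable
```

After padding, every file whose length falls in the same bucket has the same stored size, so the set of sizes no longer acts as a fingerprint, at the cost of extra storage.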
