Score:0

Hash Comparison to Detect Ransomware File Encryption

br flag

As detailed in a separate question, I thought I had a way to detect the type of ransomware that encrypts files silently, and then decrypts them on the fly, so as to prevent the user from realizing that the files have been encrypted. I thought that a comparison of present vs. past file hashes would detect file changes: if many files were unexpectedly changed, maybe those changes were due to ransomware encryption.

A comment on that question seems to say that my concept fails because a file must be read in order to be hashed. The ransomware would make the file's contents available to the hashing tool; that tool would find that the contents appeared unchanged; therefore I would get the same hash value as before.

I don't understand that. It seems I need to address it in this separate question. If hashing only takes account of the file's contents, wouldn't it be impossible to hash, say, a file that the user has securely encrypted?

A Cryptography discussion seems to say that hash values, for a file, may vary according to the timing of encryption with a public key. I interpret that as meaning that variations in the encryption process can produce variations in hash values. That seems incompatible with a general claim that hashing would not detect any difference between an encrypted file (even if decrypted on the fly) and its previously unencrypted form.

What am I missing here?

DannyNiu avatar
vu flag
In any case, PUoSU. Demonstrate your idea/finding by implementing it.
br flag
I did. The other question links to the writeup.
Score:3
kr flag

I don't understand that

If your system is infected, there is no guarantee that what you read is the file's real contents as stored on the disk. The file may have been encrypted by ransomware. When an application reads a file, it calls the operating system. If ransomware has infected the system, it reads the encrypted contents, decrypts them, and hands the result to the OS and thus to your application. As long as you use an infected system, you cannot know what the real contents on the disk are.

The only reliable way to detect encryption is to read the files from another system. Boot from a USB stick, create hashes of the files, and repeat this from time to time, e.g. daily or weekly. Of course, this differs from your desire to detect changes immediately.
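As a rough illustration of the "boot clean, hash, repeat" routine, here is a minimal Python sketch. It assumes SHA-256 and a plain dictionary as the manifest; the function names `hash_file`, `build_manifest` and `changed_files` are made up for this example and do not come from any particular tool.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = hash_file(full_path)
    return manifest


def changed_files(previous, current):
    """Return the sorted paths whose digest differs between two manifests."""
    return sorted(
        path for path in previous
        if path in current and previous[path] != current[path]
    )
```

The idea would be to run `build_manifest` on the mounted drive from the clean system each time, keep the result somewhere outside the possibly infected system, and pass the previous and current manifests to `changed_files`. A long list of unexpected changes is the warning sign.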

wouldn't it be impossible to hash, say, a file that the user has securely encrypted?

You can hash any file. Only you know whether you have encrypted the file or not. For the operating system and for ransomware there is no difference: any file is just a sequence of bytes. If you encrypt a file, compute its hash, write the file to the disk, and then read it back, you will get exactly what you wrote (your encrypted file). But you will not know whether ransomware encrypted it again before it was saved to the disk and decrypted it after it was read.

that hash values, for a file, may vary according to the timing of encryption with a public key

  1. It is not the hash of the plain file that may vary, but the encryption result. As a consequence, those different encrypted files will have different hashes.
  2. Encryption results of the same file may vary, but not because of timing. For instance, you can launch AES-GCM encryption of the same file with the same password on 100 parallel threads on the same computer at the same time (each run picking a fresh random nonce, as it should), and all of them will produce different results. But when decrypted, they will all produce the same original file.
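To make the second point concrete without third-party libraries, the sketch below uses a toy cipher as a stand-in for AES-GCM: a keystream derived from SHA-256 and XORed with the data. This is an assumption made purely for illustration; it is not AES-GCM and not secure. It only shows that a fresh random nonce makes each ciphertext (and so each ciphertext hash) different, while decryption always returns the same plaintext.

```python
import hashlib
import os


def keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a counter (not secure)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def toy_encrypt(key, plaintext):
    """Prepend a fresh random nonce, then XOR the plaintext with the keystream."""
    nonce = os.urandom(12)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key, ciphertext):
    """Split off the nonce, rebuild the keystream, and XOR it away again."""
    nonce, body = ciphertext[:12], ciphertext[12:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b"same key every time"
plaintext = b"the same file contents"

# Two encryptions of identical input with an identical key.
first = toy_encrypt(key, plaintext)
second = toy_encrypt(key, plaintext)

# The ciphertexts differ because the nonces differ, so their hashes differ too.
hash_first = hashlib.sha256(first).hexdigest()
hash_second = hashlib.sha256(second).hexdigest()
```

Both `toy_decrypt(key, first)` and `toy_decrypt(key, second)` give back `plaintext`, so the hash of the decrypted data is the same in both cases.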
br flag
My prior question links to my procedure to compare hashes of files on the Windows source drive against hashes of files from an earlier backup, calculated on a Linux system. The comment that sent me here, responding to that prior question, seemed to say that this comparison of hashes would be pointless. You seem to be agreeing with my original approach. Prior question: https://security.stackexchange.com/questions/259716/hash-based-technique-to-detect-ransomware-corruption-on-the-fly?noredirect=1. Original post: https://raywoodcockslatest.wordpress.com/2021/12/08/ransomware-hash/
kr flag
@RayWoodcock: No, I don't agree with you. Sorry if my answer was not clear. First you compute a hash in a clean system and save it somewhere externally. Then your system is infected with ransomware, which encrypts some of your files. When you read them to compute a hash, the ransomware decrypts them and you get the same hash as before. But you cannot know whether there is a "layer in between" that encrypts and decrypts data transparently. Thus, when you work in an infected system, computing hashes cannot tell you whether the files are encrypted on the disk.
br flag
Thanks for the follow-up. Yes, you understand my question. The part that mystifies me is the case of user-encrypted files. Does the hashing tool "read them to compute a hash"? I think the answer is that the tool reads their external appearance: it gets random characters, and calculates a hash based on those characters. Why wouldn't (or maybe the question should be, why couldn't) the hashing tool likewise notice that a ransomware-encrypted file differs from what it used to be?
kr flag
1) *"Does the hashing tool "read them to compute a hash"?"* - Yes. A hash is not something that is computed automatically. Without reading the file there is no way to compute a hash. There is also no such thing as **the** hash of a file: there are any number of algorithms for computing one. You decide which kind of hash you want: MD5, SHA-256, SHA-512, BLAKE, etc.
kr flag
2) *"Why ... couldn't ... the hashing tool likewise notice ... ?"* - Because to notice a difference the tool needs to compute the hash. For this, the tool needs to read the file. The tool cannot access the storage directly; it can only do so indirectly, via the OS. The OS is infected. The ransomware intercepts all requests to the storage, so the hashing tool obtains whatever the ransomware provides. The ransomware decrypts the file after reading it from the storage and before giving it to the OS. Thus the hashing tool has no way to know what is really stored on the HDD or SSD.
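The interception described here can be mimicked in a few lines of Python. This is only a simulation under loud assumptions: `RawDisk` and `InfectedOS` are hypothetical classes invented for this sketch, and a repeating-key XOR stands in for whatever cipher real ransomware would use.

```python
import hashlib


def xor_bytes(data, key):
    """Repeating-key XOR; applying it twice restores the input (toy cipher)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class RawDisk:
    """Stand-in for physical storage: returns the stored bytes unchanged."""

    def __init__(self):
        self.blocks = {}

    def write(self, name, data):
        self.blocks[name] = data

    def read(self, name):
        return self.blocks[name]


class InfectedOS:
    """Hypothetical read path with a ransomware layer in front of the disk."""

    def __init__(self, disk, key):
        self.disk = disk
        self.key = key
        self.encrypted = set()

    def encrypt_in_place(self, name):
        # Overwrite the stored bytes with their encrypted form.
        self.disk.write(name, xor_bytes(self.disk.read(name), self.key))
        self.encrypted.add(name)

    def read(self, name):
        # Decrypt silently, so every application sees the original bytes.
        data = self.disk.read(name)
        if name in self.encrypted:
            data = xor_bytes(data, self.key)
        return data


disk = RawDisk()
disk.write("report.txt", b"quarterly figures")
baseline = hashlib.sha256(disk.read("report.txt")).hexdigest()

infected = InfectedOS(disk, key=b"\x5a\xa5\x3c")
infected.encrypt_in_place("report.txt")

# A hashing tool running on the infected system reads through the hooked path.
hash_on_infected = hashlib.sha256(infected.read("report.txt")).hexdigest()
# A clean system (e.g. booted from USB) reads the stored bytes directly.
hash_on_clean = hashlib.sha256(disk.read("report.txt")).hexdigest()
```

In this simulation `hash_on_infected` equals `baseline`, while `hash_on_clean` does not. That is the whole point: only the reader that bypasses the infected layer sees the change.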
br flag
OK. Thank you for your patience. I think I understand. It seems (this type of) ransomware corrupts the OS (Windows, in my setup), rendering its file hashes suspect. But if files have been ransomware-encrypted, that should be evident in a comparison of the Windows hashes against hashes of the same files, calculated on a presumably clean Linux machine. If that's true, then the scheme outlined in the other post (link provided at the start of my question, above) seems legit.
kr flag
@RayWoodcock: I'm afraid you still misunderstand how the OS and ransomware work. You mention PowerShell detecting file changes. But there are tons of tools that detect file changes on the fly. On Windows, .NET provides the *FileSystemWatcher* class that you can use to do that. Or you can just compare file sizes and timestamps. All these tools notice *real* changes of file contents. But none of these approaches notices the encryption/decryption done by ransomware.
br flag
Regarding PowerShell: thanks, I do see that. The other post predates this one, as does my blog post. I'll be updating the latter. But is my last comment (above) incorrect? Scenario: a Windows hashing tool is fooled into reporting no change to a file. But a hashing tool on an uninfected Linux system, examining the same file, is not fooled: it assesses the file as changed by ransomware, and calculates a different hash. (One would hope for ransomware that is not cross-platform.) Then a comparison of Windows and Linux hashes detects an inconsistency. No?
kr flag
@RayWoodcock: *"Then a comparison of Windows and Linux hashes detects an inconsistency"* - Correct. But it is more complicated than necessary. It would be sufficient to compare hashes within the external system (the one you call Linux): compare hashes of the current state with hashes of the previous state.
br flag
As long as I'm sure the Windows system is the infected one.
Score:2
ng flag

Hashing detects (with overwhelming probability) any difference between two pieces of data, including one being an encrypted version of the other. Thus the principle of comparing hashes of files to detect that many have changed is sound.
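A tiny Python sketch of that principle, assuming SHA-256 and made-up sample data: flipping a single bit of the input is enough to change the digest.

```python
import hashlib

original = b"Meeting notes, 8 December 2021"

# Flip one bit of the first byte to model the smallest possible change.
altered = bytearray(original)
altered[0] ^= 0x01

digest_original = hashlib.sha256(original).hexdigest()
digest_altered = hashlib.sha256(bytes(altered)).hexdigest()

# The two digests differ, so a stored digest reveals that the data changed.
digests_match = digest_original == digest_altered
```

An encrypted file differs from its plaintext in far more than one bit, so the same comparison flags it as well, provided the hashing program is really shown the bytes that are on disk.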

There are however a few ways a program systematically encrypting the files on disk could evade detection from a program checking that hashes of files on disk do not change. They include:

  • Disabling the check by seizing all CPU resources during the encryption.
  • Hooking into the read code of all programs (including the one performing the hash check) so as to present them unmodified data until all the files have been encrypted, even though the data has already been physically encrypted on disk. This is possible if the encryption key is used to decrypt until the encryption is complete.

Update: the cryptoransomware does not need to be tailored to the hash or to the hash comparison program; all it needs to do is correctly implement either of the above two bullet points. On the other hand, the few actual cryptoransomware samples that I studied (in a VM) only partially implemented the first strategy (as a side effect of their main strategy: encrypt as fast as they can), and not the second one, which in modern OSes requires a privilege escalation.

br flag
This answer is clearer to me than the other. It seems to say that my original (blogged) idea was on the right track: hashing will detect a difference between an encrypted and unencrypted version of a file. I'm still puzzled that the comment on my other question (link above) was not only confident but also upvoted. Regarding your second bullet point: would the ransomware have to be written for each specific hashing tool, or could "hooking into the read code of all programs" be done in a general manner that would corrupt file reading activities by any program?
fgrieu avatar
ng flag
@RayWoodcock: I guess [this comment](https://security.stackexchange.com/questions/259716/#comment536553_259716) was upvoted because it points out what the question itself assumes: "the predominant form of ransomware encrypts a file ***and then decrypts it on the fly***, to make it available to the user, without alerting the user that the file could be permanently encrypted at any moment", akin to the second bullet in my answer. If true (which is _not_ evident), that does defeat the hash technique, because what gets hashed is the deciphered data, which is identical to the original and thus has the same hash.
br flag
My reading suggested that this was true, about the predominant form of ransomware, but it's OK if I was wrong. Regardless, I don't think there's any dispute that this kind of ransomware does exist. I think your second bullet point and update are saying that the hashing tool doesn't see an encrypted file if the system has been told to overlook, at least for the moment, all file encryption of type X. An encrypted file is perhaps not an island unto itself; it is one prison cell in a bank of cells, all of which can be unlocked simultaneously with one electronic switch. But that's not "on-the-fly."