Score:5

Does file size significantly affect brute-force time?

es flag

Suppose you have two files encrypted with AES-256. One of the two files is 5MB, the other one is over 1GB. Their passwords are reasonably strong: >12 characters, letters, numbers, upper and lower case characters. If you tried to brute-force them, would it take a shorter time to try the same number of passwords on the smaller file than on the larger one?

For example (just making up numbers here), could it take 1,000 years to try 100,000,000,000 passwords on the small one, but 10,000 years to try the same number of passwords on the larger one?

This isn't a real-life scenario; it's just for a story I'm writing where cryptography plays only a marginal role. Accuracy isn't necessary so long as the idea that brute-forcing a smaller file takes a shorter time than brute-forcing a larger one makes sense in principle, and the difference becomes non-negligible in the long run.

EDIT: I should probably have mentioned that neither file is plain text. They are email databases of some unspecified, proprietary format, which means if you were to try to open them with a text editor, you'd get gibberish. So, basically I expect that, in order to see if a password was right, you'd have to decipher the whole thing and plug it into the email software to see if it is read correctly.

Aman Grewal avatar
gb flag
If you need to decrypt the entire file to see if the key is correct, yes. But if you know something about the plaintext (e.g., it's a txt file written in English), you might only need to check a few bytes to know if the key is invalid.
fgrieu avatar
ng flag
_"email databases of some unspecified, proprietary format"_ encrypted with AES-256 (in a standard mode) and a password-generated key are vulnerable to password search with effort that does _not_ grow with the file size, for competent adversaries. In short, that's because database data in distinguishable from random (even without knowledge of the database format). See details in the second section of [my answer](https://crypto.stackexchange.com/a/102706/555).
kelalaka avatar
in flag
Your concern is not brute-forcing the AES-256 key; you should be concerned about the passwords!
Score:8
ru flag

AES is a block cipher and typically operates on large plaintext data by dividing it into blocks of fixed size (128 bits in the case of AES) and processing these blocks individually. Similarly, decryption is likely to proceed block by block, and it is (intentionally) easy to recover part of the plaintext without having to decrypt everything. If the plaintext is easy to recognise (e.g. you know that it starts "Dear Professor Moriarty" or "CLASSIFICATION: TOP SECRET", or consists only of printable ASCII characters), then even if there are $2^{256}$ possible keys, the correct one can be determined just by decrypting three or so blocks of known start (about 50 characters' worth), or sixteen or so blocks if we only know that the text is all printable ASCII.
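To make this concrete, here is a minimal Python sketch (my illustration, not part of the answer), assuming the third-party `cryptography` package and a hypothetical 16-byte known file header; a real mode of operation would add an IV or nonce, but the "check one block, reject the key" principle is identical:

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KNOWN_HEADER = b"Dear Professor M"  # hypothetical known 16-byte start of the file

def key_matches(key: bytes, first_ct_block: bytes) -> bool:
    # Raw single-block (ECB) decryption stands in for the block cipher
    # operation; real modes add an IV/nonce, but only one block is needed.
    dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    plain = dec.update(first_ct_block) + dec.finalize()
    return plain == KNOWN_HEADER
    # With no known header, test instead that every byte is printable ASCII:
    # all(0x20 <= b <= 0x7e for b in plain)
```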

In the case of your tale, there would be many fewer blocks to check as there are many fewer possible keys. It is likely that a single block of text would be sufficient to identify the key.

In other words, the file size would have no bearing on the time to exhaust 100,000,000,000 keys as the decryption process could be applied just to the first block of ciphertext rather than the full file.

Nicola avatar
es flag
None of the files in my story is plaintext. These two files are email databases of some unspecified format, so the kind of stuff that looks like gibberish if you try to open it with a text editor. Does this make a difference?
Daniel S avatar
ru flag
Even stuff that looks like gibberish to the human eye can often be distinguished from random by a computer. Even compressed files will have headers at the start that a computer can use to check the correct answer. It would be very unusual to have a file where you need to have the whole file to understand any part of it.
Score:5
ng flag

As stated in these other answers, decryption time is essentially independent of file size with standard encryption methods.

However, the decryption time can be proportional to file size with some (non-standard) form of dual encryption, such that fully decrypting the outer layer is necessary to start the decryption of the inner layer.

Such dual encryption could be as follows (a code sketch follows the list):

  1. Draw a random secret 256-bit key $K$.
  2. Encrypt the file with key $K$ per AES-256-CTR, yielding intermediary ciphertext $I$.
  3. Hash $I$ with SHA-256, yielding $H$.
  4. Compute salt $S\gets K\oplus H$.
  5. Stretch the password into a 256-bit derived key $D$ with salt $S$ per PBKDF2-HMAC-SHA-256 and 10000 rounds (a common if not recommendable practice).
  6. Encrypt $I$ with key $D$ per AES-256-CTR, yielding ciphertext $C$.
  7. Form the encrypted file as $S$ followed by $C$.
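Here is a minimal Python sketch of the scheme above (my illustration; it assumes the third-party `cryptography` package, and since the steps leave the CTR nonce unspecified, a fixed all-zero nonce is assumed; the names `aes256_ctr`, `xor32` and `dual_encrypt` are mine):

```python
import os, hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

NONCE = b"\x00" * 16  # the steps above leave the CTR nonce unspecified; fixed here

def aes256_ctr(key: bytes, data: bytes) -> bytes:
    # CTR mode: encryption and decryption are the same operation
    enc = Cipher(algorithms.AES(key), modes.CTR(NONCE)).encryptor()
    return enc.update(data) + enc.finalize()

def xor32(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def dual_encrypt(password: bytes, plaintext: bytes) -> bytes:
    K = os.urandom(32)                          # step 1: random 256-bit key
    I = aes256_ctr(K, plaintext)                # step 2: inner layer
    H = hashlib.sha256(I).digest()              # step 3: hash ALL of I
    S = xor32(K, H)                             # step 4: salt depends on all of I
    D = hashlib.pbkdf2_hmac("sha256", password, S, 10000, dklen=32)  # step 5
    C = aes256_ctr(D, I)                        # step 6: outer layer
    return S + C                                # step 7
```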

Standard password-based encryption would typically consist of generating a random salt $S$, then performing steps 5/6/7; so we are merely extending practice. A competent password cracker or state-level actor would be able to attack that if the password is only fair, using GPUs, FPGAs or ASICs to speed up the password search, because it uses PBKDF2 (using Argon2 instead would be state-of-the-art, and would make password search considerably more costly).

The best way to find the password and decipher goes as follows (sketched in code after the list):

  • Parse the encrypted file to get prefix $S$ and the rest $C$.
  • For each password to test (mostly: from most to least probable):
    • Perform step 5, yielding some $D$.
    • Decrypt $C$ with key $D$ per AES-256-CTR, yielding some $I$.
    • Hash $I$ with SHA-256, yielding some $H$ (as in step 3).
    • Compute some $K\gets S\oplus H$ (reversing step 4).
    • Decrypt the beginning of $I$ with key $K$ per AES-256-CTR, yielding the beginning of the file if the password is correct, or meaningless garbage otherwise.
    • Test if that beginning of the file is plausible (that's possible quickly and with near certainty for most practical files: anything uncompressed, compressed video, zip archives, JPEG images); if so, decipher the rest.
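Continuing the sketch above (same assumed helpers `aes256_ctr` and `xor32`; `try_passwords` and the caller-supplied `plausible` test on the first 16 bytes are my names):

```python
def try_passwords(blob: bytes, candidates, plausible):
    S, C = blob[:32], blob[32:]                   # parse salt and outer ciphertext
    for pw in candidates:                         # from most to least probable
        D = hashlib.pbkdf2_hmac("sha256", pw, S, 10000, dklen=32)  # step 5
        I = aes256_ctr(D, C)                      # must decrypt ALL of C ...
        H = hashlib.sha256(I).digest()            # ... because H covers all of I
        K = xor32(S, H)                           # reverse step 4
        head = aes256_ctr(K, I[:16])              # CTR: a prefix decrypts on its own
        if plausible(head):                       # cheap plausibility test
            return aes256_ctr(K, I)               # correct password: full decrypt
    return None
```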

The critical point is that it's necessary to decrypt the whole of $C$ into the whole of $I$, and hash that, in order to test if the correct password was found. When the file is large, that will dominate the cost of password search, to the point of making the key stretching of step 5 a negligible part of the deciphering cost.


An edit to the question adds:

Neither file is plain text. They are email databases of some unspecified, proprietary format, which means if you were to try to open them with a text editor, you'd get gibberish. So, basically I expect that, in order to see if a password was right, you'd have to decipher the whole thing and plug it into the email software to see if it is read correctly.

No! For the overwhelming majority of file formats and standard encryption methods, including databases of proprietary format and AES-256 in all standard modes, it's easy to distinguish whether a partial decryption was made with the correct key or not. This uses the fact that the data in the partial decrypt is distinguishable from random for the correct key only. As a proof of concept, the standard ent tool will reliably tell the difference between 4 kiB from /dev/urandom and a 4 kiB segment of a database.
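As a crude stand-in for ent, here is a minimal Python sketch (my illustration; the 7.9 bits/byte threshold is picked for illustration only) scoring the byte-level entropy of a candidate 4 kiB decrypt:

```python
import math
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    # Shannon entropy of the byte histogram: close to 8.0 bits/byte for
    # random data, noticeably lower for most structured file contents
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def looks_random(segment: bytes, threshold: float = 7.9) -> bool:
    # a wrong key yields near-uniform bytes; a correct key usually does not
    return bits_per_byte(segment) >= threshold
```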

In order to get the decryption time by competent actors proportional to the file size, we need a special kind of encryption as described above, involving dual encryption and some twist as in steps 3/4.

Score:3
cn flag

There is no realistic scenario where the size of the file has any influence on how hard it is to guess the password.

Password-based file encryption works this way:

  1. Use a password-based key derivation function to derive an encryption key from the password. This step depends only on the key derivation parameters, not on the encrypted file.
  2. Use the encryption key to encrypt or decrypt the file. This step only depends on the key and on the file, not on the password.

Depending on the details of the format, if you try the wrong password at step 1, you may get an error, or a key that's wrong but with no way to tell that it's wrong. In the latter case, you can start step 2 to decrypt just the beginning of the file, and see if it decrypts to something sensible. There is no plausible reason why you'd need to decrypt more than a few bytes in step 2, unless the person who chose the file format deliberately went out of their way to make it difficult, for example with some form of nested encryption.

If they did, it would indicate that they are particularly security conscious and would have chosen a strong password and strong key derivation parameters, so there's little chance that the attacker would find the correct password. With such a security-conscious file owner, the only chance the attacker has is if they have partial information about the password, for example because they saw it typed but the keyboard was partially obscured so they don't know all the characters. That wouldn't be brute force, and such a security-conscious person is not going to use their girlfriend's name or anything like that as a password; they're going to use a randomly generated password. And to reiterate, having to decrypt the whole file in step 2 is a highly contrived scenario to start with, which would be hard to justify in a way that both satisfies knowledgeable audiences and doesn't completely bore non-knowledgeable audiences.

If there's a difference between how hard it is to crack the password-based encryption, it comes down to the parameters of step 1. There are two parameters that affect how long step 1 takes for the attacker. The obvious one is the password: one file may have a password that's harder to guess than the other.

The other parameter is the “cost” or “strengthening” or “iterations” factor (the generic name is not really settled yet) of the password-based key derivation. A password-based key derivation is designed to be deliberately slow, to slow down brute-force attempts, but it's only as slow as the legitimate user is willing to wait. So the algorithm is configurable, with a cost factor representing this compromise between security and usability. The derivation may have been configured with a different cost factor for the two files. This is again completely independent of the file length.
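To illustrate with a minimal Python sketch (my illustration, standard library only; the printed timings are machine-dependent): the cost of step 1 scales with the iteration count and involves no file data at all.

```python
import hashlib, time

def derive_key(password: bytes, salt: bytes, cost: int) -> bytes:
    # step 1: runtime is governed by the cost factor alone, never by file size
    return hashlib.pbkdf2_hmac("sha256", password, salt, cost, dklen=32)

for cost in (10_000, 1_000_000):
    start = time.perf_counter()
    derive_key(b"hunter2", b"a-random-salt-16", cost)
    print(f"{cost:>9} iterations: {time.perf_counter() - start:.3f} s")
```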

Score:0
US flag
user104975

Unless you are using a cipher that relies on file size (which would be horridly insecure for many reasons), no.

The key space is what matters most. Adding thousands of key-derivation iterations will slow encryption and make the key space more resource-intensive to attack, but that is where the relationship ends.

The oversimplified modern cipher process is (leaving aside the XOR-ing and the multitude of mathematical functions that underlie it):

  1. Generate a key via a key-generating algorithm. Modern encryption software uses the built-in PRNG API of the OS, while others, like VeraCrypt, additionally ask the user to move the mouse around to gather entropy on top of the PRNG API. As an example, the original NIST paper for AES does not specify any algorithm as "superior"; it only asks for one with good enough entropy.
  2. Use the key to encrypt/decrypt data. This is where most of the meat of the math behind AES and other modern ciphers, such as Serpent and Camellia, happens.

Nowhere in this cycle does the cost depend on file size.

You may argue that block sizes must have some sort of effect. They are not the focus of academic attacks on modern ciphers, nor are they important here; a cipher whose blocks are its main point of attack would also be horridly insecure. Block size did matter for DES, whose 8-byte blocks were vulnerable to a birthday attack: it was impossible to guarantee the safety of more than about 32 GB of ciphertext under one key. Fortunately, the NIST AES competition blessed us with a much bigger limit of about 256 exabytes. It is important to note that the birthday attack does not recover the key any faster than brute force.
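For reference, those two limits are the usual back-of-the-envelope birthday-bound calculation for an $n$-bit block cipher, which allows about $2^{n/2}$ blocks under one key before collisions become likely (my addition, not from the answer):

DES: $2^{64/2} \times 8\,\mathrm{B} = 2^{35}\,\mathrm{B} = 32\,\mathrm{GiB}$

AES: $2^{128/2} \times 16\,\mathrm{B} = 2^{68}\,\mathrm{B} = 256\,\mathrm{EiB}$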

No: with good ciphers, brute-force time is independent of file size.
