Score:1

Crack AES encryption via passphrase dictionary attack?

fr flag

How easy would it be to crack a AES-256 encrypted file, that is protected by a passphrase?

I understand that the trying to brute force a AES-256 encryption key would be on the unfeasible side, even with quantum computing. But what if that encryption key was instead generated from a passphrase? How easy would it then be to break the encryption?

I'm not experienced at all in cryptography, but tried making some simple estimations: According to the Oxford dictionary there are around 170 thousand english words in use. Of course, only a fraction of these are frequently used passwords, but let's use this number.

If we tried to brute force the passphrase, I suppose we'd start with 1-word long passphrases, followed by 2-word long and so on. Assuming the words are not repeated, this would give a total number of possibilities :

$N(L) = \sum_{i=1}^L \frac{170000!}{i!}$

Where L is the maximum passphrase length (number of words) we want to try to crack.

In order to get a number of permutations of around $2^{256}$, then we'd need a passphrase of at least 18 words, given that $N(15) = 2.85\times10^{78} = 2^{260}$.

Is it then correct to assume that, if the attacked knows the "password" is indeed a passphrase made up of english keys, then the passphrase needs to be at least 15-word long in order for it not to be the weaker link in the AES encryption scheme?

kelalaka avatar
in flag
Why do you need such a calculation? Why don't you use Dicewire or Bip39 to secure your key with a good key derivation algorithm like Argon2 or Scrypt?
Score:2
cn flag

Assuming the words are not repeated, this would give a total number of possibilities :
$N(L) = \sum_{i=1}^L {170000 \choose i}$

No: this is the total number of sets of $L$ words, but the words have to be in order, so the value is actually $N(L) = \sum_{i=1}^L \frac{170000!}{i!}$.

Furthermore there is no reason not to repeat words: this just makes the passphrase slightly easier to guess. So the number of $L$-word passphrases is actually $170000^{L}$. The number of passphrases of $1$ to $L$ words is $$N(L) = \sum_{i=1}^L 170000^i$$

The attacker is actually fairly likely to know the number of words in the passphrase, but this doesn't change the number by much.

Doing the calculation, $N(14) < 2^{256} < N(15)$.

Is it then correct to assume that, if the attacked knows the "password" is indeed a passphrase made up of english keys, then the passphrase needs to be at least 18 15-word long in order for it not to be the weaker link in the AES encryption scheme?

Still no, because the cost of testing a passphrase is higher than the cost of testing a key. To test a passphrase, the adversary has to first derive the key from the passphrase, and then test the key. Deriving a key from a passphrase is deliberately slow: it uses a key stretching function.

Just how much slower key stretching is compared to a hash calculation depends on the choice of key strecthing algorithm, on how it's parametrized and on what hardware the attacker has. For brute force on this scale, the cost of hardware design is negligible, and the cost is dominated by power consumption. For a legacy iterated-operation key stretching function such as PBKDF2, the amount of silicon to power for the key stretching is not significantly higher than for AES. It's typical to pick a slowness factor such that one run costs a few tenths of seconds, compared to a few billionths of a second for the AES part, meaning a ratio of about $2^{26}$. With a modern key stretching function that's also memory-hard, the ratio is higher since you also have to power the RAM. I'm going to use $2^{30}$ as the ratio.

This means that in order for AES to be weaker against brute force than the passphrase, we need $N(L) \ge 2^{256}/2^{30} = 2^{226}$, which is achieved for $L \ge 13$.


But… this number is not meaningful! There is absolutely no need for passphrase cracking to be slower than AES cracking, because AES cracking is already way beyond impossible. If passphrase cracking is impossible, but “less impossible” than AES, it's still impossible.

The Bitcoin network uses about 0.4% of the world's total electricity production (source: ≈ 100 TWh/year out of somewhat over 25000 TWh/year) and calculates $2^{93}$ hashes/year. Assuming you'd get the same number of elementary operations per Wh for passphrase cracking, with the cost factor difference of $2^{30}$ I estimated above, this means that an upper bound for passphrase cracking is $2^{63}$ per year.

So if you want your key to be secure from an NSA-level adversary for a thousand years, you need $N(L) \ge 1000 \cdot 2^{63} \approx 2^{73}$, which is achieved for $L \ge 5$.

At this level of strength against brute force, brute force just isn't a concern. Or rather, “brute force” as in supercomputers isn't a concern. The brute force that is a concern is one applied with a blunt instrument.

A Crypto nerd's imagination:
[Cueball is holding a laptop, and his friend is examining it.]
Cueball: His laptop's encrypted. Let's build a million-dollar cluster to crack it.
Friend: No good! It's 4096-bit RSA!
Cueball: Blast! Our evil plan is foiled!
What would actually happen:
[Cueball is holding a piece of paper and giving his friend a wrench.]
Cueball: His laptop's encrypted. Drug him and hit him with this \$5 wrench until he tells us the password.
Friend : Got it.

Actual actual actual reality: a really motivated attacker will find the passphrase through phishing or, for really careful users and powerful attackers, by planting a camera or planting malware. (Or a combination thereof.)

fr flag
Thanks, corrected my calculation. I understand that right now, in practice, one does not have to go all to 256 bits of encryption, or 2^256 permutations. That was what I was aiming for, however, since with quantum computing on the not distant future, the brute-force worst case would take approx. $O(\sqrt{N}) steps using Grover search. I suppose that in order to make it quantum resistant against a not-necessarily NSA-level but still motivated adversary (depending how much quantum computers will cost through the cloud), we would need around 12 words to match the level of security you mentioned?
Score:1
in flag

As Gilles notes, If you have a dictionary of 170,000 words (which is on the high side) the number of passphrases with $L$ words is : $170000^L$

When converting from a passphrase to a key we use a key derivation function designed to be deliberately slow to prevent brute force attacks, how much slower depends on the details, there are multiple algorithms around and all the modern ones are parameterized to be configureably slow.

But to do some back of the envelop math, assume using GPU, you can probably get up to 1000 checks per second with 1$/hour cloud resources. This means you get 3.6M passphrases checked / USD (This is when using slightly dated, yet common password derivations, e.g PBKDF2 with moderate iteration count and the attacker uses the best off the shelf cloud resources available).

Now in order to understand the security you can convert passphrase length to cost.

L=1, 170,000 possible phrases, cost is sub 1$

L=2, 29B possible phrases, cost is ~ $8000

L=3 , cost becomes $1.3B which is already not possible using public cloud, and probably not possible even for a nation state with extra dedicated hardware.

fr flag
Thank you, also very nice to see a cost estimated to help put in perspective! I was just wondering how much this would be with quantum computing not to distant in the future, and readily accessible via the cloud too.
Meir Maor avatar
in flag
There is no relevant quantom computing in the forseable future. We are several breakthroughs away.
eckes avatar
us flag
It should however be mentioned that top-100millon password lists are still in the sub 1000$ range and they cover a lot of weak passwords. So if your software does not use strong key derivation (high iteration) it’s endangering common users for easy offline attacks. If only a simple hash is used it’s even worse. You can easily brutefore up to 8 chars on a local machine.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.