Score:3

How to calculate probability of cracking a password from entropy?

pr flag

I am working on a project for my maths assessment where I research the effect of complexity and length on the strength of a given password. Currently, I am working on calculating the probability of guessing a password on the first try. I assumed that I had to start from entropy and go from there, but I am kind of stuck on which formula to use in order to find the probability.

I considered 1 / (2^entropy) but I am not sure if this is the correct solution since I couldn't find much on the internet.
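For example, with an 8-character password chosen uniformly at random from the 26 lowercase letters (numbers I picked just to have something concrete), I think the formula would give:

entropy = 8 × log2(26) ≈ 37.6 bits, so probability ≈ 1 / 2^37.6 ≈ 4.8 × 10^-12

Is that the right way to apply it?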

I would appreciate any answers.

hm flag
Password entropy can only be precisely calculated when the passwords are randomly generated from a known set of source characters or strings. Real-world (human-generated) password 'entropy' is more about human psychology than it is about math, so it's not a solved problem by any means.
Score:3
kr flag

The answer depends on many factors. In particular:

  • If a human generated it, the probability of some values will be higher than the probability of others.
  • If a random generator was used, it depends on what is known about its properties. If it is known that it generates some values more often than others, this can affect the probability of guessing correctly.
  • It depends on the correctness of the password generator's implementation. Implementation bugs can affect the probability of some values.
  • It depends on how the guessing is organized. A human guesser, even knowing the information above, will have one probability of success; a guessing application will have another.

The formula 1/2^entropy is correct. But, as Royce Williams said, in many cases it is hard to calculate the entropy. Entropy is not something absolute; it depends on the context and on the factors mentioned above. For instance, suppose a password is generated randomly from lowercase English letters. One person knows this exactly. Another person knows only that the password can contain English letters in both cases and digits. The probability of guessing it on the first try will be higher for the first person.
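A minimal sketch of that example in Python (the password length of 8 is an arbitrary choice for illustration; the two alphabet sizes correspond to the two attackers' knowledge):

```python
import math

# An 8-character password generated uniformly at random from lowercase letters.
# The length 8 is an arbitrary choice, just to have concrete numbers.
LENGTH = 8

# Attacker 1 knows the true alphabet: 26 lowercase letters.
# Attacker 2 only knows "letters in both cases plus digits": 26 + 26 + 10 = 62.
for alphabet_size in (26, 62):
    entropy_bits = LENGTH * math.log2(alphabet_size)  # entropy from this attacker's viewpoint
    p_first_guess = 1 / 2 ** entropy_bits             # probability of a correct first guess
    print(f"alphabet size {alphabet_size}: {entropy_bits:.1f} bits, "
          f"P(first guess) = {p_first_guess:.2e}")
```

The first attacker's chance per guess comes out roughly a thousand times higher ((62/26)^8 ≈ 1000), even though the password itself is the same, which is the sense in which the entropy depends on who is doing the guessing.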

Thus the actual question is: How to calculate entropy?

Score:2
ng flag

As others have mentioned, for entropy to be well-defined, you need to have some underlying probability distribution $D$. If you are willing to look around some unsavory places, you could look for a moderately-large data breach to get an empirical distribution of passwords that you can compute the entropy of. Alternatively there might be some user studies that have computed things like this.
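To sketch what that empirical calculation could look like (a rough Python sketch; the filename and the one-password-per-line format are assumptions, not a reference to any particular breach):

```python
from collections import Counter
import math

# Assumed input: a leaked password list, one password per line.
# "passwords.txt" is a placeholder name for whatever corpus you obtain.
with open("passwords.txt", encoding="utf-8", errors="ignore") as f:
    counts = Counter(line.rstrip("\n") for line in f)

total = sum(counts.values())

# Empirical Shannon entropy: H = sum_x p(x) * log2(1 / p(x))
shannon_bits = sum((c / total) * math.log2(total / c) for c in counts.values())
print(f"empirical Shannon entropy: {shannon_bits:.2f} bits over {total} samples")
```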

The main point of this answer is to mention that there are multiple inequivalent notions of entropy, and that the traditional (Shannon) entropy is not always the best one in cryptography. Shannon entropy is defined as

$$H(X) = \sum_{x\in \mathsf{supp}(X)} p(x)\log(1/p(x))$$

Another fairly common notion of entropy is the min-entropy, defined as

$$H_\infty(X) = \min_{x\in \mathsf{supp}(X)} \log(1/p(x)) = \log\left(\frac{1}{\max_{x\in \mathsf{supp}(X)} p(x)}\right)$$

This is determined by the single most likely output of $X$. It is often a much better measure for cryptographic purposes, which can be demonstrated via a simple example.
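To make the two definitions concrete, here is a minimal Python sketch (the toy distribution is invented for illustration and previews the example below):

```python
import math

def shannon_entropy(dist):
    """H(X) = sum over x of p(x) * log2(1/p(x))."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def min_entropy(dist):
    """H_inf(X) = log2(1 / max_x p(x)); set entirely by the most likely outcome."""
    return math.log2(1 / max(dist.values()))

# Toy distribution: one very likely password plus 1000 equally rare ones.
dist = {"password123": 0.5}
dist.update({f"rare-{i}": 0.5 / 1000 for i in range(1000)})

print(shannon_entropy(dist))  # about 5.98 bits -- looks reasonably good
print(min_entropy(dist))      # exactly 1.0 bit -- exposes the weak spot
```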

Let $X$ be a random variable that is

  1. $0$ with probability $1/2$, and
  2. uniform over $\{1,\dots,2^k\}$ with probability $1/2$.

The min-entropy of this is very small (it is exactly $1$). The Shannon entropy of it is much larger: it works out to $k/2 + 1$ (a short calculation is given at the end of this answer). So if you are measuring the quality of a distribution over passwords via

  • min-entropy, you will think that $X$ is quite bad, whereas via
  • Shannon entropy, you will think that $X$ is quite good.

Of course, half of all users sampling passwords from $X$ can be trivially attacked, so it should perhaps be considered "bad" (in accordance with what min-entropy would predict).
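For completeness, here is the short calculation behind those two values (all logarithms base $2$): $X$ hits $0$ with probability $1/2$, and each of the $2^k$ other values with probability $2^{-(k+1)}$, so

$$H_\infty(X) = \log\frac{1}{\max_x p(x)} = \log 2 = 1,$$

$$H(X) = \frac{1}{2}\log 2 + 2^k \cdot 2^{-(k+1)}\log 2^{k+1} = \frac{1}{2} + \frac{k+1}{2} = \frac{k}{2} + 1.$$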
