Score:1

Crypto

Are these both the probability of collision in birthday attack?

Tim

7/18/23, 8:14 PM

About birthday attack, book Cryptography Engineering says:

In general, if an element can take on N different values, then you can expect the first collision after choosing about $\sqrt{N}$ random elements. We're leaving out the exact details here, but $\sqrt{N}$ is fairly close. For the birthday paradox, we have N = 365 and $\sqrt{N} \approx 19$. The number of people required before the chance of a duplicate birthday exceeds 50% is in fact 23, but $\sqrt{N}$ is close enough for our purposes and is the approximation that cryptographers often use.

One way of looking at this is that if you choose $k$ elements, then there are $k(k - 1)/2$ pairs of elements, each of which has a $1/N$ chance of being a pair of equal values. So the chance of finding a collision is close to $k(k - 1)/2N$. When $k = \sqrt{N}$, this chance is close to 50 % .

and wikipedia says:

As an example, consider the scenario in which a teacher with a class of 30 students (n = 30) asks for everybody's birthday (for simplicity, ignore leap years) to determine whether any two students have the same birthday (corresponding to a hash collision as described further). Intuitively, this chance may seem small. Counter-intuitively, the probability that at least one student has the same birthday as any other student on any day is around 70% (for n = 30), from the formula ${\displaystyle 1-{\frac {365!}{(365-n)!\cdot 365^{n}}}}$.

which can be rephrased in terms of the language in Cryptography Engineering:

$$1 - \frac{N!}{(N-k)! * N^k}$$

Is it supposed to equal to the following from Cryptography Engineering:

$$ (k(k-1))/(2N) $$

Why?

143

0 + 0

birthday-attack

collision-attack

kelalaka

7/18/23, 8:17 PM

That is about approximation that you did not notice on Wikipedia page. and possible dupes are [What are the odds of collisions for a hash function with 256-bit output?](https://crypto.stackexchange.com/q/39641/18298) and [Birthday Attack Probability of Collision in Introduction to Modern Cryptography](https://crypto.stackexchange.com/a/72791/18298) and [On a lower bound for the birthday problem](https://crypto.stackexchange.com/q/64584/18298) and [What is the error in this collision probability approximation?](https://crypto.stackexchange.com/q/66328/18298) and some other not listed...

kelalaka

7/18/23, 8:23 PM

Wiki page: [Birthday problem - Approximations](https://en.wikipedia.org/wiki/Birthday_problem#Approximations)

Tim

7/18/23, 8:59 PM

Thanks. @kelalaka. I don't find (k(k−1))/(2N) in the links. Do I miss it?

kelalaka

7/18/23, 9:01 PM

See `This is fine for low ` in the first link 101 answer...

Tim

7/18/23, 9:25 PM

@kelalaka Thanks. The book in my post seems to explain (k(k−1))/(2N) in a different way. Are the pairs each having two equal values disjoint events?

kelalaka

7/18/23, 9:40 PM

It first selects uniform random $k$ values then forms the pair where each has $1/N$ chance of collision....

fgrieu

7/18/23, 10:07 PM

A derivation of the approximation is in my [_Birthday problem for cryptographic hashing, 101_](https://crypto.stackexchange.com/a/39644/555). Beware that uses $(k,n)$ where the present question uses $(N,k)$.

Tim

7/19/23, 12:30 PM

@fgrieu Thanks. I still have the same question. The book in my post seems to explain (k(k−1))/(2N) in a different way from your answer. How shall I understand it from 1/N for each pair? Are the pairs each having two equal values disjoint events? Which part in its derivation is approximation?

Score:2

Crypto

fgrieu

7/19/23, 4:10 PM

The question asks how we go from $\displaystyle p=1 - \frac{N!}{(N-k)!\,N^k}$ for the probability of collision of $k$ uniformly random values among $N$, to the approximation $\displaystyle p\approx\frac{k(k-1)}{2N}$ (which assumes $k$ is small compared to $\sqrt N$ ).

First we get back to $\displaystyle1-p=\prod_{j=0}^{k-1}{\left(1-\frac j N\right)}$, which is how $p$ was determined in the first place. Then we take the logarithm on both sides and use that $u>0,v>0\implies\ln(u\,v)=\ln(u)+\ln(v)$ to get $$\displaystyle\ln(1-p)=\sum_{j=0}^{k-1}{\ln\left(1-\frac j N\right)}$$

For small $|x|$, it hold $\ln(1+x)\approx x$. Applying this to $x=p$ on the left side and $\displaystyle x=\frac j N$ on the right side, we get $\displaystyle p\approx\sum_{j=0}^{k-1}\frac j N$. We rewrite this as $\displaystyle p\approx\frac 1 N\sum_{j=0}^{k-1}j$.

Now we use that the sum of non-negative integers less than $k$ is $\displaystyle\frac{k\,(k-1)}2$ and get the desired $\displaystyle p\approx\frac{k(k-1)}{2N}$.

Without proof: this approximation of $p$ is always by excess. It's off by less than $+28\%$ when $k\le\sqrt N$, less than $+14\%$ when $k\le\sqrt{2N}$, less than $+7\%$ when $k\le2\sqrt N$.

Most of the error is from the approximation $\ln(1-p)\approx-p$. A much better approximation is: $$p\approx1-e^{-\frac{k(k-1)}{2N}}$$ which assumes only $k\ll N$ rather than $k\ll\sqrt N$. However beware that this alternate formula is numerically instable for small $p$.

In comment it's additionally asked

How shall I understand (this formula) from $1/N$ for each pair? Are the pairs each having two equal values disjoint events? Which part in its derivation is approximation?

One easy way to derive the probability $p$ that there is a collision among $k$ uniform values among $N$ (for $0\le k\le N$) is as the complement of the probability that there is not collision.

For a fixed $N$, define $q_k$ as the probability that there is no collision after $k$ values. Obviously $q_0=q_1=1$. And for $k\ge2$, $q_k$ is the probability that there was no collision among the first $k-1$ values (that is, $q_{k-1}$), time the probability that there is no collision between the $k-1$ previous values and the last drawn one, which is $\displaystyle\frac{N-k}N$ (justification there a exactly $N-k$ values among $N$ that do not leak to collision for the last value drawn).

It follows $\displaystyle q_k=q_{k-1}\left(1-\frac k N\right)$, thus $\displaystyle q_k=\prod_{j=0}^{k-1}{\left(1-\frac j N\right)}$, thus $$p=1-\prod_{j=0}^{k-1}{\left(1-\frac j N\right)}=1-\frac{N!}{(N-k)!\,N^k}$$

This is exact. See first two sections of this answer for the derivation of approximations.

One source justified the approximation as:

One way of looking at this is that if you choose $k$ elements, then there are $k(k−1)/2$ pairs of elements, each of which has a $1/N$ chance of being a pair of equal values.

This hand-waving argument does not yield a mathematically exact derivation of $\displaystyle p=\frac{k(k-1)}{2N}$, since the events are not disjoint. As long as $p$ is small, we can get away with it, but that gets grossly wrong when $k>\sqrt{2N}$.

When $k = \sqrt{N}$, this chance is close to 50%.

That's true if 39.3% is close to 50%.

0 + 0

Tim

7/19/23, 4:12 PM

Thanks. My comment at https://crypto.stackexchange.com/questions/99160/are-these-both-the-probability-of-collision-in-birthday-attack/99176#comment214408_99160 asked for different questions

Tim

7/19/23, 4:30 PM

If you take a look at the first quote in my post, the book doesn't derive the probability the way your added to your reply

Tim

7/19/23, 4:59 PM

"since the events are not independent." Additivity of probabilities of several events depends on the events being disjoint not independent

kelalaka

7/20/23, 11:46 AM

The percentages need links...

fgrieu

7/20/23, 12:56 PM

@kelalaka: my evidence for +28% is only numerical (hence the _without proof_): I plotted $\frac{\left(1-\frac{{k^2}!}{(k^2-k)!\,k^{2k}}\right)\,2k^2}{k(k-1)}-1$; same for +14% +7%.

Tim

7/20/23, 1:44 PM

Thanks. (1) Does your "First we get back to ... get the desired p≈k(k−1)/2N" justify (k(k−1))/(2N) as a reasonable approximation? (2) Does "This hand-waving argument does not yield a mathematically exact derivation of p=k(k−1)/2N, since the events are not disjoint. As long as p is small, we can get away with it, but that gets grossly wrong when k>sqrt(2N). When k=sqrt(N), this chance is close to 50%. That's true if 39.3% is close to 50%" mean that k(k−1)/2N is a bad approximation? (3) if you don't mean contradiction in (1) (2), could you explain, edit, or paraphrase what you actually mean?

fgrieu

7/20/23, 2:01 PM

(1) More precisely, it implies $k(k−1)/(2N)$ is a valid approximation _for $k$ or $p$ lower than some threshold._ (2) It means that $k(k−1)/(2N)$ is poorly justified by the argument given, comes without a statement of when it applies, and is a terrible approximation for $k$ above some thresold. As explained, it's already significantly off (by +24%) when applied to $k=\sqrt N$ and $N$ of cryptographic interest. (3) There's no contradiction.

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Are these both the probability of collision in birthday attack?

TH: ทั้งสองอย่างนี้น่าจะเป็นการปะทะกันในการโจมตีวันเกิดหรือไม่?

RO: Acestea sunt ambele probabilitatea de coliziune în atacul de ziua de naștere?

RU: Являются ли они оба вероятностью столкновения в атаке дня рождения?

VI: Đây có phải là cả hai xác suất va chạm trong cuộc tấn công sinh nhật?

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.