Score:1

Pedersen Hash : when truncating the hash to keep only the X coordinate, is it possible to compute a collision when the Babyjubjub curve is used?

in flag

The Pedersen hash is a low constraints friendly hash for Zk-Snarks.
Unlike many algorithms, the Pedersen hash returns a point P = (x,y) on a curve as a hash. Depending on the selected curve, there can exist a fast deterministic way to compute a different input that yields −P=(x,−y) using the Weierstrass form.

As a result, if software chooses to truncate a hash to its first half, and if the attacker controls the fixed length input, then there’s the possibility to compute 2 inputs that will yield the same truncated hash.

But can this situation happen if the Pedersen is implemented over the BabyJubJub curve ? Also, are there inputs that can yield special values on the curve such as −INF or 0 at least for x or does it behave like more conventional hashing algorithms ? Even in the Edwards form, can it still works defeat a === constraint (x is equal to −x) as circom and Groth16 is mainly about modulos ?
And if yes, how exactly this can be computed in practice in that case ?

The definition of the hash in the case of that specific curve is here.

user2284570 avatar
in flag
The implementation I’m talking about is [here](https://github.com/kobigurk/circomlib/blob/feat/audit_fixes/circuits/pedersen.circom), and the size of the attacker controlled message is fixed to 505bits.
rozbb avatar
br flag
That impl appears to be returning both coordinates. Would this attack be for a different, weakened system?
user2284570 avatar
in flag
@rozbb the software behind it takes only `out[0]` and discard `out[1]` which is `y`. But this could be a design choice since the chosen `JubJub` curve might ensure security even in that case.
Score:1
br flag

Update: I believe this does not degrade security

Summary: For reasonable choice of basis for the Pedersen hash $\mathsf{PH}$, it is the case that $$ \mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a')) \implies \mathsf{PH}(\mathbf a) = \mathsf{PH}(\mathbf a'). $$ In other words, dropping the y-coordinate doesn't give you any advantage in finding collisions.

First, some math. What happens algebraically when you flip the y-coordinate of a Twisted Edwards curve? Recall the addition law for a curve of the form $ax^2 + y^2 = 1 + dx^2y^2$:

$$ (x_1, y_1) + (x_2, y_2) = \left( \frac{x_1y_2 + y_1x_2}{1 + dx_1x_2y_1y_2}, \frac{y_1y_2 - ax_1x_2}{1 - dx_1x_2y_1y_2} \right). $$

Let $g(H)$ map a curve point $H = (x,y)$ to its x-reflection $(x, -y)$. Observe that, for any $H$, $$ H + g(H) = (x, y) + (x, -y) = \left( \frac{xy - yx}{1 + dx^2y^2}, \frac{y^2 - ax^2}{1 - dx^2y^2} \right) = (0,-1) =: Q. $$ Thus, $g(H) = Q - H$ for all $H$. Also note, $Q$ has order 2, i.e., $Q + Q = \mathcal{O}$.

This is helpful for our characterization of collisions. Since the curve intersects any vertical line in at most two points, we have that for any two curve points $H \neq H'$, $$ \mathsf{xcoord}(H) = \mathsf{xcoord}(H') \iff H = g(H') \iff H = Q - H'. $$

What happens if we try to get an "easy" hash collision, i.e., getting a collision in the x-coordinate but not the y-coordinate? Let $\mathsf{PH}$ denote Pedersen hash function, mapping $k$ scalars $a_1, \ldots, a_k \in \mathbb F$ to $a_1 G_1 + \ldots + a_kG_k$, where $\{G_i\}$ denotes the Pedersen basis. An "easy" collision occurs with inputs $\mathbf{a}, \mathbf a'$ when $\mathsf{PH}(\mathbf a) \neq \mathsf{PH}(\mathbf a')$, and \begin{align*} &\mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a')) \\&\iff \mathsf{PH}(\mathbf a) = Q - \mathsf{PH}(\mathbf a') \\&\iff {\textstyle\sum} a_iG_i = Q - {\textstyle\sum} a'_iG_i \\&\iff {\textstyle\sum} (a_i + a'_i)G_i = Q. \end{align*}

For certain choices of Pedersen basis $\{G_i\}$, this is a contradiction! If you select each $G_i$ from the large prime-order subgroup (a reasonable thing to do, since torsion is a headache in many protocols anyway), then you have that the LHS of the last equality is in the subgroup and the RHS is not. So if $(x,y)$ is an output of a $\mathsf{PH}$ with this kind of basis, then $(x, -y)$ cannot be an output of $\mathsf{PH}$.

Thus, the only way you can get an x-coordinate collision $\mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a'))$ is to get a real collision $\mathsf{PH}(\mathbf a) = \mathsf{PH}(\mathbf a')$

Old Answer (ignore)

Outputting only the x-coordinate makes it trivial to find a collision.

Let $G_1, G_2, \ldots, G_k \in \mathbb G$ represent the Pedersen basis. The input to the Pedersen hash function $\mathsf H$ is $k$ scalars $a_1, \ldots, a_k \in \mathbb F$ and is mapped to the x-coordinate of $a_1G_1 + \ldots a_kG_k$.

Recall for any elliptic curve point $H = (x,y)$, it is the case that $-H = (x, -y)$. In other words, you get the negative by reflecting across the x-axis. The upshot is that for all $H$, $\mathsf{xcoord}(H) = \mathsf{xcoord}(-H)$.

Applying this to Pedersen, we get a collision: $$ \mathsf H(a_1, \ldots, a_k) = \mathsf{xcoord}(a_1G_1 + \ldots a_kG_k) = \mathsf{xcoord}(-(a_1G_1 + \ldots a_kG_k)) = \mathsf H(-a_1, \ldots, -a_k) $$

Daniel S avatar
ru flag
the Jujube curve is a curve in Edwards form where the inverse $(x,y)$ is $(-x,y)$.
user2284570 avatar
in flag
@DanielS so his answer is wrong?
Daniel S avatar
ru flag
@user2284570 I can't see a way to apply their argument to the Edwards curve case.
rozbb avatar
br flag
Oh I apologize I misread the question. I think this is secure. I will post my thoughts tonight
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.