Score:1

Pedersen Hash : when truncating the hash to keep only the X coordinate, is it possible to compute a collision when the Babyjubjub curve is used?

in flag

The Pedersen hash is a low constraints friendly hash for Zk-Snarks.
Unlike many algorithms, the Pedersen hash returns a point P = (x,y) on a curve as a hash. Depending on the selected curve, there can exist a fast deterministic way to compute a different input that yields −P=(x,−y) using the Weierstrass form.

As a result, if software chooses to truncate a hash to its first half, and if the attacker controls the fixed length input, then there’s the possibility to compute 2 inputs that will yield the same truncated hash.

But can this situation happen if the Pedersen is implemented over the BabyJubJub curve ? Also, are there inputs that can yield special values on the curve such as −INF or 0 at least for x or does it behave like more conventional hashing algorithms ? Even in the Edwards form, can it still works defeat a === constraint (x is equal to −x) as circom and Groth16 is mainly about modulos ?
And if yes, how exactly this can be computed in practice in that case ?

The definition of the hash in the case of that specific curve is here.

user2284570 avatar
in flag
The implementation I’m talking about is [here](https://github.com/kobigurk/circomlib/blob/feat/audit_fixes/circuits/pedersen.circom), and the size of the attacker controlled message is fixed to 505bits.
rozbb avatar
br flag
That impl appears to be returning both coordinates. Would this attack be for a different, weakened system?
user2284570 avatar
in flag
@rozbb the software behind it takes only `out[0]` and discard `out[1]` which is `y`. But this could be a design choice since the chosen `JubJub` curve might ensure security even in that case.
Score:1
br flag

Update: I believe this does not degrade security

Summary: For reasonable choice of basis for the Pedersen hash $\mathsf{PH}$, it is the case that $$ \mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a')) \implies \mathsf{PH}(\mathbf a) = \mathsf{PH}(\mathbf a'). $$ In other words, dropping the y-coordinate doesn't give you any advantage in finding collisions.

First, some math. What happens algebraically when you flip the y-coordinate of a Twisted Edwards curve? Recall the addition law for a curve of the form $ax^2 + y^2 = 1 + dx^2y^2$:

$$ (x_1, y_1) + (x_2, y_2) = \left( \frac{x_1y_2 + y_1x_2}{1 + dx_1x_2y_1y_2}, \frac{y_1y_2 - ax_1x_2}{1 - dx_1x_2y_1y_2} \right). $$

Let $g(H)$ map a curve point $H = (x,y)$ to its x-reflection $(x, -y)$. Observe that, for any $H$, $$ H + g(H) = (x, y) + (x, -y) = \left( \frac{xy - yx}{1 + dx^2y^2}, \frac{y^2 - ax^2}{1 - dx^2y^2} \right) = (0,-1) =: Q. $$ Thus, $g(H) = Q - H$ for all $H$. Also note, $Q$ has order 2, i.e., $Q + Q = \mathcal{O}$.

This is helpful for our characterization of collisions. Since the curve intersects any vertical line in at most two points, we have that for any two curve points $H \neq H'$, $$ \mathsf{xcoord}(H) = \mathsf{xcoord}(H') \iff H = g(H') \iff H = Q - H'. $$

What happens if we try to get an "easy" hash collision, i.e., getting a collision in the x-coordinate but not the y-coordinate? Let $\mathsf{PH}$ denote Pedersen hash function, mapping $k$ scalars $a_1, \ldots, a_k \in \mathbb F$ to $a_1 G_1 + \ldots + a_kG_k$, where $\{G_i\}$ denotes the Pedersen basis. An "easy" collision occurs with inputs $\mathbf{a}, \mathbf a'$ when $\mathsf{PH}(\mathbf a) \neq \mathsf{PH}(\mathbf a')$, and \begin{align*} &\mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a')) \\&\iff \mathsf{PH}(\mathbf a) = Q - \mathsf{PH}(\mathbf a') \\&\iff {\textstyle\sum} a_iG_i = Q - {\textstyle\sum} a'_iG_i \\&\iff {\textstyle\sum} (a_i + a'_i)G_i = Q. \end{align*}

For certain choices of Pedersen basis $\{G_i\}$, this is a contradiction! If you select each $G_i$ from the large prime-order subgroup (a reasonable thing to do, since torsion is a headache in many protocols anyway), then you have that the LHS of the last equality is in the subgroup and the RHS is not. So if $(x,y)$ is an output of a $\mathsf{PH}$ with this kind of basis, then $(x, -y)$ cannot be an output of $\mathsf{PH}$.

Thus, the only way you can get an x-coordinate collision $\mathsf{xcoord}(\mathsf{PH}(\mathbf a)) = \mathsf{xcoord}(\mathsf{PH}(\mathbf a'))$ is to get a real collision $\mathsf{PH}(\mathbf a) = \mathsf{PH}(\mathbf a')$

Old Answer (ignore)

Outputting only the x-coordinate makes it trivial to find a collision.

Let $G_1, G_2, \ldots, G_k \in \mathbb G$ represent the Pedersen basis. The input to the Pedersen hash function $\mathsf H$ is $k$ scalars $a_1, \ldots, a_k \in \mathbb F$ and is mapped to the x-coordinate of $a_1G_1 + \ldots a_kG_k$.

Recall for any elliptic curve point $H = (x,y)$, it is the case that $-H = (x, -y)$. In other words, you get the negative by reflecting across the x-axis. The upshot is that for all $H$, $\mathsf{xcoord}(H) = \mathsf{xcoord}(-H)$.

Applying this to Pedersen, we get a collision: $$ \mathsf H(a_1, \ldots, a_k) = \mathsf{xcoord}(a_1G_1 + \ldots a_kG_k) = \mathsf{xcoord}(-(a_1G_1 + \ldots a_kG_k)) = \mathsf H(-a_1, \ldots, -a_k) $$

Daniel S avatar
ru flag
the Jujube curve is a curve in Edwards form where the inverse $(x,y)$ is $(-x,y)$.
user2284570 avatar
in flag
@DanielS so his answer is wrong?
Daniel S avatar
ru flag
@user2284570 I can't see a way to apply their argument to the Edwards curve case.
rozbb avatar
br flag
Oh I apologize I misread the question. I think this is secure. I will post my thoughts tonight
user2284570 avatar
in flag
@DanielS even in the Edwards form, can it still works defeat a `==` constraint as circom and Groth16 is mainly about modulos ? I m meaning, does circom takes care of the sign of integers or can an integer equals its negative equivalent in circom ?
rozbb avatar
br flag
Updated answer.
user2284570 avatar
in flag
@rozbb so getting `−x` from Input `M` remains possible ? If yes, what about the equality comparison `==` in Groth16 and thus circom considering how negative numbers are encoded with the chosen finite field ?
rozbb avatar
br flag
Yes, separately from what was addressed in the answer, if $H =(x,y) = \mathsf{PH}(\mathbf a)$ for some $\mathbf a$, then $-H =(-x,y) = \mathsf{PH}(-\mathbf a)$. This equality holds in the finite field your proof system uses, because Jubjub is explicitly chosen so that $x,y$ are BLS12-318 scalar field elements.
user2284570 avatar
in flag
@rozbb then, what about the equality comparison `===` in circom with Groth16 considering how negative numbers are encoded with the chosen finite field ? (Since it s about modulos)
rozbb avatar
br flag
The circom `===` operator tests equality over the finite field. The encoding doesn't matter here. It should behave as you expect.
user2284570 avatar
in flag
Thanks. I was thinking negative numbers where larger than `p`. But as I m not an expert on this, would it be possible to have sample code for the actual implementation for computing an input `M'` from `M` than turn `x` into `−x` anyway?
user2284570 avatar
in flag
@DanielS I m noting that in the implementation, the computations are done using BabyJubJub as a Montgomery curve instead of edward and later converted to the Edwards form. Does this change something?
Daniel S avatar
ru flag
@user2284570 That doesn't change anything as the same answer could have been produced by using Edwards form throughout.
user2284570 avatar
in flag
@DanielS I’m also noting the generated hash is the [BabyJubJub coordinate addition](https://github.com/kobigurk/circomlib/blob/4284dc1ef984a204db08864f5da530c97f9376ef/circuits/pedersen.circom#L229) of Pedersen Hash of [several unpacked 4 bits Pedersen hash](https://github.com/kobigurk/circomlib/blob/4284dc1ef984a204db08864f5da530c97f9376ef/circuits/pedersen.circom#L206) (x and y points) . Does this even mean getting `−x` from modifying `M` is impossible ?
Daniel S avatar
ru flag
@user2284570 No, although it is not obvious, if you work through the details of the bit-unpacking where we have a message $M$ such that $e(M)=c$ where $e$ is the unpacking function $b_0b_1b_2b_3=(2b_3-1)(1+2b_0+4b_1+8b_2)$ applied nybble-wise, then flipping all of the $b_3$ bits gives a message $M'$ such that $e(M')=-c$.
user2284570 avatar
in flag
@DanielS thanks. [There’s no bit packing](https://github.com/kobigurk/circomlib/blob/4284dc1ef984a204db08864f5da530c97f9376ef/circuits/pedersen.circom#L237). Sorry to maybe write stupid garbage questions as I’m not an expert, but does [a BabyJubJub addition affect both coordinates](https://github.com/kobigurk/circomlib/blob/4284dc1ef984a204db08864f5da530c97f9376ef/circuits/babyjub.circom#L23) ? I’m meaning that the value of `x1` or `x2` along `y1` and `y2` affect the value of `yout` and that the value of `y1` and `y2` affect `xout` ?
user2284570 avatar
in flag
@DanielS JubJub isn t Edwards but https://en.wikipedia.org/wiki/Twisted_Edwards_curve. Does this change something?
rozbb avatar
br flag
No, the above answer applies to any Twisted Edwards curve
user2284570 avatar
in flag
@rozbb As I notice a seed… and more generally : https://crypto.stackexchange.com/q/107320/59586 ?
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.