Score:4

Is the sum of hashes a suitable hash for sets?

ma flag

Let $H: X \rightarrow \{0, 1\}^b$ denote a cryptographically secure, $b$-bits hash function on a set $X$. Let $H^*: \mathbb{P}(X) \rightarrow \{0, 1\}^b$ be a function on the power set of $X$ defined by \begin{equation} H^*(\{x_1, \ldots, x_n\}) = \sum_i H(x_i) \end{equation} where the sum is intended as wrapping addition over $b$-bit integers.

I am wondering if $H^*$ is cryptographically secure on $\mathbb{P}(X)$.

I easily see that other aggregation mechanisms (such as XOR-ing all hashes together) are easily prone to collisions. I also see how, if the same element of $X$ could appear multiple times in the collection being hashed, one could build a collision by simple integer division. But, if all the elements of $X$ being hashed are distinct, I can’t easily see an attack on this construction.

fgrieu avatar
ng flag
How does the sum of hashes exactly works? My reading of the question is it's modulo $2^b$, but an answer mentions modulo $2^b-1$.
Score:1
ng flag

Contrary to what other people say, this is actually (when modified slightly) fine. You seem interested in the problem of homomorphic hashing. Note that while fully homomorphic encryption refers to being (ring)-homomorphic, i.e. with respect to something like $(\mathbb{F}_2, +, \times)$, homomorphic hashing refers to being homomorphic with respect to the monoid $(\mathbb{P}(X), \cup)$, i.e. precisely your notion of homomorphism.

Anyway, constructions of homomorphic hashes have been known for quite a while. See this for a semi-recent discussion of things. Something almost exactly like your proposal actually is secure (it is called "LtHash", and dates back to the 90s). The "trick" here is that one hashes onto $\mathbb{Z}_q^n$ for $n$ large. One can then reduce the collision resistance of the hash to a standard lattice problem (the Short Integer Solution problem), which (when appropriately parameterized) is thought to be hard. Note that the setting of $n = 1$ (your actual case) is likely insecure though, at least without setting $q$ to be quite large.

Score:0
sa flag

Edit: I misinterpreted the algebraic domain, but the answer can be adapted, see below.

The complexities exhibited are exponential, but reduced, and in applications such as hash functions, LWE, LPN where similar structures are used, these reduced complexities are of interest.

$H^\star$ is definitely insecure. Like most attacks on hash functions you will generate random inputs looking for collisions. Here let's consider addition modulo $2^b-1,$ as you suggest.

If $n$ is small compared to $2^b,$ there are still shortcuts which reduce the complexity substantially. Here is an attack finding a collision on such a function. Given some desired hash output $H_0,$ assume we have a random set of $v=2^{b/n}$ hashes, of the form $$H^\star(\{x_{1,i},\ldots,x_{n,i}\}),\quad i=1,\ldots,v\quad(1).$$ Since here are $v^n=2^b$ possible $n-$sums of the form (1) these sums we have assembled will will hit your $H_0$ with constant probability since the hash target space has size $2^{b}.$

Explanation: We have a finite domain, and balls in bins process with $2^b$ possible inputs.

If $n=2$ this would essentially be the birthday problem with complexity $O(2^{b/2}).$ By reduction to the case when $n=2^s$ is a power of 2, Wagner gave an $$O(2^{b/(1+s)}=O(2^{b/(1+\lceil \log n\rceil)})$$ recursive solution for the case of XOR but this can be adapted to any finite group, including addition modulo $2^b-1.$ See also chapter 8 of the book Algorithmic Cryptanalysis by Antoine Joux.

poncho avatar
my flag
You state that $H^*$ is definitely insecure, however the solutions you give take time exponential in $b$
ma flag
Forgive me, I’m having trouble navigating this answer. If I understand correctly, you’re proposing to generate a v x n matrix of random elements, then observe that if you pick one element from each “column” you have enough combinations to fill the hash space? Or am I mistaken? But.. how do you select the correct combination?
ma flag
Also I’m having a bit of trouble with the value of v. First you propose to set it to 2^(b/n), then you mention n =2^v which would imply v = log2(n). What am I missing?
ma flag
Finally, the result that you linked mentions XORs, but I’m probably too inexperienced to see the relationship that result bears with my question, which concerns sums modulo 2^b. Sorry I am so easily confused!
kodlu avatar
sa flag
you brute force to find the correct combination, like all collision attacks.
ma flag
Of course, of course, but it is not immediate to me how the sum step makes this faster than a simple birthday attack. Let’s assume for simplicity H0 = 0. Every time I compute H* I either find something that complements a previous output I found, or I save it and carry on. This in my mind takes around 2^(b/2) steps. I understand that there is a faster solution, but I am not getting how it works.
ma flag
I’ll try to read through the link you sent and wrap my head around it!
fgrieu avatar
ng flag
Gaussian elimination won't works in this case, because it finds a solution which coefficients are in the ring of the elements of the matrix, but the problem statement restricts coefficients of the solution to 0 and 1.
Score:0
ng flag

As a illustration that $H^*$ is sizably less secure than a $b$-bit hash, I give a collision attack with $\mathcal O(2^{b/3})$ hashes and $\mathcal O(b\,2^{b/3})$ memory. For simplicity, assume $b$ is a multiple of $3$, and the addition in the computation of $H^*$ is carried modulo $2^b$.

  • Choose an easily computable function $X(i)$ mapping integers $i$ to $x_i\in X$; we'll note $H_i$ for $H(X(i))$.
  • Set a table $T$ to empty, that will holds pairs $(i,H_i)$ indexed for search according to key $H_i\bmod2^{b/3}$.
  • Set a table $U$ to empty, that will holds triplets $(i,j,(H_i+H_j\bmod2^b)/2^{b/3})$ such that $H_i+H_j\bmod2^{b/3}=0$, indexed to detect triplets with the same third element.
  • For increasing $i$
    • compute $H_i$
    • search for an $H_j$ already in table $T$ with key $(-H_i)\bmod2^{b/3}$
    • If none is found, enter $(i,H_i)$ in table $T$ (to conserve memory at the expense of a little more hashes: except if the number of entries with key $H_i\bmod2^{b/3}$ is already some small threshold, e.g. 3)
    • Otherwise, remove $(j,H_j)$ from table $T$ and enter $(i,j,(H_i+H_j\bmod2^b)/2^{b/3})$ in table $U$, while testing for collision with an entry already in $U$ having the same third element. In that exceptional case we have $H^*(\{X(i),X(j)\})$ colliding with that earlier and different $H^*(\{X(i'),X(j')\})$ already in $U$, and stop with that collision.

Each new entry in $U$ costs a small constant number of hashes, and since the value we want to collide is $2b/3$-bit, we need $\mathcal O(2^{b/3})$ hashes.

As the saying attributed to the NSA by Bruce Schneier goes: Attacks only get better; they never get worse, and I see room for improvement. Independently, the attack can be adapted to other moduli (like $2^b-1$ or the largest $b$-bit prime) at moderate extra cost.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.