Score:1

Checking equivalence among distributed sets

kr flag
  • I have elements from $\{0, 1\}^{n}$ (range of a hash function)
  • The master $A$ can have any subset of this range.
  • There are clients that each have a subset from the space, too.
  • I want to make sure that the union of the clients' sets is equal to the master set
  • The communication should be as least as possible.
  • The elements are secret. (this requirement can be relaxed with a solution that could potentially leak)

thanks to @kelalaka for refining the question.

knaccc avatar
es flag
Isn't it trivial to filter out duplicates?
zetaprime avatar
kr flag
No it's not trivial, because they are distributed. Filtering out duplicates would mean copying the data to multiple locations or to a single location.
zetaprime avatar
kr flag
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/133625/discussion-between-zetaprime-and-kelalaka).
kelalaka avatar
in flag
If you can relax the secret, this is better for CS, IMHO.
Maarten Bodewes avatar
in flag
@knacc & all I've removed a lot of comments of zetaprime as they didn't make sense to third parties (anymore). If anything is missing, please add them again.
Score:1
gb flag

A potential option would be to use Bloom filters to identify potential duplicates probabilistically, and then check if they really are duplicates by sending the few potentials. If you use large enough bloom filters this would be sufficient for any size.

Alternatively, which may not be a great fit after all, you could use miniscketch.

From the readme,

Sketches, as produced by this library, can be seen as "set checksums" with two peculiar properties:

  1. Sketches have a predetermined capacity, and when the number of elements in the set is not higher than the capacity, libminisketch will always recover the entire set from the sketch. A sketch of b-bit elements with capacity c can be stored in bc bits.
  2. The sketches of two sets can be combined by adding them (XOR) to obtain a sketch of the symmetric difference between the two sets (i.e., all elements that occur in one but not both input sets).

So assuming you use a large enough capacity, you can just XOR the sketches to identify the unique elements and remove the rest.

kelalaka avatar
in flag
We don't know the possible size of the universe and the subsets $A$ and the clients. yet
meshcollider avatar
gb flag
Hmm, updated with another potential solution
kelalaka avatar
in flag
I think miniscketch will fail on the security requirement and Bloom, too
zetaprime avatar
kr flag
for the bloom filters: I can see if all clients would have the blooom filters of others then I guess they could agree on a protocol to discard the duplicates. How do I then check the equality of the union?
zetaprime avatar
kr flag
for the minisketch it writes, "A sketch of b-bit elements with capacity c can be stored in bc bits." unless I misunderstood it, it's somewhat storing all the elements in the sketch. I will read further.
meshcollider avatar
gb flag
@zetaprime yeah the elements are recoverable from the sketch so it's probably too large for your purposes. I was mainly focused on the additive part initially.
meshcollider avatar
gb flag
In terms of testing equality, this may help: [A probabilistic set with no false positives?](https://cstheory.stackexchange.com/questions/6596/a-probabilistic-set-with-no-false-positives). I can imagine one scenario where, after you have removed all of the duplicates between the clients, you could do some sort of RSA accumulator deterministically and then ensure equality of those. Essentially hash each element to a prime (hopefully with negligible chance of collision), and raise a fixed generator to all those prime powers in a finite field. It would be probabilistic though.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.