Score:4

Zero knowledge validation to check if private element is in a set hidden on a central server

bd flag

I am trying to determine how to apply ZK proofs to the asymmetric scenario below.


Say there is a list of 5000 secret elements on a centralized web server.

Users are trying to guess what those 5000 elements are.

However, users don't want to reveal their incorrect guesses to the central server.

Users submits proofs of their guess selection to the web server, and the server checks if the proof shows the user guessed one of the 5000 elements.

Importantly, the goal of this scheme is: 1) users should not be able to verify locally if their guess is a part of the set, and 2) the server should not know what a user's guess was, only if it was a part of the set.


How can this be implemented?

Further, what libraries would you use to implement this scheme (across any language)?

Score:2
ru flag

Sounds like a scheme where $\phi$-hiding would work well.

A scheme for a single secret element

Write a publicly-known hash-to-prime function that maps guesses to primes of around, say, 128-bits (e.g. take the first 128-bits of SHA3 output, treat as an integer and look for the next largest prime).

Hash your secret element to a prime $s$ and then construct a, say 1024-bit prime $p$ such that $s|p-1$ and another prime $q$ then multiply $p$ and $q$ to make a public value $N=pq$. Publish $N$ and you hash function. Note that $s|\phi(N)$.

To submit a guess, users hash their guess to a prime $g$, generate a random element $e\in(\mathbb Z/N\mathbb Z)^\times$ and raise it to the power $g$ modulo $N$ to create the response $r=e^g\mod N$. They send $r$ to the checker.

The checker takes responses $r$ and raises them to the power $\phi(N)/s$ modulo $N$. If $r$ is an $s$th (i.e. which is almost certainly not the case unless $g=s$) then they will get the answer 1 confirming a correct guess. Any other answer should be rejected.

Properties

If a user were able to determine that their guess were correct, this would break the $\phi$-hiding assumption.

The server gains zero-knowledge of an incorrect guess as for any guess that hashes to an incorrect prime $h\neq g$, the response $r$ could equally have arisen from the random element choice $e'=r^{1/h}\mod N$.

A multi-secret scheme

For the multi-secret scheme, in ordleer for the server to be unable to distinguish guesses we should have all of the guesses hash to the same prime $s$. This could be arranged for example by hashing guesses to 6000-bit values (e.g. using an XOF such as SHAKE) of which values $h_1,\ldots, h_{5000}$ correspond to secrets elements. We can then construct a 128x6000 binary matrix $M$ with the binary vectors whose entries match $h_1\oplus h_2, h_1\oplus h_3,\ldots, h_1\oplus h_{5000}$ lying in the nullspace of $M$. This ensures that $M\cdot h_i= M\cdot h_j$ for all $1\le i,j\le 5000$ and so treating this 128-bit value as an integer and looking for the next prime will lead to the same $s$ for all secret inputs.

One needs to be cautious though. In this scheme a user can check whether two guesses are both secrets by hashing them, XORing them and multiplying by $M$. If the answer is 0, then both guesses are probably correct. If this no longer satisfies your requirements, a different scheme may be needed.

Rurock avatar
bd flag
Thank you. I'm working on an implementation of the single secret scheme, but am still learning. A few questions. 1) What's the recommended way to efficiently find the next largest prime following the int-cast sha256? Is incrementally checking with miller-rabin k=64 sufficient? 2) How do you quickly/efficiently construct a 1024-bit prime where the (prime - 1) % s == 0 (the p variable)? 3) Is q just any prime, any random 1024-bit prime, or s|q-1 ? 4) What does the x mean in the random element set? 5) Is the response r = e^(g % N) or r = (e^g) % N?
Daniel S avatar
ru flag
1) Yes incrementing to the next odd number and some Millar-Rabin tests should be sufficient. 2) Generate a random 996-bit even cofactor $c$, test $cs+1$ for being prime and if you fail, increment $c$. 3) $q$ can be any random 1024-bit prime. 4) $\times$ is the multiplication operator i.e. the group is the multiplicative group mod $N$. 5) The response is $(e^g)%N$ and should be computed with a modular exponentiation routine.
Rurock avatar
bd flag
Thank you once more for all the help Daniel. So far, I've implemented the hashing scheme with a valid N = pq. I am working on computing the response. To clarify, as I am currently unfamiliar with the syntax, what sort of elements are a part of e and how would this random number generator be programmed? To confirm, for the final response submitted to the server, you send ((e^g) % N) (computed via an exponentiation routine)?
Daniel S avatar
ru flag
Simply generating a random $e$ mod $N$ will generate a suitable random element with overwhelmingly high probability. Yes, (e**g)%N is what I mean, but the mathematics formatting treats % as a comment. If you have further questions, we should set up a chat room as comments are not intended for extended discussion.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.