Score:14

Cryptographically safe lookup of value in a set

cn flag
vnd

I'm looking for an elegant solution to the might-seem-trivial problem of looking up for specific value in a known set of values without disclosing what value we look for. Let me describe it in a classical way:

Alice will soon celebrate her birthday and she wants to know if anyone in her class has the birthday on the same day as her. Unfortunately, the only person who knows the dates of birth of all the people in the class (except Alice's one) is Malory. How Alice could ask Malory if anyone has the birthday on the same day without disclosing her own date?

Assumption:

  1. Malory is open to helping Alice and she will answer any questions legitimately. Yet, she can only answer yes or no.
  2. Alice will ask Malory a single time. Malory cannot influence Alice to make her ask the same question once again. Though, in future Alice might on her own will ask about the different dates (e.g. Bob's birthday).
  3. Alice doesn't need and doesn't want to know the dates of birth of everyone in a class. She would like to get an answer from Malory asking a single question.

A pretty straightforward way to approach this problem is to apply hash. Alice hashes her birthday and passes this information to Malory, the same time asking her to hash the birthdays of all the people she knows and tell her if any match. The problem is that Malory, who would like to know Alice's birthday, can enumerate all possible dates (as the entropy of input is low) and check one by one if any matches with what Alice asked about. I would like to eliminate that threat.

I have thought about applying the one-to-many mapping to the birthday. As one date could be represented by multiple values, the enumeration attack would become unfeasible. The first problem is I don't see an easy way to let Malory do an atomic mathematical comparison with her set of dates. The second problem is if such comparison can be done, it is still possible for Malory to make a new set and do the lookup operation once again (what effectively translates to enumeration attack). Thus, the one-to-many mapping Alice applies should be parametrized by the unique feature of a set Malory will be searching in.

I do hope I haven't brought too much confusion and I'll appreciate any help, references, mentions of algorithms, or similar problems! I'll also appreciate your opinion if you think this problem is not solvable with given assumptions! If some of them are unclear, let me know and I'll do my best to clarify my intent.

cn flag
vnd
Lying is undesired but it's not my objective to prevent it. My main goal is to prevent Malory from disclosing Alice's date of birth assuming Malory will have no interest in giving the wrong answers.
Mark avatar
ng flag
It may help if you clarify "Alice don't need and don't want to know the dates of birth of everyone in a class. She would like to get an answer from Malory asking a single question.". This could be read as "you don't have to send over the whole database (but can if you want)", or "the protocol should have some zero knowledge property" (which sending the database would violate). The answers so far have assumed the second, but if the first is fine it is *much* more efficient while still satisfying all of your desired security properties.
Score:13
ru flag

You are describing the problem of 1 out of $n$ Oblivious transfer with $n=366$ if it is required that Alice receives no extraneous information. The methods of Kolesnikov et al in their paper “Efficient batched oblivious PRF with application to private set intersection” gives a sense of what is possible.

If it is acceptable for Alice to obtain additional information, then the problem is one of private information retrieval. Ostrovsky and Skeith’s survey paper “A survey of single-database private information retrieval: techniques and applications “ provides a good overview.

kelalaka avatar
in flag
This survey about PIR was old, and there was a survey about the measurement of PIR ( was UNY but can't remember the author correctly Gene Tsudik?) that if client downloads all data and process locally were faster than all existing schemes. The novel PIR is based on [FHE](https://pdfs.semanticscholar.org/bfde/352ba1bdb70bf850b18a08c9da092489b698.pdf) though the server calculates FHE operations...
Score:5
es flag

Second, simpler approach, thanks to a comment by @Mikero:

Alice picks a uniformly random private key $b$. Alice's birthday is on the $k$th day of the year. Alice sends $A = bH_p(k)$ to Malory, where the function $H_p()$ means to hash the input and interpret the result as an EC point.

Malory picks a uniformly random private key $e$. Malory has a function $F_e(i)$ which outputs $eH_p(i)$ if at least one person is born on that day of the year, and otherwise outputs a randomly chosen EC point.

For each value of $i$ such that $0\leq i \leq 365$, Malory sends $F_e(i)$ to Alice. In addition, Malory sends $Q = eA$ to Alice.

Alice calculates $Z = b^{-1}Q == b^{-1}eA == b^{-1}ebH_p(k) == eH_p(k)$.

Alice then checks if $Z$ matches any of the 366 outputs of $F_e(i)$ that Malory sent to Alice. If there is a match, then Alice shares a birthday with someone else.

Explanation: We've created a "1-OPRF", which means an oblivious pseudo-random function which Alice is able to query once with the blind assistance of Malory, but which Malory can query many times.

The PRF is simply Malory blinding an EC point $H_p(i)$ (where $H_p(i)$ represents a particular day $i$ of the year as an EC point), with his private key $e$, by calculating $eH_p(i)$. Because of the elliptic curve discrete log problem, no observer (including Alice) can predict the output of $eH_p(i)$ for any $i$ without knowledge of the secret $e$. This means that when no one is born on a particular day, Alice cannot tell that Malory has sent a random EC point instead of the real output of the PRF, because the output of the PRF is unpredictable to Alice.

Alice blinds her input to the PRF by picking a blinding factor $b$ and sending $bH_p(k)$ to Malory instead of sending (and thus revealing) $k$. Alice un-blinds the response by reversing the multiplication with $b$, and is thus able to query the PRF without Malory having knowledge of the input Alice is using.

The consequence is that Malory has sent a list of EC points to Alice that only include the true output of the PRF for those days of the year that at least one other person is born on. In addition, Alice has been able to blindly query the PRF for the output of her particular birthday, and is able to see if that result appears in the list of possible PRF outputs sent to her by Malory.

Malory has not learned $k$, and Alice is unable to learn anything other than whether she shares her birthday with at least one other person.

Matthieu M. avatar
ru flag
Why cannot Alice guess `e` in this scheme? They know `A` and `Q = eA`, can't they "divide" `Q` by `A` to access `e`?
knaccc avatar
es flag
@MatthieuM. The elliptic curve discrete log problem means you can't "divide" one point by another point. If you could, you could easily look at any public key and determine the private key.
Score:4
cn flag
Mac

Welcome to the forum, @vnd !

Might your question be solved using hash-based k-Anonymity? As I understand it, this approach would:

  • prevent Mallory from discovering Alice's birthday;
  • prevent Alice from knowing any of the birthdates of her classmates; and
  • only allow Alice to discover whether any of her classmates share a birthday with her

Let's suppose Alice has 12 classmates whose data is represented by this table:

                    ()                            
Bob       11 JAN 2001      a40b69b979ef6af5e9f13a49cfc568d8b942d5c2      a40b6
Joe       23 MAR 1989      4fde4b6b8e077d5b51eed716ab3d94a6ac04c45e      4fde4
Ben        9 JUN 2002      46885da4ffaa4c3d1b31413f96c38f2cb7e895ea      46885    
Art        4 DEC 2005      a40b6425e2a7a93a9ac95ee275a5398397c46dd2      a40b6
Tom       17 NOV 1977      a49e374c34333b86ccf08bc10d6e04312e772c41      a49e3    
Tim        3 JUL 1989      39e95ac6c6286e6f036822f3fa31131a2e892b08      39e95
Amy       12 FEB 2002      92dac31b3d3a0793fd2845081c93024d0ea8ac8c      92dac
Eva       24 APR 1990      a0ed580e3df29f9a8f22276092ac9f58117401ec      a0ed5
Sue       10 JUL 2003      a93703839d02a539c12841f5de2ec8790107925b      a9370
Zoe        5 JAN 2006      a40b6232f910b358e971e4c5f91e273c07499ab0      a40b6
Lia       18 DEC 1978      addc1fc5fbe7dea93e3bd1d421521f005ba89c8e      addc1
Kay        4 AUG 1990      a4e15e622b89d302f2b0357c2b2efc0d38fba7a0      a4e15

Alice     11 JAN 2001      a40b69b979ef6af5e9f13a49cfc568d8b942d5c2      a40b6

TABLE DATA
This table includes the given name; date of birth (in military format, for cosmetic reasons); the hash value of the birthdate using a non-existent, made up hash (MUH) and the first 5 graphemes of the hash (fragment).

PROCEDURE
. Alice computes the message digest of her birthdate using the made up hash (any hash could be used, even the weak SHA1).

. Rather than communicate the entire hash value to Mallory, which would be perceived as a threat, Alice only gives Mallory a fragment of her hash: the first five graphemes (this is adjustable).

. Mallory knows the dates of birth of everyone in the class, except for Alice. Mallory computes the hash of each classmate's birthday, and returns that data to Alice, but only when Alice's fragment matches the same fragment of a full hash. The data returned does not include the fragment, as this would be redundant.
    . Using a 5-grapheme fragment a40b6, Mallory would give the following data to Alice

9b979ef6af5e9f13a49cfc568d8b942d5c2
425e2a7a93a9ac95ee275a5398397c46dd2
232f910b358e971e4c5f91e273c07499ab0

    and Alice would see if her fragment plus any response matches her full hash. In this case, the first one is a match and Alice has her answer. She now knows that someone in her class shares a birthday with her, though she does not know their identity.

In this exchange we see that Mallory, having received only a fragment, cannot determine whether any of the hashes she computed fully match Alice's actual hash, as she has no way of computing Alice's full hash.

The possibility of a threat is eliminated because the decision of whether a match exists cannot be determined by Mallory (hostile server), it can only be determined by Alice (honest client).

Further, using the data supplied by Mallory, Alice cannot deduce the birthdate of any of her classmates except those that match her hash. Partial matches are inconclusive.

The amount of data returned can be adjusted by the size of the fragment:

    . Using a single character fragment a, Mallory would give the following data to Alice:

40b69b979ef6af5e9f13a49cfc568d8b942d5c2
40b6425e2a7a93a9ac95ee275a5398397c46dd2
f9e374c34333b86ccf08bc10d6e04312e772c41
0ed580e3df29f9a8f22276092ac9f58117401ec
93703839d02a539c12841f5de2ec8790107925b
40b6232f910b358e971e4c5f91e273c07499ab0
4e15e622b89d302f2b0357c2b2efc0d38fba7a0

NOTE: This approach does not require us to know how many possible birthdays there are (365). Three hundred or four million, it wouldn't matter.

Score:4
es flag

One approach is to do 1-out-of-n oblivious transfer. Malory will send Alice a list of 366 encrypted booleans (to allow for birthdays on Feb 29), where each boolean will signify whether anyone has a birthday on that day of the year. Alice will only be able to decrypt one of the 366 booleans, and will thus only be able to know if anyone else has a birthday on that day of the year.

The wikipedia article I linked to in the paragraph above references several implementation approaches to achieve 1-out-of-n oblivious transfer. Here is one approach that I thought up just now (with credit to @IlmariKaronen for a very elegant improvement):

Alice is born on day $k$ of the year, where $0\leq k\leq 365$. Alice sends the public key $P = xG-kH$ to Malory, where $x$ is a uniformly random scalar private key. $H$ is calculated as $H = H_p(G)$, where the function $H_p()$ means to hash the value and interpret the result as an EC point. Choosing $H$ in this manner means the discrete log of $H$ w.r.t. $G$ is unknowable, i.e. $h$ cannot be known such that $H==hG$.

Malory calculates 366 public keys $\{Q\}$ of the form $Q_i=P+iH$ for all values of $i$ where $0\leq i\leq 365$.

The private key corresponding to each public key $Q_i$ will be $q_i == x - kh + ih == x+(i-k)h$.

Since we've already demonstrated that $h$ is unknowable, this means that $q_i$ will only be knowable by Alice when $i==k$, in which case $q_k$ will be equal to $x$.

Thus, Malory will have computed 366 public keys for which Malory can be sure that Alice can only know the private key corresponding to one of them.

For each day of the year $i$, Malory uses the El Gamal scheme as follows: Malory picks a uniformly random scalar $e_i$, and provides Alice with the pair $(E_i, F_i)$ where $E_i = e_iG$, $F_i = e_iQ_i + H_p(v_i)$ and where $v_i$ will be specified as $0$ if no one has a birthday on that day of the year, or as $1$ if at least one person has a birthday on that day of the year.

To decrypt $v_k$, Alice computes $V_k = F_k - xE_k$. Alice then checks whether $V_k\overset{?}{=}H_p(0)$ or $V_k\overset{?}{=}H_p(1)$, and thus knows if $v_k$ is $0$ or $1$. This works because $q_k==x$ and $Q_k==xG$, and so $xE_k==x\cdot e_kG==e_k\cdot xG==e_kQ_k$.

Alice will have been able to determine only if someone else was born on day $k$ of the year, and Malory will not be able to determine $k$.

ar flag
I'm not sure I understand your scheme completely, but it occurs to me that, instead of messing around with ring signatures, couldn't Alice just let $C_i = (i-k)H_p(G) + c_kG$? That way Mallory can verify that $C_{i+1} = H_p(G) + C_i$ holds for all $C_i$, and thus be confident that Alice cannot know the discrete log for more than one of them. (Alice doesn't even really need to send all the points to Mallory, since Mallory can just compute them all from $C_0$ by repeated point addition.) Or maybe I messed up something; it's always possible, since I'm really not very familiar with ECC.
knaccc avatar
es flag
@IlmariKaronen That's very elegant, you're right, I wish I'd thought of that! I'll amend my answer.
us flag
I believe that this answer is converging towards an Oblivious PRF. Alice & Mallory run an OPRF protocol where Alice learns a random function $F : \{1,\ldots,365\} \to \{0,1\}^\lambda$ and Mallory learns $F(i)$ where $i$ is her birthday. Since it's a PRF, all other outputs of $F$ look random to her. Alice can then send $\{ F(j) \mid j \in \mbox{Birthdays}\}$ and Mallory can see if there's a match. [It is indeed possible](https://ia.cr/2020/1043) to do this kind of OPRF with constant communication. That link contains a UC-secure protocol; I doubt the one in this question is UC secure.
knaccc avatar
es flag
@Mikero thanks, that's an interesting paper. I didn't completely follow what you said, but I added a second answer to this question that incorporated the 1-OPRF approach from this paper. Is there an improvement over my method?
knaccc avatar
es flag
@Mikero Did you maybe mean to say that it would be Malory that would send $\{F(j)\}$ to Alice, rather than the other way around?
us flag
Yes, I believe I accidentally swapped the names of the two parties (too late to edit the comment now).
Score:3
in flag

There is a novel Private Information Retrieval with Fully Homomorphic Systems,

This is a Private Encryption Scheme that can the value at index $i$ from the array on the semi-honest server. The server learns nothing and only returns the value at index $i$. We can modify this as;

Basic Scheme
  1. Mallory encrypts all existing birthdays $b_i$, $0 \leq i \leq 366 $ with the FHE public key of Alice. (Encryption of $366\cdot 8$ bits, not necessarily all days, the existent birthdays are enough and this saves space in large sets.

    $$c_i = \operatorname{FHE-Enc}(A_{pub}, b_i)$$

    If we use bit-based (HLibe, TFHE) then each $c_i$ is an array size 8.

  2. Alice sends her birthday $A$ encrypted with her public key to Mallory (Encryption of 8-bit).

    $$a = \operatorname{FHE-Enc}(A_{pub}, A)$$

  3. Mallory executes the FHE Equality circuit on each encrypted birthdays

    $$t_i = \operatorname{FHE-Compare}(A_{pub}, c_i,a)$$

  4. Mallory OR the results with FHE using binary tree approach to reduce the depth of circuit (it is possible with the summation instead of OR) and return the result back to Alice.

    $$o = \operatorname{FHE-OrAll}(A_{pub}, [t_0,\ldots,t_n])$$

  5. Alice decrypts the resulting bit with their private key;

    $$ b = \operatorname{FHE-Dec}(A_{priv})$$

    • if $b=1$ then there is at least one student with date $A$
    • if $b=0$ then there is no student with date $A$.

What is guaranteed;

  1. Mallory can not learn what is the queried data $A$ since it is encrypted with FHE - semantical security exists.
  2. Mallory can not induce the queried data from the result for the same reason as above; from $(A,o)$.
  3. The circuit doesn't reveal anything about the internals of the data other than what function is computed - the circuit calculates the existence of data without revealing the result.
  4. Alice only learns the existence of the birthday nothing more! Alice gets only one bit $b$, exist or not. This doesn't return the count of equal birthdays.
  5. Alice only queries once.
  6. Mallory sends only encryption of one bit $b$! Therefore a quite band-efficient scheme.
Improved Scheme for mainly processing time
  1. Mallory prepares a bit array of size 366 initialized to zero. $$B = [b_0,b_1,\ldots,b_{366}] = 0$$

  2. Mallory set the existing birthdays to 1. Like; $$B = [0,0,1,\ldots,0] $$

  3. Mallory encrypts the bit array with the public key of Alice (FHE public key) and gets another array of size 366.

    $$c_i = \operatorname{FHE-Enc}(A_{pub}, B[i]) \;\; \text{for }\; 0\leq i \leq 366 $$

  4. Alice prepares another bit array of size 366 where only her birthday is set to 1 rest set to 0. $$A = [0,0,0,\ldots,1,\ldots,0] $$

  5. Similarly to Mallory, Alice the array with her public key and sends the resulting array to Mallory.

    $$a_i = \operatorname{FHE-Enc}(A_{pub}, A[i]) \;\; \text{for }\; 0\leq i \leq 366 $$

  6. Mallory executes FHE-AND operation component wisely on the encrypted arrays. $$c_i = \operatorname{FHE-And}(A_{pub}, A[i], B[i])$$

  7. Mallory OR bits of the resulting array and return the result back to Alice.

    $$b = \operatorname{FHE-OrAll}(A_{pub}, [c_0,\ldots,c_{366}])$$

  8. Alice decrypts the resulting bit with their private key;

    $$ b = \operatorname{FHE-Dec}(A_{priv})$$

    • if $b=1$ then there is at least one student with date $A$
    • if $b=0$ then there is no student with date $A$.

What is guaranteed;

  1. The same as the basic scheme.
  2. This time instead of encrypting 9-bit, Alice encrypt 366-bits (~40-time increase of the initial sending cost/time)
  3. The Mallory, on the other hand, received a lot of improvement on the time and memory
    1. There is no need for a heavy FHE-equality operation.
    2. The size of the stored ciphertext was reduced by ~40.
  4. The result, on the other hand, is still encryption of 1-bit.
  5. This scheme saves a lot of time on the process.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.