Score:2

Efficient private set intersection protocol for small sets

us flag

I need to implement a PSI that will be used on mobile devices to find mutual contacts. Assuming the set cardinality from both parties A and B are less than 1000, what would be the most efficient PSI protocol that I can use in such a scenario without involving a third party? Would it be possible to extend this protocol for multi-party PSI?

Crypto Learner avatar
in flag
Is it a requisite that both parties receive the output?
tarun14110 avatar
us flag
Yes! Both parties should receive output.
Score:2
us flag

I'll focus on semi-honest security in this response. You can divide the relevant PSI protocols into two categories:

  • In Diffie-Hellman-based protocols (originating in Huberman-Franklin-Hogg), the parties must calculate a few exponentiations for each item in their sets.

  • In Oblivious-transfer-based protocols (the leading ones in this case would be KKRT and Chase-Miao), the parties first perform a few hundred base OTs (each requiring a few exponentiations), regardless of the size of their PSI sets. Everything that comes later in the protocol uses only much more efficient symmetric-key operations, like calls to AES/SHA.

The Diffie-Hellman approach invariably has lower communication cost, no matter how big the PSI sets are. And when the PSI sets are small, the time to perform a few exponentiations per item is less than the time to perform the "base OTs" of the other protocols. In general, Diffie-Hellman PSI for small sets is a great rule of thumb.

But how small is small? You mention sets containing ~1000 items. With sets of this size, it's no longer absolutely clear which protocol is best. In experiments that we have run in our research group, the cutoff (where OT-based protocols become faster) is just below 500 items on very fast networks and just above 1000 items on very slow networks. But even on fast networks, the difference in performance is nothing to get hung up on for this application: we're talking 0.35 vs 0.25 seconds to perform PSI.

In summary, I would recommend implementing the Diffie-Hellman PSI protocol of Huberman-Franklin-Hogg, since (1) running time is close enough to the lowest; (2) communication is lowest; (3) simplicity of the protocol makes it easier to implement.

Edit Dec 2021: We have recently published an improvement to the Huberman-Franklin-Hogg protocol for PSI on small sets. It is faster, uses less communication, and has stronger (malicious) security. This would be my latest recommendation.


To answer your question about multi-party PSI: It is now fairly easy to convert most 2-party PSI protocols to multi-party.

Most PSI protocols are built from an underlying oblivious PRF (in Diffie-Hellman PSI, the oblivious PRF is $x \mapsto H(x)^k$ where $H$ is a random oracle). In KMPRT we showed how to construct multi-party PSI from 2-party OPRF. One of the constructions is secure against semi-honest adversaries, and the other has a security property with more fine print. In a paper that will appear at Crypto 2021 we show that this second construction is in fact secure against malicious adversaries.

Overall, multi-party PSI essentially involves each party doing a modified 2-party PSI with one central party (the one who receives the output).

tarun14110 avatar
us flag
Thanks for your answer. The PSI sets would vary between 50 and 2000. However, the average size of the set is around 150. I think as you suggested Diffie-Hellman-based protocols would be the best fit. Can you point me to the experiment results from your research group if they are publicly available? Can you also point me to a couple of the latest DH-based PSI research papers to start with?
us flag
At the time of your question I was not ready to share our latest work and experiments. But in [this paper](https://eprint.iacr.org/2021/1159) from CCS of this year, we have an improvement to the DH-PSI protocol for small sets. You can see our latest experimental results in this paper.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.