
Securely estimate share ratios with HyperLogLog in a peer-to-peer system, without a centralized tracker?


Let’s say you have a distributed storage system like BitTorrent or IPFS and you want to track the upload/download (or response/request) ratios of the peers, but you don’t want to use a centralized tracker: everything should stay peer-to-peer. Can you estimate the u/d ratio securely in a distributed way without overwhelming the system?

One thought I had was to modify HyperLogLog (HLL), combining it with the idea from this answer: Is it possible to create a "proof-of-upload" system for BitTorrent ratio tracking?.

In this system everyone has a public/private key pair, and requesters include a signed token for each X MB of data (every token covers the same amount of data). On each request to another node, the processing node stores the hash of the token in a HyperLogLog kept under the requester’s id and returns a new signed response token with the response. If the processing node returns a good reply, the requester stores the response token in a HyperLogLog kept under the provider’s id.
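Here is a rough sketch of the token flow in Python, assuming Ed25519 signatures (via the `cryptography` package) and a toy HLL that ranks bins by trailing zeros as described below; the `Token` layout, `make_token`, and `chunk_index` are all illustrative, not part of any existing protocol:

```python
import hashlib
from dataclasses import dataclass

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

P = 14          # 2**14 = 16384 bins (illustrative choice)
M = 1 << P

@dataclass
class Token:
    requester_id: bytes   # e.g. hash of the requester's public key
    chunk_index: int      # which X-MB chunk of data this token pays for
    signature: bytes      # requester's signature over (requester_id, chunk_index)

def make_token(priv: Ed25519PrivateKey, requester_id: bytes, chunk_index: int) -> Token:
    msg = requester_id + chunk_index.to_bytes(8, "big")
    return Token(requester_id, chunk_index, priv.sign(msg))

def token_hash(tok: Token) -> int:
    msg = tok.requester_id + tok.chunk_index.to_bytes(8, "big") + tok.signature
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

def trailing_zeros(x: int) -> int:
    # number of trailing zero bits; x is assumed nonzero in practice
    return (x & -x).bit_length() - 1

class HLL:
    """Toy HyperLogLog keyed on trailing zeros of the token hash."""
    def __init__(self):
        self.bins = [0] * M

    def add(self, h: int) -> None:
        idx = h % M                          # low P bits pick the bin
        rank = trailing_zeros(h >> P) + 1    # 1 + trailing zeros of the rest
        self.bins[idx] = max(self.bins[idx], rank)
```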

If you had peers p1, p2, and p3 and you wanted to calculate p1’s ratio of requests served to requests made, you could do something like:

  1. p2 and p3 merge their HyperLogLogs of response tokens from p1 into hll_1
  2. p2 and p3 merge their HyperLogLogs of request tokens from p1 into hll_2

The share ratio is then cardinality(hll_1) / cardinality(hll_2).
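As a sketch under the same assumptions: the merge is a per-bin max, and the cardinality comes from the standard HLL estimator (ignoring the small- and large-range bias corrections):

```python
from functools import reduce

def merge(a: HLL, b: HLL) -> HLL:
    # merged sketch counts the union of what a and b saw
    out = HLL()
    out.bins = [max(x, y) for x, y in zip(a.bins, b.bins)]
    return out

def cardinality(hll: HLL) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)   # standard HLL constant for large M
    return alpha * M * M / sum(2.0 ** -r for r in hll.bins)

def share_ratio(response_hlls: list[HLL], request_hlls: list[HLL]) -> float:
    hll_1 = reduce(merge, response_hlls)   # response tokens p1 issued
    hll_2 = reduce(merge, request_hlls)    # request tokens p1 issued
    return cardinality(hll_1) / cardinality(hll_2)
```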

To make the HLLs “secure”: for each HLL bin (which records the maximum trailing-zero count of the hashes it has seen), also record the token and signature that produced the hash with the greatest number of trailing zeros for that bin. When p2 and p3 merge their HLLs they can then confirm that each other’s HLLs were built from signed tokens, and reject them if the signatures don’t match. p1 can look at the combined HLLs and confirm they come from requests it signed. This forces each node to prove that it has actually seen the tokens which make up its HLL.
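A sketch of that witness idea, continuing the code above: each nonzero bin carries the token that set it, so a receiver can re-derive every bin value and check the signature. Note this only proves each bin is justified by a signed token; on its own it can’t prove a bin wasn’t left lower than what the node actually saw (the under-reporting case discussed under Attacks).

```python
from cryptography.exceptions import InvalidSignature

class VerifiableHLL:
    """HLL where every nonzero bin carries the witness token that set it."""
    def __init__(self):
        self.bins = [(0, None)] * M          # (rank, witness Token or None)

    def add(self, tok: Token) -> None:
        h = token_hash(tok)
        idx = h % M
        rank = trailing_zeros(h >> P) + 1
        if rank > self.bins[idx][0]:
            self.bins[idx] = (rank, tok)

    def verify(self, signer_public_key) -> bool:
        """Check every nonzero bin is justified by a token signed by signer_public_key."""
        for idx, (rank, tok) in enumerate(self.bins):
            if rank == 0:
                continue
            if tok is None:
                return False
            h = token_hash(tok)
            # the witness must land in this bin and reproduce the claimed rank
            if h % M != idx or trailing_zeros(h >> P) + 1 != rank:
                return False
            try:
                msg = tok.requester_id + tok.chunk_index.to_bytes(8, "big")
                signer_public_key.verify(tok.signature, msg)
            except InvalidSignature:
                return False
        return True
```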

Attacks

p1 can verify that p2 and p3 are honest by checking their reported HLLs. If p2 or p3 either under-reports responses (it received a response from p1 but decided not to include it in its HLL) or over-reports requests (e.g. p2 counts a request token from p1 for a request it never actually served), p1 can simply block them from its routing table.

p1 could also collude with p2 or p3 by exchanging tokens without doing any work, so you would still need some kind of central service to issue JWTs, control membership in the swarm, and look for collusion.

Another attack angle: requesters could filter their requests and only send tokens that don’t increment the HLL (i.e. make sure the hash of each token has no trailing zeros). To prevent this, each node could append a salt to the token before it is hashed and inserted into the HLL. It would also prevent token reuse.
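Something like this, continuing the sketch. One wrinkle: for the witness verification above to still work, the salt would have to be kept secret while requests are being served (so requesters can’t grind) but revealed when the sketch is published for checking:

```python
import os

NODE_SALT = os.urandom(16)   # generated once per processing node (illustrative)

def salted_token_hash(tok: Token, salt: bytes = NODE_SALT) -> int:
    # the requester can't predict this hash without knowing the salt,
    # so it can't craft tokens that avoid incrementing any bin
    msg = tok.requester_id + tok.chunk_index.to_bytes(8, "big") + tok.signature + salt
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")
```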

Lossy Compression

It seems that with HyperLogLogs you would only need to keep as many signatures as you have bins to verify very large numbers (billions) of requests with reasonable accuracy. The signatures in the HLL take up a lot of space, but far less than storing every request!
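For a rough sense of scale (my numbers, not anything canonical): with 2^14 = 16,384 bins and a 64-byte Ed25519 signature plus a small token per witness, a sketch is on the order of 16,384 × ~80 B ≈ 1.3 MB, and its relative standard error is about 1.04/√16,384 ≈ 0.8% whether it has counted thousands or billions of tokens. Storing a billion individual signed tokens at ~72 bytes each would instead run to roughly 72 GB.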

You can merge large numbers of HLLs by using a DHT like Kademlia to pick which nodes the HLLs are sent to, so the swarm can report information about a particular node to a common place.
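A sketch of the rendezvous pattern I have in mind, where `routing_table` stands in for the local Kademlia table and the key derivation is made up:

```python
def aggregation_key(peer_id: bytes) -> int:
    # rendezvous key for sketches about peer_id; the "/share-ratio" suffix is illustrative
    return int.from_bytes(hashlib.sha256(peer_id + b"/share-ratio").digest(), "big")

def aggregator_ids(key: int, routing_table: list[int], k: int = 3) -> list[int]:
    # the k node ids XOR-closest to the key, the usual Kademlia distance metric
    return sorted(routing_table, key=lambda nid: nid ^ key)[:k]
```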

I’m sure I’ve missed something fundamental about why this wouldn’t work or ways to make it better. Does this scheme work or is there an easier way to do it?
