Revealing percentiles of an ordered dataset without revealing its size

N J

1/20/23, 7:52 PM

Given an ordered set $S$ of positive integers (eg. $S=\{503, 503, 520, 551...N\}$) I want to be able to reveal the percentile rank (eg. 503 is in the top 10th percentile) for each element of a contiguous subset of $S$ (ie. $\{s_i,s_{i+1},... s_k\} \;|\; i \ge 0, k \lt N$). However I don't want to leak information that can be used to efficiently deduce $N$.

Using the formula for calculating a percentile rank of a given score from wikipedia:

$$P = \frac{\text{# values below score } s - (0.5 \times \text{# of scores with value }s)}{N}$$

We should be able to solve for $N$ with two percentiles $p_1$ and $p_2$ and the number of scores between them, $n$ using this formula.

$$ N = \frac{n}{p_2-p_1} $$

As a demonstration, given a randomly generated dataset of $N$ of $10,000$ and values

$p_1=0.0751, p_2 = 0.0951 \text{ and } n=200$

$$N = \frac{200}{0.0951-0.0751}=10,000$$

Is there anything that can be done to maintain as much accuracy as possible while preventing the efficient determination of $N$ (something like differential privacy)? If this is possible I'm assuming I'll need to introduce some randomness, however I'm not sure how to formulate how much will be required.

0 + 0

data-privacy

differential-privacy

bmm6o

1/20/23, 9:05 PM

Your prose description talks about "X is in the top 10th percentile" while your example gives an answer to 4 decimal places. Which model do you want to support?

N J

1/21/23, 4:39 PM

@bmm6o integer precision is fine (eg. 29th Percentile).

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Revealing percentiles of an ordered dataset without revealing its size

TH: การเปิดเผยเปอร์เซ็นต์ไทล์ของชุดข้อมูลที่สั่งซื้อโดยไม่เปิดเผยขนาด

RO: Dezvăluirea percentilelor unui set de date ordonat fără a dezvălui dimensiunea acestuia

RU: Выявление процентилей упорядоченного набора данных без раскрытия его размера

VI: Tiết lộ phần trăm của tập dữ liệu được đặt hàng mà không tiết lộ kích thước của nó

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.