Score:2

Apply local differential privacy to a dataset


How can I apply local differential privacy (LDP) to specific categorical values in order to perform some analysis? Is there a tool for this?

For example, I have the following dataset.

   email               address
0  exampleemail1        exampleadress1
1  exampleemail2        exampleadress2

From this dataset, I compute some results.

After injecting statistical noise, I want to have the following dataset:

    email               address
0  noise                exampleadress1
1  exampleemail2        exampleadress2

From this dataset, I also compute some results.

In the end, I want to compare my new results to the previous ones.

I have looked at different libraries such as PyDP and PipelineDP, but I cannot find an example.

In fact, I want to apply LDP to every PII field in my dataset.

Daniel S avatar
Adding noise to categorical data often doesn't make much sense. The distributions are discrete rather than continuous, without any strong sense of order, and typically with no sensible metric beyond the discrete metric. The epsilons and deltas of differential privacy are built more for real-valued data and functions thereon.
Score:0

Well, I don't know exactly what you want, but PyDP has the following example as a tutorial (see link here):

Imagine a fictional restaurant owner named Alice who would like to share business statistics with her visitors. Alice knows when visitors enter the restaurant and how much time and money they spend there. To ensure that visitors' privacy is preserved, Alice decides to use a differential privacy library, in this case the PyDP library.

The information Alice wants to share with potential clients covers 4 main scenarios in total:

Count visits by hour of the day: Count how many visitors enter the restaurant at every hour of a particular day.

Count visits by day of the week: Count how many visitors enter the restaurant each day in a week.

Sum-up revenue per day of the week: Calculate the sum of the restaurant revenue per weekday.

Sum-up revenue per day of the week with preaggregation.

Example outputs, both private and non-private, are given and discussed.
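To give a feel for what the private output of the first scenario looks like, here is a minimal sketch of a Laplace-noised count per hour. It uses only the Python standard library and is not PyDP's API; the function names and the sample visit data are hypothetical, and it assumes each visitor enters once per day, so that one visitor changes a single hourly count by at most 1.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts_by_hour(entry_hours, epsilon, rng, hours=range(24)):
    """Return {hour: noisy count} using the Laplace mechanism.

    Assumption: each visitor contributes one entry, so the sensitivity
    of the histogram is 1 and the noise scale is 1 / epsilon.
    """
    true_counts = Counter(entry_hours)
    scale = 1.0 / epsilon
    return {h: true_counts.get(h, 0) + laplace_noise(scale, rng) for h in hours}


# Hypothetical data: the hour at which each visitor entered.
visits = [9, 9, 12, 12, 12, 13, 19, 19, 20]
rng = random.Random(0)  # seeded only to make the sketch reproducible

noisy = private_counts_by_hour(visits, epsilon=1.0, rng=rng)
for hour in (9, 12, 19):
    print(hour, round(noisy[hour], 2))
```

A smaller epsilon gives larger noise and stronger privacy. Note that this is central differential privacy: the noise is added to the aggregate by whoever holds the raw data, not to each record.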

Perhaps stating what the shortcomings of this example are would help people understand your cryptic question.
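If the goal is instead to perturb each record before it leaves the user (local DP) on a categorical column, one standard mechanism is k-ary randomized response. The following is only a sketch under assumptions: it needs a small, known set of categories (the `domain` list and the "email provider" column here are hypothetical), it uses only the standard library rather than PyDP or PipelineDP, and it would not be practical for a high-cardinality field such as a full email address.

```python
import math
import random
from collections import Counter


def randomized_response(value, domain, epsilon, rng):
    """Report the true value with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly chosen different value from the domain."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def estimate_frequencies(reports, domain, epsilon):
    """Unbiased estimate of the true counts from the noisy reports."""
    k, n = len(domain), len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of reporting the true value
    q = 1 / (e + k - 1)   # probability of reporting one specific other value
    observed = Counter(reports)
    return {v: (observed.get(v, 0) - n * q) / (p - q) for v in domain}


# Hypothetical categorical column (e.g. email provider per row).
domain = ["gmail", "yahoo", "other"]
column = ["gmail"] * 600 + ["yahoo"] * 300 + ["other"] * 100

rng = random.Random(0)  # seeded only to make the sketch reproducible
reports = [randomized_response(v, domain, 1.0, rng) for v in column]
estimates = estimate_frequencies(reports, domain, 1.0)
print({v: round(c) for v, c in estimates.items()})
```

Comparing `estimates` with the true counts of `column` is one way to do the before/after comparison described in the question: individual reported values are unreliable by design, but aggregate frequencies can still be recovered approximately.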

xavi avatar
I just want to mask the PII in my dataset using LDP, as I do not trust the database curator. That's all. So maybe differentially private synthetic dataset generation is more suitable for my case.
xavi avatar
Can you share the link to the previous example?