Verify that a user submitted data without identifying which data was submitted

jdkula

1/14/24, 3:24 AM

I'm not fully sure what the most accurate terms would be to describe what I'm looking for, but here's the gist:

Let $u_1, \ldots, u_n$ be the users in a set $U$. Each user may or may not submit some data $d_{u_1}, \ldots, d_{u_n}$ into my database, which I'll model as a set of data $D$. I want to be able to determine the set of users $U_{\text{submitted}} \subseteq U$ who submitted data to the database, while retaining the anonymity of the data. That is, I should be able to read any piece of data $d \in D$ without knowing who submitted it, but I should also be able to tell whether a given user $u_i$ submitted one or more pieces of data into $D$ or none at all, without revealing which data they submitted.

I would love any direction on what I should be looking for here; thank you!


implementation

Score:1

Crypto

ElderlyPedant

1/14/24, 11:22 AM

The term “user” would normally mean the credentials (e.g. login name) that were used to log in to the system and/or database.

But you refer to “my database”, so it sounds to me like you’re the administrator of the database. In which case, I can’t see how you can possibly achieve what you want - using cryptography, or anything else - if “user” is defined as above.

As an administrator, you can just write code to (a) monitor database logins, (b) note all data submitted by those logins for addition to the database, and (c) put all that together into a nice report.

So at any later time, you would know, for example, that on 1 Jan 2023 at 3 pm, authorised user XXX added records YYY and ZZZ to the database.

Certainly, using end-to-end encryption could prevent you from actually reading YYY and ZZZ, but that’s not what you’re asking for.

All of this seems obvious, so I think you need to clarify exactly what you mean by “a user”, and exactly how those users relate to system and database login credentials.

I’d post this as a comment, but it seems a bit too long, and I don’t think I can post comments yet anyway.


Score:0

Crypto

knaccc

1/14/24, 11:55 AM

You would need batches of users to submit pieces of data together as a group. Otherwise, if submissions trickled in one at a time, you could constantly monitor the list of users that have submitted data (or use backups/timestamps/record sequence ids in the database to retrospectively achieve this). It will then be obvious when a user submits data for the first time, and it'll be obvious which piece of data is theirs, because they will suddenly appear in the list of users that have made submissions when that piece of data arrives.
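To make the linking concern concrete, here is a minimal sketch of what an observer could do with two or more snapshots of the database. The snapshot layout (a set of submitter names plus a set of record ids) and the names `new_arrivals`, `alice`, `record-17`, etc. are assumptions made up for illustration, not anything from a real system.

```python
def new_arrivals(snapshots):
    """Compare consecutive snapshots and report what appeared in between.

    Each snapshot is a pair (submitters, records), both sets.
    Returns one (new_users, new_records) pair per consecutive snapshot pair.
    """
    diffs = []
    for (users_before, data_before), (users_after, data_after) in zip(
        snapshots, snapshots[1:]
    ):
        diffs.append((users_after - users_before, data_after - data_before))
    return diffs


# Hypothetical observations taken at three points in time.
snapshots = [
    (set(), set()),
    ({"alice"}, {"record-17"}),  # a single submission trickled in
    ({"alice", "bob", "carol"}, {"record-17", "record-4", "record-9"}),  # a batch of two
]

diffs = new_arrivals(snapshots)
```

In the first diff exactly one new user and one new record appear, so the record is linked to that user. In the second diff two users and two records arrive together, so the observer only learns that each record came from one of those two users, which is why batching is needed.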

For a batch of user submissions:

Require all users in the batch to have a public/private key pair. Maintain a list of all user public keys corresponding to all user identities, and make the list of public keys available to all users.

Require every data submission in the batch to be signed with a Linkable Spontaneous Anonymous Group (LSAG) signature. The group signature would reference the public keys of all users in the batch.

The way LSAGs work is that a 'tag/key image' will appear with each signature. No one can tell which public key or user identity the tag corresponds to. They can only know for sure that it corresponds to one of the public keys associated with the batch. The LSAG verification enforces consistency, which means there is only one valid tag per public key that can be declared with a signature.

Therefore, even if some users contribute multiple pieces of data and submit multiple signatures, it can be publicly verified that a certain number of distinct users have submitted data. The batch can be built up submission-by-submission in public view. When the number of distinct tags matches the number of users required to submit data as part of the batch, the batch is ready for inclusion in the database.
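As a rough illustration of the tag-counting idea above, here is a toy sketch of an LSAG-style ring signature over a tiny multiplicative group. Everything in it is an assumption for demonstration: the parameters `P`, `Q`, `G` are far too small to be secure, the hash-to-group construction is simplistic, and the function names (`sign`, `verify`, `distinct_tags`) are made up here. A real deployment should use a vetted library and a standard group.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): P = 2Q + 1 with P, Q prime;
# G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def _h(*parts):
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def challenge(*parts):
    return _h("scalar", *parts) % Q

def hash_to_group(ring):
    # Squaring a value in [2, P-2] gives a non-identity element of order Q.
    v = _h("group", ring) % (P - 3) + 2
    return pow(v, 2, P)

def sign(msg, ring, idx, x):
    """Sign msg as ring[idx], whose private key is x. Returns (c0, s, tag)."""
    ring = tuple(ring)
    n = len(ring)
    h = hash_to_group(ring)
    tag = pow(h, x, P)  # the linkable tag / key image
    c, s = [0] * n, [0] * n
    u = secrets.randbelow(Q)
    i = (idx + 1) % n
    c[i] = challenge(ring, tag, msg, pow(G, u, P), pow(h, u, P))
    while i != idx:  # simulate every other ring member
        s[i] = secrets.randbelow(Q)
        a = pow(G, s[i], P) * pow(ring[i], c[i], P) % P
        b = pow(h, s[i], P) * pow(tag, c[i], P) % P
        c[(i + 1) % n] = challenge(ring, tag, msg, a, b)
        i = (i + 1) % n
    s[idx] = (u - x * c[idx]) % Q  # close the ring with the real key
    return c[0], s, tag

def verify(msg, ring, sig):
    ring = tuple(ring)
    c0, s, tag = sig
    if len(s) != len(ring):
        return False
    h = hash_to_group(ring)
    c = c0
    for y, s_i in zip(ring, s):
        a = pow(G, s_i, P) * pow(y, c, P) % P
        b = pow(h, s_i, P) * pow(tag, c, P) % P
        c = challenge(ring, tag, msg, a, b)
    return c == c0

def distinct_tags(batch, ring):
    """Tags of all validly signed (msg, sig) submissions in the batch."""
    return {sig[2] for msg, sig in batch if verify(msg, ring, sig)}

# Hypothetical batch: fixed toy private keys for three users.
private_keys = (5, 7, 11)
ring = tuple(pow(G, x, P) for x in private_keys)
batch = [
    ("first record", sign("first record", ring, 0, private_keys[0])),
    ("second record", sign("second record", ring, 0, private_keys[0])),
    ("third record", sign("third record", ring, 2, private_keys[2])),
]
submitters_so_far = len(distinct_tags(batch, ring))  # three submissions, two distinct tags
```

The first two submissions come from the same signer and therefore carry the same tag, so the batch shows two distinct submitters out of three ring members. Nothing in the signatures says which public keys those two tags belong to; the batch would only be complete once a third distinct tag appears.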

The database now has a record of every user/public key that has ever made a submission, and it can be known whether any particular user has not yet made a submission.

Note that the anonymity set size depends on the size of the batches. If only two users contribute data as part of a batch, then the database won't know which user submitted which piece of data. However, for any particular piece of data submitted in that batch, the database will know it must have come from one of the two users that participated in the batch.


