Score:5

ZKP: Prove that >18 while hiding age

cn flag
deb

I am relatively new to cryptography, but I've been programming for a while. Here's a story that sets well the problem I'm trying to solve:

Alice has a digital passport that's signed with her government's private key. Each property is signed separately, and it would still be verifiable that, for example, her first name is "Alice", without saying that her last name is "Smith".

From here, knowing that she can prove that her date of birth is 05/04/1975, can she prove that she is over 18 while neither revealing her birthdate nor asking a third party for help?

My guess is that she can't, but I'm hoping you'll surprise me!

Eugene Styer avatar
dz flag
Related: https://crypto.stackexchange.com/questions/19265/finding-out-the-greater-number-under-zero-knowledge-conditions
ar flag
Also related: https://crypto.stackexchange.com/questions/58349/zero-knowledge-proof-using-hash-chains-is-there-a-flaw-in-the-algorithm
Vadym Fedyukovych avatar
in flag
If you dont mind snarks while verification and follow Cinderella https://antoine.delignat-lavaud.fr/doc/oakland16.pdf there's a chance to hide the whole content of the certificate while proving statements about it.
Score:7
ar flag

AFAICT, no, she can't do that using just the information on her passport, as you describe it,* and treating the digital signature algorithm as a black box.

However, if the government has foreseen this use case and included just a little bit more information on her passport, there is a way to do it using e.g. the hash chain method described in this earlier Q&A thread (based on a 2013 paper by Angel & Walfish).

In particular, what Alice needs is:

  1. A preimage-resistant cryptographic hash function $H$ standardized by the government. Something like SHA-256 would be fine.

  2. A secret government-issued token $s$, computationally indistinguishable from an output of $H$. (For example, $s$ may be generated by selecting a random string $r$ with at least as many bits of entropy as the output length of $H$ and letting $s = H(r)$.)

  3. A statement signed by the government, of the form:

    Alice** was born on or before 2050-01-01 (hash = $H^a(s)$).

    where $a$ is Alice's age in days on 2050-01-01 and $H^a$ denotes $H$ iterated $a$ times, i.e. $H^0(s) = s$ and $H^{n+1}(s) = H(H^n(s))$

Now, to prove to Bob that she is currently (2021-11-24) 18 years old or older, Alice needs to produce the signed statement above, together with $h = H^b(s)$, where $b$ is Alice's age in days exactly 18 years ago (i.e. on 2003-11-24).

Bob can then calculate $H^c(h)$, where $c = 16840$ is the number of days between 2003-11-24 and 2050-01-01, and compare it with the hash in Alice's signed statement. If Alice's claim is valid, they should match, since $a = b + c$, and thus $H^c(h) = H^c(H^b(s)) = H^a(s)$.

(Of course, Bob should also verify that Alice's statement really is correctly signed with the government's signing key and that it really belongs to her and not, say, her older sister.)


Note that, for computational convenience and to mitigate timing attacks, it may be desirable for Alice to precalculate and store $H^{b'}(s)$ for various values of $b'$ corresponding to, say, her age in days at the beginning of each year so far. That way, she will only need to perform at most 365 hash evaluations on the spot to calculate $H^b(s)$ for any $b$.

To reduce Bob's computational demand, it may be desirable for the government to issue Alice multiple signed statements of the form above, but with different dates and corresponding values of $a$, so that Alice can send Bob a signed statement for, say, 2004-01-01 instead of 2050-01-01. Then again, a few thousand SHA-256 evaluations really isn't much these days, either, and Bob has no need to worry about side channel attacks.

However, issuing multiple signed statements also helps preserve the zero-knowledge property in the presence of rollover. For example, let's imagine that the government originally picked 2030-01-01 as the fixed date for the signed statements, but then decided a few years ago to move it to 2050 as they started to issue passports valid up to 2030 and later. This would mean that, at a minimum, Bob could learn from the date in Alice's statement whether her passport was issued before or after the transition, which might correlate with other information including her age. By giving Alice signed statements for both 2030-01-01 and 2050-01-01 (and possibly other applicable dates), this leak can be avoided, since Alice can just choose the statement with the earlier suitable date for Bob's request (but not, obviously, earlier than the date she wishes to prove she was born on or before).


Ps. As noted in my answer to the question linked above, the same protocol can also be used to allow Alice to prove her exact age. To do this, Alice needs to know not just $s = H(r)$ but also the original random token $r$, which should be chosen by the government so that it is not a valid output of $H$.

Then, if Alice wants to prove to Bob that she was born on 1975-05-04, and no earlier, she can reveal that date together with $r$ and at least one signed statement of the form above. Bob can then calculate Alice's claimed age in days $a$ at the date indicated in the statement, calculate $H^a(s) = H^{a+1}(r)$ and compare it with the hash in the signed statement.

Of course, with this variant of the protocol, Bob also need to verify the format of the $s$ or $r$ token sent by Alice (i.e. that it is a valid output of $H$ for "born on or before" claims, and not a valid output of $H$ for "born on this date" claims).

All that said, in your scenario a separate government signature for "Alice was born on 1975-05-04" would serve this purpose just as well.


Footnotes:

*) Your description that "[e]ach property is signed separately" is actually a bit problematic if taken at face value, since having the government sign an individual property such as "first name: Alice" means that anyone who has seen Alice's signed first name can make a copy of the signature and use it to claim that their first name is also Alice. Also, for example, if Alice Andrews and Bob Barker had passports with such separately signed first and last names, they could combine them to create a fake passport for "Alice Barker" or "Bob Andrews".

A slightly better solution would be to have one of the properties on the passport be a unique ID, and include that ID in all signed properties. Thus, for example, someone who has seen a signed statement of the form "ID: #123456789; first name: Alice" could now verify that Alice, bearing a passport with ID #123456789, indeed has the first name "Alice", but they cannot use the signature to claim anyone else's first name to be Alice.

Of course, this scheme still has its own shortcomings: for example, Alice might want to be able to prove to Bob that her name is Alice without Bob being able to prove that fact to anyone else. Implementing such a deniable authentication scheme seems outside the scope of this answer, although you might want to look at e.g. this earlier one.

**) Here I am using "Alice" as a stand-in for Alice's unique government-issued ID, as described in the previous footnote above.

fgrieu avatar
ng flag
Fundamental problem: if some gizmo can test Alice is ≥18yo now by interacting with her passport, and the passport has no memory of earlier such attempts, nothing prevents that gizmo from finding Alice's birth date by dichotomy. Even if the gizmo won't do it, as long as it's notion of "now" is not made secure (e.g. there's a way to set it's clock at will), it can be abused to find Alice's birth date. We might want to have the passport itself check a timestamp for freshness using challenge/response and certify Alice is presently not minor (the 18 must not be adjustable, at least finely).
ar flag
@fgrieu: True. I was implicitly assuming that (at least some of) the information on Alice's passport is e.g. encrypted with a key stored outside the passport itself, so that she can choose which pieces to disclose. Otherwise other parts of the question (like signing the passport fields separately) make little sense to me.
Maarten Bodewes avatar
in flag
In IEC 18013-5 (mobile Driving License / mDL) the data elements are combined with a random, and then hashed. The randomized hashes are then combined and signed in an MSO (without the random, obviously). That way you can verify the MSO and the data element. You have to give permission in the app to reply to any data request. You may be interested that the > 18 and > 21 use cases have been implemented in that standard (it allows two requests, which does mean you can determine that somebody is between 18 and 21 though).
deb avatar
cn flag
deb
This is an excellent answer, thank you so much for taking the time.
deb avatar
cn flag
deb
Doesn't this solution assume that Alice is telling the truth about her age in the first place when generating `Hb(s)`? I feel like I haven't understood this correctly.
ar flag
@deb: The key observation is that Alice cannot generate $H^b(s)$ for negative $b$ (at least not without additional information), since that would require them to find a preimage $H^{-1}(s)$ for the hash function. Assuming that $H$ is a secure [cryptographic hash function](https://en.wikipedia.org/wiki/Cryptographic_hash_function), that's not possible, since preimage resistance is one of their fundamental security assumptions. If Alice did find a way to compute preimages for, say, SHA-256, they'd have broken the hash and could become instantly famous, whatever their age.
ar flag
@deb: As for your earlier question about deniable authentication in this context, that's actually a very good one and I don't have an answer to it. I thought it *should* be possible somehow when I wrote the answer, but now that I think about it more, I don't see how. But maybe there's some clever ZKP trick that I'm missing. I might ask a separate question about it, if you don't.
deb avatar
cn flag
deb
@IlmariKaronen: Oh, I think I understand your answer a little bit better now (I've read it more times than I would admit, and yet I'm not sure if I got everything right!). I'm not used to that level of cryptography, so I'll think about it more when I'll have a sheet of paper to write on... If we're talking about the question as I remember it, I had deleted the comment because I might have found the answer. I'll probably create a Q&A on the site, I'll share the link here.
deb avatar
cn flag
deb
This is the question I made. Compared to the answer you gave here, the solution was fairly simple; were we talking about the same problem? https://crypto.stackexchange.com/q/97803/88017
deb avatar
cn flag
deb
Okay: One month later, here I am, finally understanding your answer. I can only appreciate the cleverness of that answer, thank you so much.
Score:2
es flag

Using an elliptic curve: The passport could contain a unix timestamp as a Pedersen Commitment in the form $C = bG + tH$, where $b$ is a scalar blinding factor, $t$ is the passport holder's date of birth as a unix timestamp (seconds since the epoch), and $G$ and $H$ are well known base points chosen such that their discrete log with respect to one another is unknown and unknowable.

$C$ is signed by the government. The passport makes the timestamp and blinding factor known to the passport holder.

To prove the passport holder is older than 18, we first compute $s$ as the latest possible timestamp the passport holder can have been born on to currently be 18 or over.

The passport holder needs to demonstrate that $s>=t$. We can do that by proving that $C' = sH - C == (s-t)H - bG$ is a commitment to a positive number. Because of the modular arithmetic nature of EC algebra, this means we need to ensure that the commitment is to a relatively small number, and not to a huge number that would result from $s-t$ being "negative".

To achieve this, we need something called a range proof.

The range proof will prove that $C'$ can be constructed by a sequence of 43 bits (which will allow the scheme to work for the next 200 years). Each 'bit' will be either zero or a power of 2.

First, we will declare 43 Pedersen commitments $C_i$ for values of $i$ between 0 and 42 inclusive, where each commitment will have its own blinding factor $b_i$ and each commitment will be either to the value zero or $2^i$.

Now, we need to prove that $C'$ is a commitment to the same number as $\sum_{i=0}^{42} C_i$. This can be done by calculating the public key $C'' = C' - \sum_{i=0}^{42} C_i$. Since the number committed to by $C'$ will equal the sum of numbers committed to by the 43 commitments, the result will be of the form $C'' == b'G$ where $b' == -b - \sum_{i=0}^{42} b_i$. Therefore providing a signature on $C''$ using the private key $b'$ will prove that there cannot be a non-zero $H$ component of $C''$, because a signature using the base point $G$ is not possible unless the discrete log of $H$ w.r.t. $G$ is known (and $H$ has specifically been chosen such that it is unknowable).

Finally, we need to prove that each of the components $C_i$ are either commitments to zero or commitments to $2^i$. We do this by providing a ring signature for each commitment for the two public keys $(C_i - 0H, C_i - 2^iH)$. A ring signature will only be possible if one of those two possibilities produces an EC point with no $H$ component, thus proving that $C_i$ is a commitment to either zero or $2^i$.

In summary, we have proven that $s>=t$ because we've proven that $sH -C$ is a commitment to a positive number. We proved it was a positive number by proving it can be built up as a series of 43 bits, where each bit is either zero or a specified power of 2.

deb avatar
cn flag
deb
I'm afraid I don't know enough about elliptic curves/"Pedersen Commitments" to fully understand your answer, but I sure do appreciate the effort. What's a scalar binding factor?
knaccc avatar
es flag
@deb It's just a random number that obscures the value of the Pedersen Commitment. If the commitment was just $tH$, you could brute force lots of values of the timestamp in order to discover the holder's date of birth. The blinding factor makes the commitment look totally random and prevents it from being brute forced. What I've described is how amounts are hidden in cryptocurrencies such as Monero, where range proofs are used to ensure someone doesn't try to create an amount on the blockchain with a negative value.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.