Score:28

Is there a hash function that's more expensive for an attacker than for the server?

cx flag

Say a server wants to hash a password $p$. It would use a secure hash function $H$ and a unique salt $s$ to hash the password as $H(p,s)$. If one has access to the salt, each password candidate requires one run of the hash function to be ruled out, which is the same amount of time it takes the server to verify a password candidate.

Suppose, on the other hand, that the password is hashed as $H'(p,s+r)$, where $r$ is a random integer in the range $[0,n)$ that is not stored anywhere. Verifying a correct password would then require on average $n/2$ runs of $H'$, whereas ruling out an incorrect password candidate would require $n$ runs of $H'$, i.e. the check is twice as slow for an attacker as for the server.

I'm thinking this means that if $H'$ is $n$ times faster than $H$, the server validates a correct password up to twice as fast as before, while password testing remains equally slow for an attacker. One could then use this either to make sign-in quicker for users or to make the hash function slower for an attacker.
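For concreteness, here's a minimal sketch (in C) of what the server-side check would look like; the hash interface `H` and the exact way $r$ is mixed into the salt are illustrative assumptions, not a prescribed design:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N 100  /* pepper range: r is drawn from [0, N) and never stored */

/* hypothetical slow password hash (e.g. a PBKDF); not implemented here */
void H(uint8_t out[32], const uint8_t *pwd, size_t pwd_len,
       const uint8_t *salt, size_t salt_len);

/* Registration stores only (salt, H(p, salt||r)) for one random secret r.
 * Login must re-discover r by trying all N possibilities. */
bool verify(const uint8_t *pwd, size_t pwd_len,
            const uint8_t *salt, size_t salt_len, const uint8_t stored[32])
{
    uint8_t buf[64], digest[32];           /* assumes salt_len < 64 */
    memcpy(buf, salt, salt_len);
    for (unsigned r = 0; r < N; r++) {
        buf[salt_len] = (uint8_t)r;        /* append pepper candidate to salt */
        H(digest, pwd, pwd_len, buf, salt_len + 1);
        if (memcmp(digest, stored, 32) == 0)
            return true;                   /* correct password: ~N/2 runs of H */
    }
    return false;                          /* wrong password: always N runs */
}
```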

Are there any obvious issues with this approach or ways to improve it?

Edit:
The idea here is for there to be many different options for what is stored in the database, and for an attacker to have to try more of these options than the server. This could also be implemented in ways that don't use $r$ as a salt. E.g. password hash $=H_r(p,s)$, where the subscript $r$ denotes the number of iterations of the hash, e.g. $r=3\rightarrow H_r(p,s)=H'(H'(H'(p,s)))$.

An attacker would need to calculate $n$ hashes for each password but the server only needs to calculate $r$ hashes to validate a correct password (and $n$ hashes to reject an incorrect password).
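A sketch of this iterated variant, reusing the hypothetical `H` and `N` from the sketch above (the exact chaining, feeding the digest back in as the next input, is an illustrative choice):

```c
bool verify_iterated(const uint8_t *pwd, size_t pwd_len,
                     const uint8_t *salt, size_t salt_len,
                     const uint8_t stored[32])
{
    uint8_t digest[32], next[32];
    H(digest, pwd, pwd_len, salt, salt_len);      /* one iteration: r = 1 */
    for (unsigned r = 1; ; r++) {
        if (memcmp(digest, stored, 32) == 0)
            return true;                          /* matched after r iterations */
        if (r == N)
            return false;                         /* wrong password: N runs of H */
        H(next, digest, 32, salt, salt_len);      /* iterate the hash once more */
        memcpy(digest, next, 32);
    }
}
```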

DannyNiu avatar
vu flag
The concept of "non-public/secret salt" is known as "pepper".
n-l-i avatar
cx flag
Yes, so I'm talking about a pepper not known to the server either. Say the pepper can be any number between 0 and 100; then, for each potential password, the server or the attacker needs to try all of these options. If a match is found then they can stop.
Geoffroy Couteau avatar
cn flag
Yes, this is exactly the concept known as peppering :)
Geoffroy Couteau avatar
cn flag
Re-posting my link from another comment because it contains useful pointers: see e.g. [here](https://arxiv.org/pdf/2206.12970.pdf).
n-l-i avatar
cx flag
The paper you've linked is very interesting and answers my question well. This is exactly what I wanted to find. It seems that "pepper" is a fairly [broad term](https://crypto.stackexchange.com/a/20583) and I've been familiarised with a definition that suggests that the pepper is a second salt but stored elsewhere. Both types of pepper seem to have big but very different types of drawbacks which is interesting.
jp flag
If you demanded that the client submit both the password and $r$, it could be $n$ times faster for the server than for the attacker. The attacker/client would have to know a rule in advance, e.g. the hash must start with 16 zero bits. Or just decouple it from the password hash entirely, and require the attacker/client to submit a proof-of-work with every login attempt.
Geoffroy Couteau avatar
cn flag
@n-l-i indeed, actually I was mostly familiar with "pepper" in the restricted sense of what you were considering, and discovered upon checking the Wikipedia page that it was also used in a much broader sense than I expected (hence my initially overconfident messages where I said "yes this is exactly peppering" - I understand now that there's also a much broader use of the term)
us flag
When I saw the title in "Hot Network Questions", I came here to say, "Of course not! That's silly!" But it turns out I was dead wrong. Great question and TIL something.
DarthGizka avatar
kr flag
A gain of 50% for the server is **nothing**. The server usually runs a mediocre, overly generic implementation on mediocre hardware (that's also loaded with oodles of other tasks) whereas an attacker will run a top notch implementation on top hardware. That alone gives the attacker an advantage of about an order of magnitude, before you even consider purpose-built optimised implementations stripped of any superfluous operation and fine-tuned to the available silicon at the machine code level (e.g. SIMD) or that run on GPUs or on a botnet. Then there's also purpose-built silicon to consider...
Score:24
ng flag

That's an interesting idea (it was new to me), and it turns out to be known as (random) peppering, as pointed out in the comments.

Indeed, the average number of evaluations of $H'$ by the server when testing the correct password (which is usually the most common case, except under a Denial of Service attack) is $\frac{n+1}{2}$, versus $n$ for an attacker that proceeds by eliminating passwords one by one: a reduction of $\frac{n-1}{2n}$, which converges quickly to $\frac12$ as $n$ grows. And while attackers are not bound to that test strategy†, it is typically the least costly for them, because moving from one tested password to the next typically has a large cost.

A gain of at most 50% is not game-changing. But implementers of password hashes like Argon2 optimize their code carefully (because that makes it possible to use more iterations at a given cost, which ultimately makes using their code safer), and even a 20% gain is something they should consider notable.

If this is in use in some modern login system(s), I want link(s) to see how the potential issue I note next is addressed. And if it is not used, I wonder why: once seen, this relatively easy and sizable gain looks like it should be a standard technique.


The one major caveat I see is that we should fear that an adversary finds (or approximates) $r$ by a side channel attack on the genuine server while a genuine user logs in (the simplest such attack being timing the login; since $H'$ is slow, that's relatively easy). With the exact $r$, the attacker's effort is divided by a factor of $n$, which is bad.

For this reason, it's probably best not to make $n$ too big, so that even a leak of $r$ is not a major disaster; and to have the genuine server, when verifying a password, try the $n$ possible values of $r$ in some random order. With a small prime $n$ (e.g. $n=7$, saving on average >42% of the work for a correct password), the server can proceed as follows (a C sketch of this loop follows the list):

  • draw ephemeral secret $r$ random in $[0,n)$
  • set $\delta=1$, or better‡ draw ephemeral secret $\delta$ in $[1,n)$
  • $r_0\gets r$
  • repeat
    • test whether $H'(p,s+r)$ matches the stored hash; if so, accept password $p$.

    • $r\gets r-\delta$

    • if $r<0$ then $r\gets r+n$, in constant time

      In modern C, if `r` is of type `int32_t`, we can use `r += (int32_t)((0-((uint32_t)r>>31))&(uint32_t)n)`

  • until $r=r_0$
  • reject password $p$.
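
A concrete sketch of the loop above in C; `H` stands for the slow password hash $H'$ and `rand_below` for a cryptographic RNG, both assumed interfaces rather than real APIs (and `memcmp` should be a constant-time comparison in production):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

void H(uint8_t out[32], const uint8_t *pwd, size_t pwd_len,
       const uint8_t *salt, size_t salt_len);   /* hypothetical slow hash */
uint32_t rand_below(uint32_t n);                /* uniform in [0,n), CSPRNG */

bool verify_random_order(const uint8_t *pwd, size_t pwd_len,
                         const uint8_t *salt, size_t salt_len,
                         const uint8_t stored[32])
{
    const int32_t n = 7;                  /* small prime pepper range */
    uint8_t buf[64], digest[32];          /* assumes salt_len < 64 */
    memcpy(buf, salt, salt_len);

    int32_t r  = (int32_t)rand_below((uint32_t)n);           /* ephemeral secret r */
    int32_t d  = 1 + (int32_t)rand_below((uint32_t)(n - 1)); /* secret delta in [1,n) */
    int32_t r0 = r;
    do {
        buf[salt_len] = (uint8_t)r;                /* salt s + pepper candidate r */
        H(digest, pwd, pwd_len, buf, salt_len + 1);
        if (memcmp(digest, stored, 32) == 0)
            return true;                           /* accept password p */
        r -= d;
        /* constant-time wraparound: add n iff r < 0 */
        r += (int32_t)((0 - ((uint32_t)r >> 31)) & (uint32_t)n);
    } while (r != r0);                             /* all n values visited */
    return false;                                  /* reject password p */
}
```

Because $n$ is prime and $\delta$ is coprime with it, stepping by $-\delta$ modulo $n$ visits all $n$ values of $r$ exactly once before returning to $r_0$.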

† For example, an adversary having captured so many login/salt/hashes that several of them are likely to be for password 123456, satisfied by finding one, and able to move from one login/salt/hash to the next at negligible cost, would not be impaired by the proposed change if they tested values of $r$ incrementally for all the login/salt/hashes they have, and the fixed password 123456.

‡ From the perspective of timing attacks, using fixed $\delta=1$ is fine. Random secret $\delta$ is only intended as an extra protection against other side-channel attacks. We make $n$ prime so that any $\delta\in[1,n)$ is coprime with $n$.

Geoffroy Couteau avatar
cn flag
Note that this concept of peppering is actually well known. I don't know how much it's used in practice, but it has been discussed in many papers, and is essentially identical to OP's idea. There's even a Wikipedia entry: https://en.wikipedia.org/wiki/Pepper_(cryptography)
Geoffroy Couteau avatar
cn flag
Oh, actually this entry is a bit too general. But "peppering" is used in the security community to refer precisely to OP's idea. Also, there are apparently some complications when one tries to use peppering with memory-hard functions such as Argon2: see e.g. [here](https://arxiv.org/pdf/2206.12970.pdf) for a recent work on the subject.
fgrieu avatar
ng flag
@Geoffroy Couteau: I knew of fixed secret pepper, not _"randomly-selected number that must be re-discovered on every password input"_ (as in the question) as this Wikipedia entry puts it, [with details not so long ago](https://en.wikipedia.org/w/index.php?title=Pepper_(cryptography)&oldid=1124574311#Randomly_selected_pepper). And at first glance at least I do not see this idea in the [reference quoted for that](https://www.usenix.org/legacy/events/sec99/full_papers/kedem/kedem.pdf). Undoubtedly, it's in [your reference](https://arxiv.org/pdf/2206.12970.pdf#page=3).
fgrieu avatar
ng flag
@Geoffroy Couteau: your reference cites [this](https://doi.org/10.1016/0167-4048(96)00003-X) 1996 source as the origin of (random) peppering. That source does as in the question: it notes that the work factor for the attacker grows with $S$ (the question's $n$), and makes $S$ 100 to 1000. But it does not point out that the workfactor for legitimate use with the correct password is only $\frac{S+1}{2S}$ of the attacker's, which is the interesting point in the question, as it remains relevant in an era with a workfactor parameter to password hashes.
quarague avatar
ke flag
I might have misunderstood some part of your caveat, but wouldn't it suffice if the server checks the candidates for $r$ in a random order? If the server checks them in order from 1 to $n$, then a timing attack can help you find $r$; but if the server first randomly reorders the numbers from 1 to $n$ and then checks them in that order, the timing attack will not tell you anything about $r$.
id flag
Am I to understand that if a server checked until it found the correct `r`, and then slept for the remaining time it would take to check all `r`, that would still be a win? An attacker can't timing attack it, and while it still takes longer for a valid user to log in, it doesn't cost the server owner more power.
fgrieu avatar
ng flag
@quarague: yes, it's sufficient that the server checks the candidates for $r$ in a random order. It's not necessary that the order be drawn uniformly from all $n!$ possible orders: what's necessary is that the distribution of the $i^\text{th}$ $r$ tested is uniform for each fixed $i\in[1,n]$. The answer uses $n(n-1)$ possible orders because that's easy and might give extra protection against some side channels, but for timing attacks we can simplify to $n$ orders by fixing $\delta=1$.
fgrieu avatar
ng flag
@Adam Barnes: it's hard to implement "slept for the remaining time it would take to check all $r$" in a way that's guaranteed to not allow an attacker to determine how much time the server slept. For example, the adversary might be able to determine the responsiveness of a low-priority process running on the same machine, that's stopped when the logon process hashes, but runs when the logon process sleeps. My proposal is safe from that, and nearly halves the average wall clock logon time perceived by a user.
Score:5
in flag

First of all, I'll assume that you'd use a password hash (or a Password-Based Key Derivation Function, PBKDF) to hash the password. Password hashes require a work factor or iteration count to calculate the result, slowing down both the server (once, if the password is correct) and the adversary (once for each try). In the following sections, $n$ is the size of the range of $r$, as in the question.

The approach you describe does seem to work. There are however a few drawbacks:

  • the worst case for the number of tries is still $n$;
  • a user that types the wrong password will also experience the worst-case scenario.

It seems that the approach simply relies on a full search, which always takes $n \over 2$ operations on average. That means the server's speed advantage over the adversary is at most a factor of two. It also means that $n$ might as well simply be set to $2$; otherwise the difference between the minimum and maximum verification time may be too large, without providing any benefit to the user.

So yes, there is, and you've described a scheme that is faster on the server. But the advantage is limited, and it does come with some drawbacks. For such a scheme to be successful, I think it needs to provide a higher advantage than 2 times. In cryptographic terms, this adds only a single bit to the security strength, whereas a PBKDF should normally provide at least ~20 bits of security (i.e. over a million iterations).

Score:5
us flag

It's worth considering the attacker's goals and the threat model of users and the server.

For a given password, guessing only $k$ out of $n$ values of $r$ can create false negatives: a case where you have the right password but you don't realize it because you didn't pick the right $r$.

From a usability point of view, you want the false negative rate to be $0$: a user with the correct password should never be prevented from logging in.

From an attacker's point of view, the false negative rate could be almost anything less than $1$: if I breach the password server and get the database of hashed passwords, I probably only care about compromising some number of users, not any specific users. Thus, if I only try $k$ different values of $r$, then I miss a fraction $\frac{n-k}{n}$ of the passwords that I would otherwise find, but if I still get $\frac{k}{n}$ of them, I probably don't care (whatever malicious behaviour I wanted to carry out will still go through).

(there are also different threat models where an attacker would also want a false negative rate of $0$, and your idea could be useful for those).

But I disagree with the premise that this generally forces an attacker to spend an extra factor of $n$ on each password guess.

the default. avatar
id flag
I don't think there is any advantage to reducing k instead of reducing the number of accounts you try -- e.g. if 10% of all users have bruteforceable passwords, if you try 50% of all users, you get 5% of all users, and if you set k to n/2 and try all users, you still get 5% of all users for the same number of hashes. Another scenario: you try all users in both cases, but set k to n/2 in the second case so the brute force is 2x faster. It's likely that you'll lose many more accounts by halving k than by trying 2x fewer passwords, since passwords are guessed in order of decreasing frequency.
Sam Jaques avatar
us flag
Password frequency is a great point. I suppose if password $p_1$ is more likely than $p_2$, then the probability that (password,pepper)$=(p_1,r)$ is always greater than the probability that it equals $(p_2,r)$.
Jonas Wilms avatar
us flag
What if, instead of picking a random $r$ each time, the server picked a fixed subset $R$ of $[0, n)$ (let's say $|R| = n/2$), and then for each created password took a random $r \in R$? In that case, when the attacker picks a subset $A$ of $[0, n)$ ($|A| = n/2$) of $r$s to try, it could even happen that no single password matches (if $R$ and $A$ have no common element). Of course, as the attacker finds (password, $r$) combos they learn about $R$, so their chances of finding a password in $< O(n/2)$ tries slowly increase.
Sam Jaques avatar
us flag
@JonasWilms How does the server know $R$ when it checks a password? If the server deletes $R$ then it takes them just as much work, since they must still check all $r$ in $[0,n)$. If the server keeps $R$ secret, then (a) if the attacker compromises $R$, the server loses its advantage; (b) the server could do more effective things if it's willing to use an extra secret, like encrypting the hashes of the passwords
Jonas Wilms avatar
us flag
@SamJaques I had a real world scenario in mind, where you would have multiple servers, each server picking a random r, storing that somewhere secure in memory and then using that to create multiple passwords. They do not know R but only one r out of it, so for checking a password their advantage is only based on the fact that the password to try is likely correct, whereas the password the attacker tries is likely incorrect (the original premise of the question). I tried to construct a case where the assumption that r is randomly chosen from [0, n) is incorrect and thus the attacker gains ...
Jonas Wilms avatar
us flag
... no advantage over picking a subset of $[0, n)$. However, the more I think about it, that is likely not a real advantage, as the servers would lose their advantage once the attacker finds the $r$s from the different servers, and the probability of finding them increases with the number of servers, whereas decreasing the number of servers decreases the number of different $r$s to find. So I guess in the end it was just a nonsense idea and on average is as good as picking a random $r$ from $[0, n)$ each time.
Score:0
kz flag

A hash function is usually expensive because you are doing many hashing rounds. You could do a million rounds before letting someone in, but somehow be able to find out after 100 rounds that the password is definitely wrong, and then wait the appropriate time to report the failure, at zero cost. That could help against DoS attacks.
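For concreteness, here is a sketch of how this might look, using the mechanism described in a comment below (storing an intermediate digest after 100 rounds alongside the final one); `round_fn`, the round counts, and the `sleep`-based padding are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>                     /* sleep(); POSIX */

void round_fn(uint8_t state[32]);       /* hypothetical single hashing round */

bool verify_early_reject(uint8_t state[32],       /* initialised from (p, s) */
                         const uint8_t mid[32],   /* stored 100-round digest */
                         const uint8_t full[32])  /* stored 1,000,000-round digest */
{
    for (int i = 0; i < 100; i++) round_fn(state);
    if (memcmp(state, mid, 32) != 0) {
        sleep(5);        /* pad to roughly the time a full check takes */
        return false;    /* rejected after only 100 rounds of actual work */
    }
    for (int i = 100; i < 1000000; i++) round_fn(state);
    return memcmp(state, full, 32) == 0;
}
```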

jp flag
What authentication scheme works this way?
gnasher729 avatar
kz flag
You could store the hash after 100 rounds. If it is wrong after 100 rounds, the attacker failed. If it is right after 100 rounds, you do another 999,900 rounds. The attacker never finds out that he was right after 100 rounds.
jp flag
Why would you do another 999,900 rounds if it's *right*? why not if it's wrong? attackers make mostly wrong guesses. but why bother doing the rounds, why not just sleep() ?
sh flag
How do you prevent an attacker from also stopping after 100 rounds when brute-forcing?
Kenny Evitt avatar
br flag
Waiting "the appropriate time to report the failure" is definitely not zero cost. You'd have to maintain a network connection for one, which definitely isn't free, and also enough state, and some kind of CPU/memory to 'stop waiting' eventually. It might be *less* expensive tho.
Kenny Evitt avatar
br flag
And part of the threat analysis for something like this would assume an attacker might have access to a database of hashes and the algorithm, so those attackers wouldn't be deterred by what you're thinking of.