Score:0

When trying to break a PBKDF2 SHA512 hash, how fast is an RTX 4090 or similar GPU with the given parameters

io flag

I'm writing a paper for uni about password security, specifically about cracking passwords in the context of a password manager. I've coded a password encryption scheme which uses PBKDF2(SHA512) to hash a master password into a key that AES256 uses to encrypt a password database or vault.

I'm trying to estimate the time to crack a password with this encryption scheme via a brute force attack guessing passwords until the right password is found. Assume the salt is known to the attacker and there is no pepper. As part of my calculation I need the PBKDF2 hashrate for a top of the line GPU like the RTX 4090 with the following paramaters:

PBKDF2:

-Hashing algorithm: SHA512
-Iteration count: 600.000
-Output length: 32 bytes -Salt length : 12 bytes

I'm having trouble finding the hashrate in research papers and don't have an RTX 4090 to benchmark it myself. If anyone knows where I can get this information that would be much appreciated :).

fgrieu avatar
ng flag
Googling "Hashcat benchmark RTX 4090" returned [this](https://hashcat.net/forum/thread-11277.html) where these is one entry with PBKDF2-SHA512, albeit with a stated 1023 iteration count. Increasing the iteration would decrease rate at most (and perhaps about) linearly. Your Mileage May Vary a lot according to the quality of the implementation, and how the AES part of the test of a candidate password is handled.
Maarten Bodewes avatar
in flag
@fgrieu Not sure about that part on AES. Generally you can have a pretty good guess if the key is correct by decrypting a few blocks, and then you can decrypt more or the entire package if the key is likely correct. For e.g. AES-GCM you'd just decrypt with AES-CTR of course.
Luka Gecko avatar
io flag
I've timed the AES(GCM) verification step and I'm treating it as negligible, and the implementation is Microsoft's .NET 7 System.Security.Cryptography.dll. Could you be more specific on how increasing the iteration count slows down the hashrate linearly.
Score:1
ng flag

From this source I extract

Short Benchmark for the RTX 4090
CUDA API (CUDA 11.8)
* Device #1: NVIDIA GeForce RTX 4090, 23867/24252 MB, 128MCU

Benchmark relevant options:
* --optimized-kernel-enable
* --workload-profile=4

* Hash-Mode 1700 (SHA2-512)
Speed.#1.........:  7425.2 MH/s (288.57ms) @ Accel:64 Loops:1024 Thr:256 Vec:1

* Hash-Mode 7100 (macOS v10.8+ (PBKDF2-SHA512)) [Iterations: 1023]
Speed.#1.........:  2825.7 kH/s (221.31ms) @ Accel:32 Loops:511 Thr:512 Vec:1

The vastly different order of magnitude shows that in 7425.2 MH/sand 2825.7 kH/s, the H stands for many more SHA-512 in the second case.

By definition of PBKDF2, the iteration count is the number of iterations of a Pseudo Random Function that in practice is HMAC with the specified hash, here SHA-512, which (for the sizes considered) uses two SHA-512 round functions. Hence one PBKDF2-SHA512 with $1023$ iterations is expected to perform $2046$ SHA-512 round functions, when one SHA2-512 probably makes $1$. This is roughly consistent with the $2627$ times lower H/s value reported for the second data point.

In the computation of PBKDF2 with more than a few rounds, assuming ideal parallelization, the average time should be of the form $(u\,c+v)n$ where $c$ is the iteration count, $n$ is the (assumed large) number of PBKDF2 evaluations, and $u$, $v$ are some positive constants in seconds. Roughly, $u$ is an overhead per PBKDF2 evaluation, and $v$ is an extra time per PRF evaluation, all assuming ideal parallelization. It follows that there are $1/(u\,c+v)$ PBKDF2 evaluations per second.

We can't tell $u$ or $v$ exactly, but knowing that they are positive is enough to extrapolate a minimum PBKDF2 rate for $c=600000$ iterations from the one for $c=1023$: that should be at least $2825700\times1023/600000=4817$ PBKDF2 per second. In practice, I believe this won't be far off, because hashcat is a reputable program, thus well optimized, thus hopefully, even for $c=1023$, $u\,c\gg v$.

I've timed the AES(GCM) verification step and I'm treating it as negligible, and the implementation is (some library on the PC's CPU)

I trust that when PBKDF2 is on the PC's CPU, the AES-GCM verification step uses a negligible fraction of the CPU in the password cracking effort, assuming the (unspecified) amount of data entering AES-GCM is moderate. However, that might not stand when the PBKDF2 part is GPU-accelerated.

  • If we use the same AES-GCM code, will it be able to perform the (vastly higher) $\approx4817$ AES-GCM tests per second required so that it's not the bottleneck? Also, I can't tell for sure how much having to export the results of PBKDF2 to the PC's memory (rather than checking everything in the GPU as I imagine the benchmarked code does) will slow down the GPU code, and if making the necessary change requires a day or a month of experience writing GPU code.
  • If we move the AES-GCM code to the GPU, that's a significant change in the GPU code, and I can't forecast how efficient that will be, or even if that will be faster than the previous option.
Luka Gecko avatar
io flag
Excellently argued and presented, thank you very much. You make a very good point regarding exporting PBKDF2's results to memory being a bottleneck. If I get the time I'll run some tests with this in mind. One love :)
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.