Score:5

Argon2: is there a security cost to raising the parallelism too high?


I'm experimenting with the parameters for argon2, using argon2_cffi.

Whereas the iteration count (time_cost) and the memory_cost have an obvious bearing on the speed and security of the result, I've not seen any guidance on a maximum for the parallelism parameter, other than enough for all the threads you have.

I have a 4-core i5, not sure if that counts as 4 or 8 threads. I am using time_cost=4, memory_cost=2**20 kiB, the data-dependent variant, with 'password' and 'some salt', and I get the following rough timings:

parallelism   time (seconds)
   1            3.93
   2            2.14
   4            1.30
   8            1.29
  16            1.28
  32            1.23
  64            1.23
 128            1.26
 256            1.31
 512            1.34

Once I'm using all my cores, I hit a time floor, as expected. I was expecting a significant slowdown with large values of parallelism, but there is only a hint of one: it runs very slightly slower at silly numbers.

I don't know how the algorithm uses memory, but I imagine that if it breaks the computation up into many disjoint blocks to run in parallel, then each block will use less memory and/or execute fewer iterations. As the published attacks on it seem to concentrate on examples with few iterations, I could well imagine that it's stronger to use many iterations on a single block than fewer iterations on many blocks that are combined at the end.

The question is: do large values of parallelism hurt security, and if so, how badly? Perhaps in the way I've surmised? There doesn't appear to be any significant speed cost from over-specifying parallelism.

If I'm to target a wide range of hardware, do I set parallelism high, to get the full speed advantage on many-core machines? Is there a threshold below which it makes no difference to security?

Score:2

TL;DR: the memory size is independent of the number of threads.

The answer can readily be found in the Argon2 specification in the paper "Argon2: the memory-hard function for password hashing and other applications" as input to the Password Hashing Competition (PHC).

From Specifications of Argon2, 3.1 Inputs:

Number of iterations $t$ (used to tune the running time independently of the memory size) can be any integer number from $1$ to $2^{32} − 1$.

And from 3.2 Operational:

For tunable parallelism with $p$ threads, the memory is organized in a matrix $B[i][j]$ of blocks with $p$ rows (lanes) and $q = m' / p$ columns.
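
To make that concrete with the questioner's numbers (just re-deriving the spec's formula, so treat the arithmetic as an illustration): with memory_cost $m = 2^{20}$ kiB, the memory is $2^{20}$ one-kiB blocks, and

$$q = \frac{m'}{p} = \frac{2^{20}}{4} = 262144 \text{ blocks per lane for } p = 4, \qquad q = \frac{2^{20}}{512} = 2048 \text{ for } p = 512.$$

The total memory filled is the same either way; raising $p$ only changes the shape of the matrix, not its size.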

From section 4.1 Available features:

Scalability. Argon2 is scalable both in time and memory dimensions. Both parameters can be changed independently provided that a certain amount of time is always needed to fill the memory.

Parallelism. Argon2 may use up to $2^{24}$ threads in parallel, although in our experiments **8 threads already exhaust the available bandwidth and computing power of the machine**.

(emphasis mine)


Beware that the memory and number of threads are allocated and used for each call to Argon2. That means that if you have many users logging in simultaneously, memory usage grows linearly with the number of concurrent logins. As Argon2 is deliberately designed to take time, you should be worried about the total amount of memory and threads used.

Specifying too many threads may actually become a problem, as thread switching is definitely not free of performance penalties, even if the system is able to allocate the requested number of threads. Since multiple calls to Argon2 can run in parallel as well, extra threads should be used to bring the latency of a single call down to a comfortable level rather than to "increase the performance" of a server.
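
A minimal sketch of that idea, assuming argon2_cffi's PasswordHasher and a plain semaphore (MAX_CONCURRENT and the verify_password helper are hypothetical, just to show the shape):

```python
# Bound the number of simultaneous Argon2 verifications so that peak memory
# stays near MAX_CONCURRENT * memory_cost instead of growing with every login.
import threading
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

MAX_CONCURRENT = 4                      # hypothetical cap; tune to the server's RAM
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)
_ph = PasswordHasher(time_cost=4, memory_cost=2**20, parallelism=4)

def verify_password(stored_hash: str, password: str) -> bool:
    # Extra logins wait here rather than exhausting RAM.
    with _slots:
        try:
            return _ph.verify(stored_hash, password)
        except VerifyMismatchError:
            return False
```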

You seem to have tested only one call to Argon2 at a time; calling the function while other invocations are running may show different numbers, especially since memory-hard functions are obviously very sensitive to memory-access speed and, for the parallelism, to the availability of threads and cores.


As for security: no, the parallelism should not make a large difference with regard to security. The reason is simple: an adversary performing a password search is assumed to hash candidate passwords in parallel anyway. Adding threads only changes the latency of a single password hash, while the CPU time taken (summed over all threads) remains approximately the same.

The CPU-time difference you see with multiple threads is likely due to slower memory access across CPU cores as well as synchronization overhead.

Maarten Bodewes:
Dangit, now I'm wondering how much the 3D cache in the AMD processors will make a difference. Depends of course on the amount of memory specified and the number of Argon2 invocations.