Score:1

SUPERCOP benchmark of signature scheme: Number of cycles of key generation


I'm having trouble interpreting the output of the SUPERCOP benchmarks for some digital signature schemes.

Specifically, I don't understand how to read the number of cycles for key generation. This should be found in the line of the SUPERCOP output file with the keyword "keypair_cycles". However, I find three such lines, with significantly different numbers. Below are the three lines from the benchmark of Dilithium3:

SHELL:

[...]$ ./do-part crypto_sign dilithium3
[...] starting
[...] crypto_sign/dilithium3
[...] crypto_sign/dilithium3/avx2 constbranchindex
[...] crypto_sign/dilithium3/avx2 constbranchindex gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
[...] crypto_sign/dilithium3/ref constbranchindex
[...] crypto_sign/dilithium3/ref constbranchindex gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv -fPIC -fPIE
[...] crypto_sign/dilithium3 constbranchindex measuring
[...] finishing
[...] database size for this run:   1218  28630 185871 [...]

SUPERCOP output file

[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 94034 134909 95486 95039 94335 94184 93892 93937 93971 93613 93677 94135 93846 93322 94034 94134

[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 81618 84214 82419 81657 82054 81119 81618 81807 81616 81176 82033 81842 81496 81125 81332 81618

[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 81553 84202 82084 82365 81595 81497 81587 81576 81091 81283 81513 81322 81316 81210 82077 81553

As you can see, in the first line, the median time for key generation is about 15% greater than in the other two lines. Why three lines instead of just one? Why are the numbers different?

Thanks for any help!
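A minimal sketch for reading the three lines above, in Python. It assumes (this is inferred from the quoted data, not taken from SUPERCOP documentation) that the first number after the `-` is a summary value and the remaining 15 numbers are individual measurements. The helper name `parse_cycles` is made up for this example. For the three quoted lines, the first number does equal the median of the 15 that follow it.

```python
import statistics

# The three keypair_cycles lines quoted in the question, copied verbatim.
LINES = [
    "[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 94034 134909 95486 95039 94335 94184 93892 93937 93971 93613 93677 94135 93846 93322 94034 94134",
    "[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 81618 84214 82419 81657 82054 81119 81618 81807 81616 81176 82033 81842 81496 81125 81332 81618",
    "[my machine] crypto_sign dilithium3/constbranchindex keypair_cycles - 81553 84202 82084 82365 81595 81497 81587 81576 81091 81283 81513 81322 81316 81210 82077 81553",
]


def parse_cycles(line, keyword="keypair_cycles"):
    """Split one output line into (first_number, remaining_numbers).

    Assumption: the layout is '<keyword> - <n0> <n1> ... <nk>'.
    """
    fields = line.split()
    idx = fields.index(keyword)
    numbers = [int(x) for x in fields[idx + 2:]]  # idx + 1 is the "-" field
    return numbers[0], numbers[1:]


parsed = [parse_cycles(line) for line in LINES]

for first, samples in parsed:
    # Compare the reported first number with the median of the samples.
    print(first, statistics.median(samples), min(samples), max(samples))

# Ratio between the first line's summary value and the second line's.
ratio = parsed[0][0] / parsed[1][0]
print(f"line 1 vs line 2: {ratio:.3f}")
```

Under that reading, the gap between the lines is a shift of the whole batch of measurements rather than the effect of one outlier such as 134909.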

swineone:
Assuming these refer to multiple runs of the same scheme under the same conditions (e.g. seed), then it's possible that the first run is just a warmup run: load instructions and data into cache, etc. If the output of the first run includes the very first time the CPU has seen the code, it's quite common for there to be cache misses and possibly even page faults, so I wouldn't be surprised that it runs slower.
rouguex:
@swineone The strange thing is that for other schemes the higher numbers are in the second or even in the third line, which does not seem consistent with the "warmup" explanation.
swineone:
The other usual caveats about benchmarking apply: have you disabled Turbo Boost? What is your scaling governor? Is your machine properly cooled? Is hyper-threading disabled? Have you disabled any daemons or services that may compete for CPU with the benchmark? Sorry if you’re already familiar with these issues and took all the necessary precautions beforehand.
rouguex:
@swineone I re-ran SUPERCOP with Turbo Boost disabled, but the results are similar. Also, I'm not running anything else in particular while SUPERCOP runs. However, I'm not sure whether CPU frequency (Turbo Boost) or other concurrent processes really affect SUPERCOP. After all, SUPERCOP measures cycles (not time) for the cryptographic primitive under test.
swineone:
You're right, by measuring cycles most frequency-related effects can be disregarded (although memory accesses, whose clock is independent of the CPU, may vary in latency). As for processes, if there is a context switch in the middle of a measurement, it will definitely affect results -- how likely that is depends on how long the benchmark runs for. You may experiment with pinning processes to a single CPU using e.g. `taskset`, and also use performance monitoring counters (PMCs) to monitor e.g. cache misses.
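As a rough illustration of the pinning suggestion, here is a Python sketch that pins the current process to one CPU (the in-process counterpart of `taskset`, Linux only) and then takes several batches of measurements to see how much the batch medians drift. Everything here is an assumption for illustration: the helper names are made up, the workload is a stand-in rather than a SUPERCOP primitive, and `time.perf_counter_ns` measures nanoseconds, not cycles.

```python
import os
import statistics
import time


def pin_to_one_cpu():
    """Pin this process to a single allowed CPU; return it, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):  # only available on some platforms
        return None
    cpu = min(os.sched_getaffinity(0))  # pick the lowest CPU we may run on
    os.sched_setaffinity(0, {cpu})
    return cpu


def batch_medians(work, batches=3, samples=15):
    """Time work() in several batches and return the median of each batch."""
    medians = []
    for _ in range(batches):
        timings = []
        for _ in range(samples):
            start = time.perf_counter_ns()
            work()
            timings.append(time.perf_counter_ns() - start)
        medians.append(statistics.median(timings))
    return medians


def stand_in_workload():
    # Hypothetical placeholder for the operation being benchmarked.
    return sum(i * i for i in range(10_000))


pinned_cpu = pin_to_one_cpu()
medians = batch_medians(stand_in_workload)
print("pinned to CPU:", pinned_cpu)
print("batch medians (ns):", medians)
```

If the batch medians still differ noticeably with pinning in place, scheduling across cores is less likely to be the main cause, and cache or memory effects are worth checking next.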