Score:1

How fast is GHASH and what does it do?

tf flag
Tom

I read here:

https://www.researchgate.net/publication/220335697_GCM_GHASH_and_Weak_Keys

how GHASH works. So we have $m$ 128-bit blocks $X_{i}$ and we compute in $GF(2^{128})$:

$Y_{m} = \sum_{i=1}^{m} X_{i} \times H^{m-i+1}$

$H$ is a key. Am I see right there that $H$ is raised to the power? Is there method to do it fast in $GF(2^{128})$ or it is just standard exponentiation modulo with fast exponentiation? I thought there is one multiplication in $GF(2^{128})$ on one 128-bit block, but if $H$ is raised to the power, there is much more.

How fast is GHASH itself compared to AES? I mean if AES can achieve $n$ cycles per byte, how fast is GHASH alone? Is perfmormance of GHASH comparable just to one multiplication in $GF(2^{128})$ on every 128 bits of plaintext or is it much complicated?

forest avatar
vn flag
How fast it is depends on hardware features. It can be really fast if the CPU has [`pclmulqdq`](https://www.felixcloutier.com/x86/pclmulqdq).
Tom avatar
tf flag
Tom
@forest my question was about GHASH with pclmulqdq, becuase today it seem to be standard implementation of GCM. I know it can be fast, but let's say AES can achevie 1 cycle per byte, what GHASH takes in this 1 cycle per byte?
forest avatar
vn flag
https://crypto.stackexchange.com/a/60109/54184 shows benchmarks with and without various hardware acceleration features for both GHASH and AES in CTR mode. AES with full hardware acceleration achieves 5307.37 MB/s. GHASH with full hardware acceleration achieves 4795.76 MB/s.
us flag
You evaluate the polynomial using [Horner's method](https://en.wikipedia.org/wiki/Horner%27s_method) -- one multiplication and one addition for each coefficient of the polynomial.
Tom avatar
tf flag
Tom
@forest thanks, there are some interesting comparisons.
Tom avatar
tf flag
Tom
By the way am I right that if AES and GHASH are comparable in perfomance, then if we combined them in GCM mode, all system (AES-GCM) works with half of the speed that sigle of them can achieve?
kelalaka avatar
in flag
Note that modern CPU instructions for GHASH, too.
Maarten Bodewes avatar
in flag
@Tom Not necessarily or even likely. This could indicate that the amount of CPU cycles is comparable, but on modern CPU's a lot of time is spent during I/O - loading data in the various levels of cache and CPU registers.
forest avatar
vn flag
@Tom Modern CPUs have very deep pipelines and can effectively execute multiple instructions at once as long as they don't both require using the same execution units (i.e. they don't cause port conflicts) and as long as neither instruction depends on the other having finished first. AES-NI and CLMUL both run on different execution units (port 0 vs port 5 respectively on Skylake, see Table 11.1 in [Agner Fog's CPU microarchitecture optimization document](https://www.agner.org/optimize/microarchitecture.pdf)), so they'll be executed concurrently.
forest avatar
vn flag
For reference, on an old Ivy Bridge server CPU with support for both AES-NI and CLMUL, 128-bit AES in CTR mode is 3205 MB/s, GCM mode is 935.8 MB/s, and raw GHASH is 1328 MB/s (all on 16 KiB blocks, tested with OpenSSL 1.1.1n). You can see that it must be executing concurrently.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.