Score:1

# How fast is GHASH and what does it do?

https://www.researchgate.net/publication/220335697_GCM_GHASH_and_Weak_Keys

how GHASH works. So we have $$m$$ 128-bit blocks $$X_{i}$$ and we compute in $$GF(2^{128})$$:

$$Y_{m} = \sum_{i=1}^{m} X_{i} \times H^{m-i+1}$$

$$H$$ is a key. Am I see right there that $$H$$ is raised to the power? Is there method to do it fast in $$GF(2^{128})$$ or it is just standard exponentiation modulo with fast exponentiation? I thought there is one multiplication in $$GF(2^{128})$$ on one 128-bit block, but if $$H$$ is raised to the power, there is much more.

How fast is GHASH itself compared to AES? I mean if AES can achieve $$n$$ cycles per byte, how fast is GHASH alone? Is perfmormance of GHASH comparable just to one multiplication in $$GF(2^{128})$$ on every 128 bits of plaintext or is it much complicated?

How fast it is depends on hardware features. It can be really fast if the CPU has [pclmulqdq](https://www.felixcloutier.com/x86/pclmulqdq).
@forest my question was about GHASH with pclmulqdq, becuase today it seem to be standard implementation of GCM. I know it can be fast, but let's say AES can achevie 1 cycle per byte, what GHASH takes in this 1 cycle per byte?
https://crypto.stackexchange.com/a/60109/54184 shows benchmarks with and without various hardware acceleration features for both GHASH and AES in CTR mode. AES with full hardware acceleration achieves 5307.37 MB/s. GHASH with full hardware acceleration achieves 4795.76 MB/s.
You evaluate the polynomial using [Horner's method](https://en.wikipedia.org/wiki/Horner%27s_method) -- one multiplication and one addition for each coefficient of the polynomial.
@forest thanks, there are some interesting comparisons.
By the way am I right that if AES and GHASH are comparable in perfomance, then if we combined them in GCM mode, all system (AES-GCM) works with half of the speed that sigle of them can achieve?
Note that modern CPU instructions for GHASH, too.
@Tom Not necessarily or even likely. This could indicate that the amount of CPU cycles is comparable, but on modern CPU's a lot of time is spent during I/O - loading data in the various levels of cache and CPU registers.
@Tom Modern CPUs have very deep pipelines and can effectively execute multiple instructions at once as long as they don't both require using the same execution units (i.e. they don't cause port conflicts) and as long as neither instruction depends on the other having finished first. AES-NI and CLMUL both run on different execution units (port 0 vs port 5 respectively on Skylake, see Table 11.1 in [Agner Fog's CPU microarchitecture optimization document](https://www.agner.org/optimize/microarchitecture.pdf)), so they'll be executed concurrently.
For reference, on an old Ivy Bridge server CPU with support for both AES-NI and CLMUL, 128-bit AES in CTR mode is 3205 MB/s, GCM mode is 935.8 MB/s, and raw GHASH is 1328 MB/s (all on 16 KiB blocks, tested with OpenSSL 1.1.1n). You can see that it must be executing concurrently.
I sit in a Tesla and translated this thread with Ai: