The BLAKE2X paper says:
> BLAKE2X adds a constant overhead of $\lceil\ell/64\rceil$ (resp. $\lceil\ell/32\rceil$) compression function calls compared to the underlying 64-bit (resp. 32-bit) BLAKE2 hash. For example, to compute a 1056-bit (132-byte) hash as required in Ed521 signatures, BLAKE2X adds† $\lceil132/64\rceil=3$ extra compression function calls compared to BLAKE2b. Note that $\operatorname{B2}(i,j,H_0)$ calls can be computed in arbitrary order, and in parallel.
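Applying the quoted formula to both variants for the same 132-byte output gives:

$$\left\lceil\frac{132}{64}\right\rceil = \lceil 2.0625\rceil = 3 \quad\text{(BLAKE2Xb)},\qquad \left\lceil\frac{132}{32}\right\rceil = \lceil 4.125\rceil = 5 \quad\text{(BLAKE2Xs)}$$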
Is a compression function call the same as an invocation of the hash function itself?
The BLAKE2X paper also uses this notation:
$$\operatorname{B2}(0,64,H_0)\mathbin\|\operatorname{B2}(1,64,H_0)\mathbin\|\ldots\mathbin\|\operatorname{B2}(\lfloor\ell/64\rfloor,\ell\bmod64,H_0)$$
Are these concatenated values the outputs of successive calls to the hash function itself?
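One way to read the notation is the minimal, unoptimized pure-Python sketch of BLAKE2Xb below (64-bit variant only, no key, salt or personalization). It is not the paper's reference code. It assumes that the 4-byte XOF digest length $\ell$ (in bytes) occupies the upper half of BLAKE2b's 8-byte node-offset field, that $H_0$ is an ordinary 64-byte BLAKE2b hash of the seed with that field set, and that each $\operatorname{B2}(i,j,H_0)$ is a BLAKE2b call over $H_0$ with fanout 0, depth 0, leaf and inner length 64, node offset $i$ and digest length $j$. A hand-written compression function is used because Python's `hashlib.blake2b` rejects `depth=0`; the helper names `blake2b_p` and `blake2xb` are hypothetical.

```python
import hashlib
import struct

MASK = (1 << 64) - 1
# BLAKE2b initialization vector (the SHA-512 initial hash values)
IV = (
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B, 0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F, 0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
)
# Message-word permutations; rounds 10 and 11 reuse rows 0 and 1
SIGMA = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
    (14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3),
    (11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4),
    (7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8),
    (9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13),
    (2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9),
    (12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11),
    (13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10),
    (6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5),
    (10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0),
)


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK


def compress(h, m, t, last):
    """One call of the BLAKE2b compression function; updates h in place."""
    v = list(h) + list(IV)
    v[12] ^= t & MASK  # low 64 bits of the byte counter
    v[13] ^= t >> 64   # high 64 bits of the byte counter
    if last:
        v[14] ^= MASK  # finalization flag

    def g(a, b, c, d, x, y):
        v[a] = (v[a] + v[b] + x) & MASK
        v[d] = rotr(v[d] ^ v[a], 32)
        v[c] = (v[c] + v[d]) & MASK
        v[b] = rotr(v[b] ^ v[c], 24)
        v[a] = (v[a] + v[b] + y) & MASK
        v[d] = rotr(v[d] ^ v[a], 16)
        v[c] = (v[c] + v[d]) & MASK
        v[b] = rotr(v[b] ^ v[c], 63)

    for r in range(12):
        s = SIGMA[r % 10]
        g(0, 4, 8, 12, m[s[0]], m[s[1]])
        g(1, 5, 9, 13, m[s[2]], m[s[3]])
        g(2, 6, 10, 14, m[s[4]], m[s[5]])
        g(3, 7, 11, 15, m[s[6]], m[s[7]])
        g(0, 5, 10, 15, m[s[8]], m[s[9]])
        g(1, 6, 11, 12, m[s[10]], m[s[11]])
        g(2, 7, 8, 13, m[s[12]], m[s[13]])
        g(3, 4, 9, 14, m[s[14]], m[s[15]])
    for i in range(8):
        h[i] ^= v[i] ^ v[i + 8]


def blake2b_p(data, digest_size=64, fanout=1, depth=1, leaf_size=0,
              node_offset=0, node_depth=0, inner_size=0):
    """Unkeyed BLAKE2b with an explicit parameter block (salt/personal zero)."""
    param = struct.pack("<BBBBIQBB14x16x16x", digest_size, 0, fanout, depth,
                        leaf_size, node_offset, node_depth, inner_size)
    h = [iv ^ w for iv, w in zip(IV, struct.unpack("<8Q", param))]
    blocks = [data[i:i + 128] for i in range(0, len(data), 128)] or [b""]
    for idx, block in enumerate(blocks):
        last = idx == len(blocks) - 1
        t = len(data) if last else (idx + 1) * 128  # bytes hashed so far
        compress(h, struct.unpack("<16Q", block.ljust(128, b"\0")), t, last)
    return struct.pack("<8Q", *h)[:digest_size]


def blake2xb(seed, out_len):
    """Hypothetical helper: out_len bytes of BLAKE2Xb output (1 <= out_len < 2**32 - 1)."""
    xof = out_len << 32                        # XOF digest length in the upper 4 bytes
    h0 = blake2b_p(seed, 64, node_offset=xof)  # root hash H0
    out = bytearray()
    for i in range(-(-out_len // 64)):         # ceil(out_len / 64) calls of B2
        j = min(64, out_len - 64 * i)          # B2(i, j, H0); only the last may be short
        out += blake2b_p(h0, j, fanout=0, depth=0, leaf_size=64,
                         node_offset=i | xof, node_depth=0, inner_size=64)
    return bytes(out)


# Sanity check of the hand-written BLAKE2b against the standard library
assert blake2b_p(b"abc") == hashlib.blake2b(b"abc").digest()
```

In this sketch each `B2` call hashes the 64-byte $H_0$, which fits in a single 128-byte block, so it performs exactly one `compress` call; each call depends only on $i$, $j$ and $H_0$, which is why the calls can run in any order.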
This question may sound obvious, but English is not my first language, and I am a little confused by the timings when generating a 1GiB stream with BLAKE2X using a 64-byte seed compared to using a 1GiB one:
```
$ time dd if=/dev/zero count=1 bs=64 2>/dev/null | /usr/local/bin/b2sum -X 8589934592 > /dev/null
real 0m7.058s
user 0m6.273s
sys 0m0.785s
$ time dd if=/dev/zero count=1 bs=1073741824 2>/dev/null | /usr/local/bin/b2sum -X 8589934592 > /dev/null
real 0m9.295s
user 0m7.669s
sys 0m2.018s
```
The second run is much faster than I expected. Maybe the seed stays in the L1/L2 cache (which should have a speed of about 1TiB/s)?
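A rough way to compare the two runs is to count BLAKE2b compression calls. This sketch assumes the root hash over the seed costs one call per 128-byte block (at least one), and the output costs $\lceil\ell/64\rceil$ calls as in the quoted passage. It takes the 1GiB output size from the question and the 64-byte and 1073741824-byte inputs from the two `dd` commands, and it ignores `dd`, pipe and I/O costs. The function names are hypothetical.

```python
def blake2b_compressions(n_bytes):
    # BLAKE2b processes 128-byte blocks; an empty message still needs one call
    return max(1, -(-n_bytes // 128))


def blake2xb_compressions(seed_bytes, out_bytes):
    # root hash H0 over the seed, plus one call per 64-byte output block
    return blake2b_compressions(seed_bytes) + -(-out_bytes // 64)


out = 2**30  # 1 GiB of output, as stated in the question
small = blake2xb_compressions(64, out)
large = blake2xb_compressions(2**30, out)
print(small, large)  # 16777217 25165824
```

Under these assumptions the second run needs 25165824 compression calls against 16777217 for the first, about 1.5 times as many.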
† Corrected from the original, which reads $\lceil132/3\rceil=3$ (an obvious typo).