There something misleading in Wikipedia article. They wrote:
$s = b \oplus (b \lll 1) \oplus (b \lll 2) \oplus (b \lll 3) \oplus (b \lll 4)$
Is equal to:
$s = b \times 31 \mod 257$
where $\times$ is polynomial multiplication of $b$ and $31$ taken as bit arrays. But $\mod 257$ is not standard modulo operation, it is mod in $GF(2^8)$ and $257$ is in fact polynomial of the form $x^{9}+1$.
We can easily see that carry-less multiplication of $b \times 31$ is equal to:
$s = b \oplus (b \ll 1) \oplus (b \ll 2) \oplus (b \ll 3) \oplus (b \ll 4)$
in bits $7,6,5,4$, where $\ll$ is bitwise shift, not circualr sfift. But in Wikipedia example and in AES there is circular shift. So where does circular come from? It comes from reduction by the polynomial $x^9+1$. This is how multiplication with polynomial reduction works in $GF(2^8)$:
#include <cstdint>
#include <cstdio>
#include <bitset>
#include<iostream>
uint8_t multiply(uint a, uint b)
{
uint8_t p = 1;
uint16_t L = 1;
uint16_t c = 0;
for (unsigned int i = 0; i < 8; ++i)
{
c ^= a & (L << i) ? (uint16_t)b << i : 0;
}
for (unsigned int i = 8; i-- > 0; )
{
if (c & (L << (i + 8)))
{
c ^= L << (i + 8);
c ^= (uint16_t)p << i;
}
}
return (uint8_t)c;
}
int main()
{
uint result = 0;
result = multiply(131,31);
std::cout << result;
return 0;
}
We need polynomial of degree $9$, but we can define reducing polynomial in practical implementation by taking only its first $8$ least significant bits, so in our case $p=1$. Now if we look how carry-less multiplication works (first loop), we can see that in $7,6,5,4$ bits we would get the same result, as with shifts:
$s = b \oplus (b \ll 1) \oplus (b \ll 2) \oplus (b \ll 3) \oplus (b \ll 4)$
But in place of bits $11,10,9,8$ will be exactly bits shifted (canceled) out in process of bitwise shift. This is because carry-less multiplication move them to the left by $1,2,3...$ positions and then all is xored. Like here:
Since we are multiplicating by $31$ we would get always $4$ extra bits in higher significant half of our 16-bit result. And they would be xored, because this is how carry-less multiplication works. So now we got bits $7,6,5,4$ equal to the result of:
$s = b \oplus (b \ll 1) \oplus (b \ll 2) \oplus (b \ll 3) \oplus (b \ll 4)$
And bits $11,10,9,8$, which could be in place of bits $3,2,1,0$. To make circular shift we need to only xor back this bits. This is done by:
for (unsigned int i = 8; i-- > 0; )
{
if (c & (L << (i + 8)))
{
c ^= L << (i + 8);
c ^= (uint16_t)p << i;
}
}
If $p=1$. We can see that this code is really checking is there bit equal to $1$ on the left on position $i$ of our top half, and if it is, then it xor that bit with bit $i$ on our low half. And all that algorithm is equal to xoring with circialr shift $\lll$:
$s = b \oplus (b \lll 1) \oplus (b \lll 2) \oplus (b \lll 3) \oplus (b \lll 4)$
Knowing this it becomes clear that this kind of equivalence holds for any $GF(2^k)$, provided we choose $p=1$ (by the convention of the posted code) as our polynomial and multiplier equal to $2^{\frac {k}{2}+1}+1$.