This function from Wikipedia calculates the sbox used in AES:
#include <stdint.h>
#define ROTL8(x,shift) ((uint8_t) ((x) << (shift)) | ((x) >> (8 - (shift))))
void initialize_aes_sbox(uint8_t sbox[256]) {
uint8_t p = 1, q = 1;
/* loop invariant: p * q == 1 in the Galois field */
do {
/* multiply p by 3 */
p = p ^ (p << 1) ^ (p & 0x80 ? 0x1B : 0);
/* divide q by 3 (equals multiplication by 0xf6) */
q ^= q << 1;
q ^= q << 2;
q ^= q << 4;
q ^= q & 0x80 ? 0x09 : 0;
/* compute the affine transformation */
uint8_t xformed = q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3) ^ ROTL8(q, 4);
sbox[p] = xformed ^ 0x63;
} while (p != 1);
/* 0 is a special case since it has no inverse */
sbox[0] = 0x63;
}
It contains subtle details that I struggled for some time to comprehend. The thing that confused me most was the division by 3. In the Galois field 246 multiplied by 3 is 1, so 246 is the inverse of 3, and multiplying by 246 is equivalent to dividing by 3. This much was clear to me, but I couldn't make the logical leap from that fact to these lines:
q ^= q << 1;
q ^= q << 2;
q ^= q << 4;
q ^= q & 0x80 ? 0x09 : 0;
I performed the multiplication in the way described in the FIPS.197 document (adding the original value multiplied by exponents of 2). It's verbose, so I came up with a system. I name the bits of the multiplicand a, b, c, d, e, f, g, and h. You can probably tell what I'm doing. I've reduce with the polynomial like you're supposed to. To arrive at 246 we need to add the exponents 128, 64, 32, 16, 4, and 2:
2 .b...... ..c..... ...d.... a...e... a....f.. ......g. a......h a.......
4 ..c..... ...d.... a...e... ab...f.. .b....g. a......h ab...... .b......
16 a...e... ab...f.. .bc...g. a.cd...h ab.d.... .bc..... ..cd.... ...d....
32 ab...f.. .bc...g. a.cd...h .b.de... abc.e... ..cd.... a..de... a...e...
64 .bc...g. a.cd...h .b.de... ..c.ef.. abcd.f.. a..de... .b..ef.. ab...f..
128 a.cd...h .b.de... ..c.ef.. a..d.fg. abcde.g. .b..ef.. a.c..fg. .bc...g.
xor'ing all letters in each column yields one resulting bit. Here's the result:
a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, a^b^c^d, f^g^h, g^h, a^b^c^d^e^f^g.
Returning to the wikipedia algorithm; it seems like a multiplication by 255 - except instead of reducing with the polynomial for each step it's followed by a conditional xor with 9.
The first three lines evaluate to:
a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, e^f^g^h, f^g^h, g^h, h
The last line effectively xors bit 7 (a^b^c^d^e^f^g^h) on bits 3 (e^f^g^h) and 0 (a). In bit 3 e, f, g, and h cancel out. In bit 0 h cancels out. This is the end result:
a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, a^b^c^d, f^g^h, g^h, a^b^c^d^e^f^g.
Fortunately it yields the same result, but how do you arrive at this algorithm?