Score:4

How do you arrive at this algorithm for dividing by 3 in the Galois field?

dk flag

This function from Wikipedia calculates the sbox used in AES:

#include <stdint.h>

#define ROTL8(x,shift) ((uint8_t) ((x) << (shift)) | ((x) >> (8 - (shift))))

void initialize_aes_sbox(uint8_t sbox[256]) {
    uint8_t p = 1, q = 1;
    
    /* loop invariant: p * q == 1 in the Galois field */
    do {
        /* multiply p by 3 */
        p = p ^ (p << 1) ^ (p & 0x80 ? 0x1B : 0);

        /* divide q by 3 (equals multiplication by 0xf6) */
        q ^= q << 1;
        q ^= q << 2;
        q ^= q << 4;
        q ^= q & 0x80 ? 0x09 : 0;

        /* compute the affine transformation */
        uint8_t xformed = q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3) ^ ROTL8(q, 4);

        sbox[p] = xformed ^ 0x63;
    } while (p != 1);

    /* 0 is a special case since it has no inverse */
    sbox[0] = 0x63;
}

It contains subtle details that I struggled for some time to comprehend. The thing that confused me most was the division by 3. In the Galois field 246 multiplied by 3 is 1, so 246 is the inverse of 3, and multiplying by 246 is equivalent to dividing by 3. This much was clear to me, but I couldn't make the logical leap from that fact to these lines:

        q ^= q << 1;
        q ^= q << 2;
        q ^= q << 4;
        q ^= q & 0x80 ? 0x09 : 0;

I performed the multiplication in the way described in the FIPS.197 document (adding the original value multiplied by exponents of 2). It's verbose, so I came up with a system. I name the bits of the multiplicand a, b, c, d, e, f, g, and h. You can probably tell what I'm doing. I've reduce with the polynomial like you're supposed to. To arrive at 246 we need to add the exponents 128, 64, 32, 16, 4, and 2:

  2 .b...... ..c..... ...d.... a...e... a....f.. ......g. a......h a.......
  4 ..c..... ...d.... a...e... ab...f.. .b....g. a......h ab...... .b......
 16 a...e... ab...f.. .bc...g. a.cd...h ab.d.... .bc..... ..cd.... ...d....
 32 ab...f.. .bc...g. a.cd...h .b.de... abc.e... ..cd.... a..de... a...e...
 64 .bc...g. a.cd...h .b.de... ..c.ef.. abcd.f.. a..de... .b..ef.. ab...f..
128 a.cd...h .b.de... ..c.ef.. a..d.fg. abcde.g. .b..ef.. a.c..fg. .bc...g.

xor'ing all letters in each column yields one resulting bit. Here's the result:

a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, a^b^c^d, f^g^h, g^h, a^b^c^d^e^f^g.

Returning to the wikipedia algorithm; it seems like a multiplication by 255 - except instead of reducing with the polynomial for each step it's followed by a conditional xor with 9.

The first three lines evaluate to:

a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, e^f^g^h, f^g^h, g^h, h

The last line effectively xors bit 7 (a^b^c^d^e^f^g^h) on bits 3 (e^f^g^h) and 0 (a). In bit 3 e, f, g, and h cancel out. In bit 0 h cancels out. This is the end result:

a^b^c^d^e^f^g^h, b^c^d^e^f^g^h, c^d^e^f^g^h, d^e^f^g^h, a^b^c^d, f^g^h, g^h, a^b^c^d^e^f^g.

Fortunately it yields the same result, but how do you arrive at this algorithm?

Score:4
cn flag

I was the original author of that code, and I am not very proud of that part of it, but here's one way to see why it works.

The multiplication by "3" (my original comment said $x+1$), slightly rewritten, is

p ^= p << 1;
p ^= original_p & 0x80 ? 0x1B : 0;

It is not obvious how to invert this, because when inverting you don't have access to original_p.

But it's possible to reverse the order of operations:

p ^= p & 0x80 ? some_constant : 0;
p ^= p << 1;

0x1B = 00011011 in binary. 00001001 becomes 00011011 after p ^= (p << 1). Therefore, some_constant should be 0x09.

This is easier to invert. The first line is its own inverse, while the inverse of p ^= p << 1 is p ^= p << 1; p ^= p << 2; p ^= p << 4.

There may be some way to understand this as an optimized multiplication by "0xf6", but if there is, I'm not seeing it.

The code would have been nicer if we were inverting multiplication by $x$ ("2"), since in that case the high bit of the input ends up as the low bit of the output, where it's easy to test for when inverting. Unfortunately, $x$ doesn't have full order in this representation of the field.

user7048748 avatar
dk flag
I'm pretty stoked that you are the one to answer this. Thanks a lot! I don't see why you wouldn't be proud though; it must be the optimal way. "x doesn't have full order" means that 2 can't be used as a generator, right? I'm still processing your answer.
user7048748 avatar
dk flag
Your explanation is spot on. Reversing the order of operations and realizing that the first three lines are the inverse of the first part of the multiplication was exactly what I needed to understand. I believe I understand everything you state in your answer except the part about not being proud. You made my weekend. Have a nice one!
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.