Score:1

In hardware, why using an algebraic implementation of a (small) S-Box instead of a LUT?

ve flag
Gux

There is this cool paper https://eprint.iacr.org/2017/395.pdf exposing a side channel protected implementation of keccak, in particular Domain Oriented Masking for the Chi function. The Chi function is a 5bit S-Box, so can also be seen as a lookup table of 32 5bit values. My question is, in an hardware implementation with no cache allowing a cache attack (I guess), why bother implementing the masked algebraic version of Chi instead of a straightforward LUT? Is it for security? Gate count? Performances? As you may have guessed, I'm not a hardware designer...

poncho avatar
my flag
"The Chi function is a 5bit S-Box"; huh? The Chi step uses the operation (in C): `x(new) = x ^ (~y & z)`; that's either 2 or 3 bits (depending on whether you include the xor in the sbox). In any case, either in software or in hardware, masked or unmasked, a LUT wouldn't make sense - it'd be a lot easier in both cases to use the straightforward logic.
Gux avatar
ve flag
Gux
I meant a row is 5 bits, that will be substituted by another 5 bits, which leaves 2^5 possibilities.
poncho avatar
my flag
Ok, in that case: for hardware, doing it directly (for 5 entries) would take 10-15 gates, while a 5 bit S-box would take circa 100 gates. You do the math...
Score:1
ng flag

Using a masked version of some function is for security; specifically here, for secrecy of what's hashed, which is secret when Keccak is used to build a MAC, or when it's hashed some secret towards signing it.

Gate count and performance are rather harmed by masking, thus are not the goal of masking. Rather, masking strives to not too much harm gate count and performance.

Indeed, natural implementation of a 5-bit S-box would use no cache, and thus be immune from cache-related attacks.

Gux avatar
ve flag
Gux
Thanks for our comment. It occurs also to me that masking harms performances and gate count. This is why I thought that a hardwired substitution table could make sense. But thinking more about it, I realized that this substitution has to occur for the 5 rows of a slice, and for the 64 slices, preferably at the same time. Maybe this is why an algebraic implementation is more efficient in hardware? But I still wonder, for a single substitution, if an hardware 5 bit lookup table implementation would be susceptible of leakage.
b degnan avatar
ca flag
@gux masking doesn't necessarily harm performance because often you pass S and /S. Also, considering how terrible un-hinted routers are, every time you recompile, you'll get a different power number. I found that paper to be uninspired as there was nothing new there.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.