Score:1

Hash Flooding a Randomized Modular Hash Table

ar flag

Assume we have a hash table using the function h(x) = x mod 32. h(x) = x mod 33. Also assume it dynamically resizes by doubling the amount of buckets and rehashing. If I was able to provide inputs for the hash table it would be really easy to flood with colliding entries to slow lookups to O(n).

However, say we were to xor each input with a random pepper before hashing. If the input is ever longer than the pepper we just generate more random pepper dynamically.

If this hash function is used on some server somewhere and people have the ability to supply it with entries, is it (relatively) safe from algorithmic complexity attacks? Is it vulnerable to any special attacks that don't work on SipHash?

More broadly speaking, how are keyed hash functions better than hash randomization for deterring hash flooding complexity attacks?

Edit: As pointed out by bmm6o, it turns out that h(x) = x mod 32 has no scrambling properties whatsoever when used in conjunction with xor, which partially answers my question. Other modulos that are not a power of two still work but the extent of the scrambling is unclear.

ph flag
For how much of this question are we to assume `h(x) = x mod 32` is the hash function? Is that still the function in the pepper paragraph and the one after?
ph flag
Also note that `h(x) = x mod 32` and xoring in pepper seem specifically chosen to not combine effectively. Do you see that two inputs that collide without the pepper still collide with the pepper?
DivideByZero avatar
ar flag
@bmm6o Good point. I didn't see that earlier when I tested with a different modulo. It seems like (A ^ B) % P = A % P ^ B % P applies whenever P is a power of 2 which ruins the collision resistance. In retrospect that seems obvious, but it might be the key to hash flooding this type of table. I'll try visualizing some hash/mod relationships and see if I can figure out the pattern. I will also make an edit to the problem so that this doesn't interrupt things. I am tempted to make two separate questions but I am not sure if they would be redundant.
Score:2
my flag

If this hash function is used on some server somewhere and people have the ability to supply it with entries, is it (relatively) safe from algorithmic complexity attacks?

No, here is one attack funnels all entries into 33 slots (even if there were 10,000 entries):

Have the attacker submit 10,000 entries that all end with 16 zeros (actually, it doesn't matter what the end is, it just needs to be consistent with the entries).

When you apply the pepper, you end up with 10,000 entries with the same lower 16 bits (what those bits are depend on the pepper).

Then, you compute each entry $x$ modulo $33 \cdot 2^k$ (where $k$ is the number of resizings you've done). If $k \le 16$, this result will be one of 33 possible values, and so your hash function will try to place everything into those 33 places.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.