Score:2

Which hashing function is good enough for session IDs?

br flag

Background

I'm building a URL shortener, and the URL to shorten may contain a SessionId.

For the URL shortener to not compromise the security of the SessionId, I must use a shortening strategy that fulfills the same requirements as the SessionId.

OWASP states that:

  • The session ID length must be at least 128 bits (16 bytes) here
  • The session ID value must provide at least 64 bits of entropy here

What I think about doing

  • Have a key-value store, where the value is the long (unshortened) URL, and the key is the shortened URL component.
  • When the user navigates to a shortened URL, the long URL is looked up in the key-value store, and returned to the user, who is then redirected.

So the key needs to be at least 128 bits (16 bytes) long, and provide at least 64 bits of entropy, to not compromise the SessionId.

If the key is a hash of the value, I can guarantee the key length, and know the entropy (based on the hashing algorithm used).

But which hashing algorithm should I use?

For example, MD5 digest length is exactly 128 bits. But I don't know what's it's minimum entropy.

The question

Which hashing algorithm produces a digest of at least 128 bits, with at least 64 bits of entropy?

(x-post from StackOverflow: https://stackoverflow.com/q/71224441/6517320)

kelalaka avatar
in flag
Could you maintain one copy of your question? Please see [Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?](https://meta.stackexchange.com/q/64068/403350) for details.
kelalaka avatar
in flag
Note that entropy is not the property of the value/data it is the property of the source. Get uniform random of size 64-bit and hash it and truncate to 128-bit. Use BLAKE2 as the fastest cryptographic hashing though you don't need...
Score:5
cn flag

The short answer to your hashing question is "use SHA-256." That's the answer for almost every secure-hashing problem, unless the answer is "use SHA-512." If you want a 128 bit hash, then you can truncate SHA-256 (take the first or last 128 bites). All bits in SHA-256 are independent, so you can extract any 128 of them as a hash.

That said, IMO you're thinking about this problem incorrectly. The point isn't about protecting SessionId specifically. The issue is that URLs may contain sensitive information, of which SessionId is just one example (if it's stored in the URL). If someone already knows the shortened URL, they can just ask your system for the full URL, so this is all about stopping attackers finding keys by guessing. You need to make your keyspace sparse, which is it say "much, much larger than the number of keys actually stored."

You're maintaining a key/value database, so there's no need to use a hash at all. You can just generate a random key for each URL. This is better than a hash because there's absolutely no connection between the key and the value.

Given your design, attackers cannot search offline. They have to contact your server. Say you can serve 1,000 requests/second, and you scale your total keyspace to be a trillion times larger than your planned number of URLs. That would take an attacker roughly 15,000 years (~1/2 the space searched) to find a single URL if they could consume all of your available bandwidth (which I expect you might notice....). With just a little bit of rate limiting per IP address, you could dramatically complicate this attack.

Given the above, if you wanted to store a billion URLs in your system, you'd need a keyspace of:

log2(1 billion URLs * 1 trillion multiplier) = 80 bits

In Base58 (which I like for this kind of problem because it's human-friendly), it would take around 14 characters. Tweaking the above values for rate limiting, period of attack you want to protect against, and number of stored URLs, you can choose how long your keys are.

You generally can compute random values on this scale without worrying about collisions (which is good for performance). For the same reason that it's extremely hard for an attacker to find a collision, it's extremely unlikely that you'll have one by accident. But if you want to double-check just how unlikely, look at the Birthday Attack. The calculations for "what's the likelihood that any values collide" are a different than "how long will it take an attacker to find a collision," and in some cases will force you to use longer keys.

IMO there's no need for hashes. But if you do need one, use SHA-256, truncated to whatever number of bits you want.

Ivan Rubinson avatar
br flag
How does this guarantee at least 64 bits of entropy?
cn flag
That depends on your PRNG. On Linux systems, if you use /dev/random, I believe it will block if the system drops below 160 bits of available entropy, so you should always be ok on that kind of system. I would expect any system you use a cryptographic PRNG is going to greatly exceed 64 bits for every value, but you'll have to check your documentation to be certain. For example, on some (but not all) systems /dev/urandom will return low-entropy values.
cn flag
But no hashing function can increase entropy, and if they are collision-resistant over the space in question, they will not (effectively) reduce entropy. The entropy from the original SessionId is going to be preserved by any cryptographic hash. If it had less than 64 bits originally, then so will the hash, and if it had more than 64 bits, so will the hash (minus a little). To increase entropy, you have to add a random salt, which means you're tied to the entropy of your PRNG (just like using a random key). https://crypto.stackexchange.com/questions/12505/what-happens-to-entropy-after-hashing
Ivan Rubinson avatar
br flag
So entropy of hash <= entropy of plaintext; and if the hashing algorithm is collision-resistant then entropy of hash >= entropy of plaintext. Therefore to ensure sufficient entropy, the plaintext needs to have high entropy. The long URL has low entropy, as opposed to a cryptographically strong pseudorandom string of bytes; which is why you recommend the key to not be a hash of the long URL.
cn flag
Close, but the entropy of a hash is never greater than the entropy of the plaintext. Collision-resistant hashes just reduce the entropy less (generally by an unimportant amount). The rest of your comment is correct.
Ivan Rubinson avatar
br flag
Pardon the confusing notation. If `A <= B and A >= B` then it's impossible for `A > B`. Because hashing can't increase entropy, but can decrease entropy, we don't stand to gain from hashing, but only lose
SAI Peregrinus avatar
si flag
"On Linux systems, if you use /dev/random, I believe it will block if the system drops below 160 bits of available entropy" That hasn't been true for a while now. Not that it matters, but it only blocks in early boot and then never again. Entropy isn't "used up", so blocking after seeding never helped anything.
cn flag
Thanks @SAIPeregrinus. I haven't be a sysadmin for Linux in a long time :D Do you have any links on the current state of affairs in Linux so I can get better educated? (My cryptography brain absolutely understands what you mean; I just want my sysadmin side to understand what happens in practice.)
kelalaka avatar
in flag
[What issues are there while using Linux's /dev/urandom for generating cryptographic keys?](https://crypto.stackexchange.com/q/85533/18298)
SAI Peregrinus avatar
si flag
Also see [this LWN article](https://lwn.net/SubscriberLink/884875/650dde925be055a1/) on a proposed patch to unify the behavior of /dev/random and /dev/urandom and getrandom(flags=0).
Ivan Rubinson avatar
br flag
That's a lot of talk about how Linux handles CSPRNG. What about Windows?
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.