Score:3

What are the properties of a hash function?

gr flag

I have created a hash function. If I am asked whether it conforms to the definition of a hash function, I only know that it should have a fixed-size output. I use multiplication and adding of every plaintext character with the random number I assigned.

Aside from the fixed-size output, what other characteristics must a hash function have, and why?

poncho avatar
my flag
Are you asking about a cryptographic hash function, or just a hash function (say, for use within a hash table)?
gr flag
I am asking for a hash function.
Maarten Bodewes avatar
in flag
That's nice and all, but this is [crypto.se]. If you want a broader definition you may want to go to e.g. [cs.se].
Score:6
ng flag

From a cryptographic standpoint, a hash function with fixed-size output:

  • Must be a deterministic function: the same input must always generate the same output.
  • Must accept input messages in a wide input set. Ideally that's should be arbitrary bitstring, but often is arbitrary bytestrings or character strings, perhaps up to some size.
  • Baring special qualifier (like keyed or secret or random), must be public: every step and constant needed for the calculation of the output for any given input is known to all.
  • Must compute it's result in time polynomial with the input size. That should be time roughly linear with the input size.
  • Should have collision-resistance: it is computationally impossible to exhibit two distinct messages with the same output (this property implies second preimage resistance, which is that given a random input, it is computationally impossible to find another input yielding the same output). Notice that if there are $n$ possible outputs, there are generic attacks that break collision-resistance with about $\sqrt n$ evaluations of the function, thus collision-resistance implies $n$ is large (say, at least $2^{192}$ nowadays); if not, the definition of collision-resistance must be loosened to: the best method to exhibit two distinct messages with the same output is a generic attack.
  • Should have (first) preimage-resistance: given the output for a random unknown input, it is computationally impossible to find that input (or another input with the same output) easier than by trying inputs. Notice that if there are $n$ possible outputs, there are generic attacks that break preimage-resistance with about $n$ evaluations of the function, thus preimage-resistance implies $n$ is large (say, at least $2^{96}$ nowadays); if not, the definition of preimage-resistance must be loosened.

More generally, a modern cryptographic hash should broadly behave as a random oracle, that is a box implementing a function, which output is random for every particular input. For large-enough output set, that implies collision-resistance, preimage resistance, and more:

  • For unknown random input with any natural characteristic (that is, independent of the definition of the hash function; for example, input that consists of 40 decimal digits which sum is divisible by 10), the output should be computationally indistinguishable from uniformly random in the output set.
  • Resistance to length-extension attack: given output for unknown input $M$, it should be impossible to compute output for input an extension of $M$ (that is $M\mathbin\|E$ for non-empty $E$) easier than by finding $M$ by trying inputs. Notice that still-common hashes such as SHA-256 do not have that later property.
user3742898 avatar
jp flag
I would distinguish between "a hash function has" (fixed size output, is a function, accepts all relevant inputs), and "a _worthwhile_ hash function has" (linear time, collision resistant, preimage resistant). "return 0;" is a hash function. It's a terrible hash function, but it is a hash function. Bad hash functions are frequently useful when looking for bugs in algorithms that use hash functions, or to help clarify which properties (speed, collision resistance,...) are most important for a given application.
Blindy avatar
in flag
Since OP has clarified he's not talking about cryptographically-secure hash functions, I believe three points should be loosened in your definition: 1. there's no need for the function to be public for it to be a good hash function (peer review is helpful, but not necessary), 2. Given a large input and the need for fast hashing, "computationally impossible" duplicates is merely a suggestion, and 3. impossible reversal of the function is fundamentally not needed, `f(x) = x` is a very fast and efficient way to hash an integer.
fgrieu avatar
ng flag
@Blindly: since the OP has been simultaneously asking a [crypto-beginner question](https://crypto.stackexchange.com/q/92044/555), I decided to take the question and the "I am asking for a hash function" comment as an indication the OP does not know what a cryptographic hash function is, rather than an indication that the question is not about crypto.
poncho avatar
my flag
Actually, a cryptographically bad hash function can better for use within a hash table than a good one - that is, it can cause fewer expected hash compressions than a random-looking function. That can happen depending on the distribution of the inputs; a well chosen (noncryptographical) hash function can spread its outputs better than expected.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.