Score:3

Crypto

What are the properties of a hash function?

TIMOTHY LIM

11/13/22, 12:18 AM

I have created a hash function. If I am asked whether it conforms to the definition of a hash function, I only know that it should have a fixed-size output. I use multiplication and adding of every plaintext character with the random number I assigned.

Aside from the fixed-size output, what other characteristics must a hash function have, and why?

1008

1 + 3

hash

poncho

11/13/22, 2:26 AM

Are you asking about a cryptographic hash function, or just a hash function (say, for use within a hash table)?

TIMOTHY LIM

11/13/22, 2:30 AM

I am asking for a hash function.

Maarten Bodewes

11/14/22, 1:32 PM

That's nice and all, but this is [crypto.se]. If you want a broader definition you may want to go to e.g. [cs.se].

Score:6

Crypto

fgrieu

11/13/22, 7:10 AM

From a cryptographic standpoint, a hash function with fixed-size output:

Must be a deterministic function: the same input must always generate the same output.
Must accept input messages in a wide input set. Ideally that's should be arbitrary bitstring, but often is arbitrary bytestrings or character strings, perhaps up to some size.
Baring special qualifier (like keyed or secret or random), must be public: every step and constant needed for the calculation of the output for any given input is known to all.
Must compute it's result in time polynomial with the input size. That should be time roughly linear with the input size.
Should have collision-resistance: it is computationally impossible to exhibit two distinct messages with the same output (this property implies second preimage resistance, which is that given a random input, it is computationally impossible to find another input yielding the same output). Notice that if there are $n$ possible outputs, there are generic attacks that break collision-resistance with about $\sqrt n$ evaluations of the function, thus collision-resistance implies $n$ is large (say, at least $2^{192}$ nowadays); if not, the definition of collision-resistance must be loosened to: the best method to exhibit two distinct messages with the same output is a generic attack.
Should have (first) preimage-resistance: given the output for a random unknown input, it is computationally impossible to find that input (or another input with the same output) easier than by trying inputs. Notice that if there are $n$ possible outputs, there are generic attacks that break preimage-resistance with about $n$ evaluations of the function, thus preimage-resistance implies $n$ is large (say, at least $2^{96}$ nowadays); if not, the definition of preimage-resistance must be loosened.

More generally, a modern cryptographic hash should broadly behave as a random oracle, that is a box implementing a function, which output is random for every particular input. For large-enough output set, that implies collision-resistance, preimage resistance, and more:

For unknown random input with any natural characteristic (that is, independent of the definition of the hash function; for example, input that consists of 40 decimal digits which sum is divisible by 10), the output should be computationally indistinguishable from uniformly random in the output set.
Resistance to length-extension attack: given output for unknown input $M$, it should be impossible to compute output for input an extension of $M$ (that is $M\mathbin\|E$ for non-empty $E$) easier than by finding $M$ by trying inputs. Notice that still-common hashes such as SHA-256 do not have that later property.

0 + 4

user3742898

11/13/22, 2:28 PM

I would distinguish between "a hash function has" (fixed size output, is a function, accepts all relevant inputs), and "a _worthwhile_ hash function has" (linear time, collision resistant, preimage resistant). "return 0;" is a hash function. It's a terrible hash function, but it is a hash function. Bad hash functions are frequently useful when looking for bugs in algorithms that use hash functions, or to help clarify which properties (speed, collision resistance,...) are most important for a given application.

Blindy

11/13/22, 2:38 PM

Since OP has clarified he's not talking about cryptographically-secure hash functions, I believe three points should be loosened in your definition: 1. there's no need for the function to be public for it to be a good hash function (peer review is helpful, but not necessary), 2. Given a large input and the need for fast hashing, "computationally impossible" duplicates is merely a suggestion, and 3. impossible reversal of the function is fundamentally not needed, `f(x) = x` is a very fast and efficient way to hash an integer.

fgrieu

11/13/22, 3:38 PM

@Blindly: since the OP has been simultaneously asking a [crypto-beginner question](https://crypto.stackexchange.com/q/92044/555), I decided to take the question and the "I am asking for a hash function" comment as an indication the OP does not know what a cryptographic hash function is, rather than an indication that the question is not about crypto.

poncho

11/13/22, 3:45 PM

Actually, a cryptographically bad hash function can better for use within a hash table than a good one - that is, it can cause fewer expected hash compressions than a random-looking function. That can happen depending on the distribution of the inputs; a well chosen (noncryptographical) hash function can spread its outputs better than expected.