Reading the selected answer to Designing a hash function from first principles rather than depending on heuristics is very insightful.
The section on "nonlinearity" suggests that making every equation involved in the hash function into a linear one means the attacker can easily figure out the implementation of the hash function.
If you try to make a cryptographic hash function that only uses XOR and shifting, the equations will be linear even after infinitely many rounds, now the analyst is just a single Gauss elimination away from solving it and now they have the ability to create arbitrary preimages.
In order to avoid that, you need to make your equations non-linear by using the AND operator. Hashes do all sorts of shifting and XOR-ing but this non-linear step is what makes them secure.
But having non-linear terms are not enough. You must make sure the attacker cannot effectively cancel out the non-linear terms out of your equations by fixing some inputs to 1 or 0. Also you need to make sure the terms will not cancel if the attackers takes the difference of two equations. If you have the formula ++ for one and another formula ++ for another bit. Adding these two will give + which is now linear and allows the attacker to solve it to get the relationship between two out bits. Every independent linear equation an attacker can create will reduce the security of your hash with 1 bit.
Can you explain in more detail how this would work? Given some hash function which has been broken, how could you show this linearity situation on that hash function? Basically, say md5 is an example (insecure for some reason). Is it insecure for this linearity reason? If so, what does that mean in terms of the md5 hash function? What are the equations that were found to be linear? If md5 is a bad example, what is a good example? Basically, how do you look for linearity in a hash function bit by bit, what are the techniques to use?
I would like to know the techniques an attacker might use to "solve" the hash function based on input/output alone (without seeing the implementation). But since that is probably too general or too involved a question, this question focuses only on this linear equation aspect. What are the "linear equation" techniques an attacker might use to solve a hash function?