given an algorithm of a hash function, how to transform it into a circuit?
Most importantly we first want to better state the problem
- What hash function?
- Do we want a circuit for the full hash function and a fixed-length input [which seem best matches the question], or for some part of the hash [like a round of the hash function with input one padded message block in the linked material, if I read it correctly]?
- Why do we want "a circuit"? This will impact the nature of what we produce.
- A zero-knowledge proof involving the hash as in the linked example? That will direct towards an expression as a long chain of gates restricted to some varieties (here XOR, AND, and INV)
- Testing how a pure SAT solver handles solving some problem related to the hash (like preimage)? The expression will typically be even more restricted (no XOR), on the other hand negation is usually free.
- Optimized realization in silicon or FPGA? Typically an overly deep purely Boolean expression will be useless, we'll need intermediary latches, and unless the whole thing is deeply pipelined or the hash terribly irregular we'll have some logic reused across rounds. I won't cover that.
- In what form do we want the output? For a purely Boolean circuit, most formats will number variables. I guess the example has 512 inputs numbered from 0 to 511, 116246 gates (one per line) each producing one new variable for a total of 116758 variables, and 256 outputs (it could be variables 116502 to 116757, I'm not sure), with this described in the first two lines per some simple convention. Below are one gate per line, with I guess for each
- the number of inputs [1 for NOT and 2 for others in the example]
- the number of outputs [1 in the example]
- the list of input(s)
- the list of [one] output(s)
- the name of the gate [XOR, AND, INV in the example]
The rest broadly follows the caveman's technique to eat a mammoth (one chunk at a time) and the progress from there (tools).
We take the algorithm step by step, unrolling each loop, and express each step as Boolean equations. For example if all variables are 32-bit (like in SHA-256):
- a statement like
C = A ^ B
can be translated into 32 new variables for C, output by 32 XOR gates, requiring 32 lines. We need to keep track of the numbers designating the new variables assigned to C
.
E = C + D
requires intermediate variables, thus more lines. We need 30 full adders, and then two simplified ones (no carry out for the high-oder bit which thus reduces to one XOR; no carry in for the first which thus reduces to one XOR and one AND).
F = (E<<3)|(E>>29)
would require no line, only a reassignment of variables for E
.
There are a few tricks that can sometime get simpler expressions, but for hashes of cryptographic interest the expression will remain long. If it was not, the hash would be weak.
It's reasonably easy to make a program that does this from scratch, and in my experience that's easier than finding and mastering something adequate. Existing tools can be useful to automatically simplify expressions, but for most cryptographic hashes a little analysis of the hash's equations will get most simplifications it's possible to get, and might give better results than automated tools.