Score:1

Do people manually select every integer used in a crypto hash function or are they generated by a computer?

tn flag

For example, this:

enter image description here

Are all those numbers like 7, 4, 13, 1, 10, 6, 15, 3, 12, 0, 9, 5, 2, 14, 11, 8... picked by hand somehow, or are they just selected out of a hat (like you write a script "give me 64 random integers between 0 and 15" and then whatever it outputs you use), or something else? What is the technique for selecting so many numbers?

kelalaka avatar
in flag
[Why do "nothing up my sleeve numbers" have low entropy?](https://crypto.stackexchange.com/q/16364/18298)
kelalaka avatar
in flag
Since you are interested in IV's of hash functions, I would give you more 1) [Origin of the SHA-224 initial hash value?](https://crypto.stackexchange.com/q/42373/18298) 2) [Why are these specific values used to initialise the hash buffer in SHA-512?](https://crypto.stackexchange.com/q/5338/18298) [Why does SHA2-224 use different IV's than SHA2-256?](https://crypto.stackexchange.com/q/83306/18298) 3) [Where do the constants for SHA-1/2/3 come from?](https://crypto.stackexchange.com/q/71314/18298) 4) [Why initialize SHA1 with specific buffer?](https://crypto.stackexchange.com/q/10829/18298)
Score:3
ng flag

Generally no. In cryptography, the appearance of "random" strings of numbers is generally quite suspect --- what if there are certain "weak" choices of numbers that yield the entire construction insecure? How do you know the designer didn't use their freedom in choosing these numbers to embed a backdoor? Note that this does happen, famous examples are:

Note that there are certain "arbitrary constants" that are seen as less suspect --- common choices are the digits of $\pi$, or things of this sort (say the digits of $\sqrt{2}$ or $\sqrt{3}$). These are often called nothing up my sleeve numbers, and can be reached for when the designer just needs some arbitrary numbers that satisfy some minimal constraints that "most" numbers should satisfy.

Anyway, returning to RIPEMD. It was initially specified in the 1995 paper "Integrity primitives for secure information systems." (I found this info via this link). In particular, the specification is contained in this document. I draw your attention to page 75 of the document (on page 7 of the pdf).

The four constants used in these operations are not randomly chosen; they are the integer part of $2^{30}\sqrt{2}, 2^{30}\sqrt{3}, 2^{30}\sqrt[3]{2}, 2^{30}\sqrt[3]{3}$ respectively. However, there is no specific reason for this choice.

This explains the constants in the added constants part of the file you linked --- they are nothing up my sleeve numbers. Unfortunately, the particular table you ask about is not satisfactorily explained. The authors state that they reproduce a modification of the compression function of MD4 (including the table you noticed), but simply state that they include modifications of B. den Boer, which they cite as Personal Communications (so their particular reason for the modification is unknown).

Unfortunately, other material on the cipher is not enlightening either. Consider this attack on the hash. The authors mention (as is readily apparent) that the "selection of message word" indices must specifically be a permutation (this was mentioned in the initial specification as well), but it is not clear why those particular permutations were chosen.

The last thing that I will mention is that sometimes you are in a situation where there are many possible choices of some structured object (like a permutation, so you cannot hope the digits of $\pi$ will suffice), and you need to pick one. One strategy you can do (rather than pulling it out of thin air) is:

  1. try to "rank" the structured objects according to some efficiency metric, and then
  2. choose the "first" (in some natural ordering) of the "most efficient" objects.

This is vague, so a particular example may be useful. The design of AES requires arithmetic over $\mathsf{GF}(2^8)$. Any irreducible degree 8 polynomial over $\mathbb{F}_2$ defines a "different" (but all mutually isomorphic) copy of $\mathsf{GF}(2^8)$. Given that this is the case, which polynomial do you choose in the design of AES?

The AES designers justify this choice (and others which you may find interesting) in section 7 of their proposal.

The polynomial $m(x)$ ... for the multiplication in $\mathsf{GF}(2^8)$ is the first one of the list of irreducible polynomials of degree 8, given in [LiNi86, p. 378].

So they appeal to some ordering by authors other than them. I have also seen (and thought I had for AES in particular) the choice to:

  1. collect all irreducible polynomials
  2. sort them by hamming weight (the efficiency metric)
  3. of those of minimal hamming weight, choose the lexicographically first one.

Of course, this is unfortunately not useful for your particular table.

Lance avatar
tn flag
Very interesting stuff, thank you so much! So in the end, what is the case with the numbers I listed, how were those (potentially) chosen? Never encountered the "nothing up my sleeve numbers", makes sense!
Mark avatar
ng flag
That's the somewhat annoying part for your particular numbers --- I can't track down the source for how they were generated. RIPEMD is a modification of MD4, specified [here](https://datatracker.ietf.org/doc/html/rfc1320). The corresponding numbers for MD4 are different, and have some internal pattern that makes more sense. The RIPEMD authors state that they change these numbers, but only cite a personal communication (from 1992) with B. den Boer. Unfortunately, those personal communications likely have the answer to your question, but it seems unlikely to be able to find them published.
Mark avatar
ng flag
The only small comfort I have is that the published attack on RIPEMD I looked at completely ignored the (somewhat arbitrary) structure of your numbers, so it is plausible they don't meaningfully impact security at all, and they were modified for efficiency purposes on some platform (the RIPEMD authors spend a paragraph or two mentioning they're switching between little/big endian representations for this purpose). This is just a guess though, and without reaching out to B. den Boer (or the RIPEMD authors) it may be hard to get a better answer.
poncho avatar
my flag
Kuznyechik: my understanding that, while the sbox is peculiar, and does not conform to the initial claims of randomness, there is still no (publicly) known backdoor based on it...
Mark avatar
ng flag
@poncho yes, but it has a *very particular* [invariant that it maintains](https://eprint.iacr.org/2019/092.pdf). Perhaps I have an alarmist reading of this, but I can't see a non-backdoor reason for this to be the case.
Mark avatar
ng flag
although it appears I had mentally grouped the issues with Streebog and Kuznyechik as similar, but revisiting that paper it seems that Streebog may use the odd S-boxes in a slightly more concerning way.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.