rand % 20
generates a result in $\{0,1,\ldots,18,19\}$ that is nearly uniform (assuming rand
is): $\Pr(19)/\Pr(0)=1-1/922337203685477581$. That's often a tolerable bias.
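The ratio above can be checked by counting how many 64-bit inputs land on each residue. The following is a minimal sketch; the helper name `count_residue` is illustrative and not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 64-bit values x with x % 20 == k, for 0 <= k < 20.
   These values are k, k+20, k+40, ... up to at most UINT64_MAX. */
uint64_t count_residue(unsigned k)
{
    assert(k < 20);
    return (UINT64_MAX - k) / 20 + 1;
}
```

Under these assumptions, `count_residue(0)` is 922337203685477581 and `count_residue(19)` is 922337203685477580, which gives the quoted ratio.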
On a "laptop/desktop PC" with a modern 64-bit CPU, rand % 20
is reasonably fast, and has the important virtues of being correct, simple, and easily adaptable. However, it is often (see comment) possible to be faster using
(rand-((rand-(rand>>2))>>1))>>59
which has the same (optimum) ratio between the least and most probable outcomes, while using only shift and subtract operations. I'm also more confident that the generated code is constant-time than I am for rand % 20, which can be important in crypto applications. And the mean is closer to $19/2$.
For an intuition of how that formula works: for any $x\in\mathbb R$ we have $(x-(x-x\,2^{-2})\,2^{-1})\,2^{-59}=20\,x\,2^{-64}$, thus we essentially evaluate what the expressions (uint64_t)floor(rand*(20/(UINT64_MAX+1.)))
or (uint64_t)((rand*(uint128_t)20)>>64)
attempt to evaluate. Notice that for some values including rand=0xCCCCCCCCCCCCCCCC
the latter formula does not exactly coincide with the formula I propose; yet the distribution achieved by both is optimally uniform.
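To make the comparison concrete, here is a sketch of the shift-based formula next to the 128-bit multiplication. It assumes the GCC/Clang extension type `unsigned __int128`; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract form: approximately floor(20 * rand / 2^64). */
static inline uint64_t pick20_shift(uint64_t rand)
{
    return (rand - ((rand - (rand >> 2)) >> 1)) >> 59;
}

/* Exact floor(20 * rand / 2^64) using a 128-bit product.
   Assumption: the compiler provides unsigned __int128 (GCC, Clang). */
static inline uint64_t pick20_mul128(uint64_t rand)
{
    return (uint64_t)(((unsigned __int128)rand * 20) >> 64);
}

/* Spot checks: both endpoints, and the value quoted in the text. */
void pick20_spot_checks(void)
{
    assert(pick20_shift(0) == 0 && pick20_mul128(0) == 0);
    assert(pick20_shift(UINT64_MAX) == 19 && pick20_mul128(UINT64_MAX) == 19);
    /* The two formulas disagree for this input: 16 versus 15. */
    assert(pick20_shift(0xCCCCCCCCCCCCCCCCULL) == 16);
    assert(pick20_mul128(0xCCCCCCCCCCCCCCCCULL) == 15);
}
```

The intermediate truncations in the shift form move a few inputs across bucket boundaries, which is why the two can differ for individual inputs even though both stay within $\{0,\ldots,19\}$.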
The method is not limited to the constant $m=20$ for the array size. It generalizes to any constant $m$ with moderate Hamming weight. Computing appropriate shift counts from the constants is nontrivial. I refer to this marvelous answer (note: the last shift count given there must be increased by 32 in the case at hand) for something that works, but is not quite always optimal. I have no other reference for the method, which I (re-?)invented for an ARM Cortex-M0, where it proved useful. Actually I only empirically found formulas for a few constants fitting my need, and Anders Kaseorg takes full credit for how to generate formulas systematically.
If we are willing to lose a little uniformity and assurance that the code is constant-time, we can use
((rand>>3)*5)>>59
which is simpler, likely faster, and easier to adapt to constants $m$ other than $20$: we write $m$ as $r\,2^i$ with $i$ an integer and $r$ preferably odd, then find the integer $j$ with $2^{j-1}\le r<2^j$ (for $m=20$: $r=5$, $i=2$, $j=3$). We use ((rand>>j)*r)>>(64-i-j)
. The problem is that the lower $j$ bits of rand
are not used, and the uniformity of the outcome is correspondingly reduced (except if $m$ is a power of two).
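The decomposition can be sketched as follows. The final shift count is taken as 64-i-j, which for $m=20$ ($r=5$, $i=2$, $j=3$) reproduces the shift by 59 in the expression above. The function name and the limit on $m$ are assumptions of this sketch; with a fixed $m$, the values $r$, $i$, $j$ would be precomputed constants rather than derived at runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Maps rand to 0..m-1 by discarding the low j bits, multiplying by r,
   and shifting, where m = r * 2^i with r odd and 2^(j-1) <= r < 2^j. */
uint64_t scale_truncated(uint64_t rand, uint64_t m)
{
    assert(m >= 1 && m <= UINT32_MAX);       /* range limit is a simplification */
    uint64_t r = m;
    unsigned i = 0, j = 0;
    while ((r & 1) == 0) { r >>= 1; i++; }   /* strip factors of two: m = r * 2^i */
    while ((r >> j) != 0) { j++; }           /* j = bit length of r */
    /* (rand >> j) < 2^(64-j) and r < 2^j, so the product fits in 64 bits. */
    return ((rand >> j) * r) >> (64 - i - j);
}
```

For example, `scale_truncated(UINT64_MAX, 20)` evaluates to 19, the largest possible output for $m=20$.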
When $m$ is $2^j$ for some integer $j$, we can use rand>>(64-j)
or rand&(m-1)
. The latter is noted in that other answer. These methods lose no uniformity, if all bits of rand
are uniform and independent.
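As a small sketch of the two power-of-two variants (function names are illustrative, and $j$ is restricted to 1..63 so that every shift is well defined in C):

```c
#include <assert.h>
#include <stdint.h>

/* m = 2^j: keep the top j bits of rand. */
uint64_t pow2_top(uint64_t rand, unsigned j)
{
    assert(j >= 1 && j <= 63);
    return rand >> (64 - j);
}

/* m = 2^j: keep the bottom j bits of rand, i.e. rand & (m-1). */
uint64_t pow2_low(uint64_t rand, unsigned j)
{
    assert(j >= 1 && j <= 63);
    return rand & ((UINT64_C(1) << j) - 1);
}
```

The two functions read different bits of rand, so they generally return different values for the same input; which one is preferable depends on which bits of the generator are trusted most.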
If $m$ changes at runtime with $m<2^j$ for some known constant $j$, we can use
((rand>>j)*m)>>(64-j)
however the $j$ lower bits of rand
are lost and that reduces the uniformity of the outcome (except if $m$ is a power of two).
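A minimal sketch of the runtime-$m$ variant, assuming for illustration that $j=16$ (so $m<65536$); the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed compile-time bound for this sketch: m < 2^SCALE_J. */
#define SCALE_J 16

/* Maps rand to 0..m-1 for a runtime m, ignoring the low SCALE_J bits of rand. */
uint64_t scale_runtime(uint64_t rand, uint64_t m)
{
    assert(m >= 1 && m < (UINT64_C(1) << SCALE_J));
    /* (rand >> SCALE_J) < 2^(64-SCALE_J) and m < 2^SCALE_J: no overflow. */
    return ((rand >> SCALE_J) * m) >> (64 - SCALE_J);
}
```

A smaller bound $j$ discards fewer bits of rand and so costs less uniformity, at the price of a narrower range for $m$.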
Off-topic:
- (uint64_t)(floor(rand*(20/(UINT64_MAX+1.))))
would be OK if there were no rounding errors, but because they exist it's hard to tell whether it can yield 20
for some input; also, on many compilers it's not optimally uniform.
- (uint64_t)((rand*(uint128_t)20)>>64)
is mathematically correct, and very close to what we evaluate, but uint128_t
is an optional and still marginally supported C feature.
- The question's
rand/UINT64_MAX * 20
outputs in $\{0,20\}$ thus is unfit. The problems are that the division rounds down to an integer, and (independently) that rand
can be UINT64_MAX
.
- The question's
20/(UINT64_MAX/rand)
outputs in $\{0,1,2,3,4,5,6,10,20\}$ and can cause a division by zero, thus is unfit. The problems are that the division rounds down to an integer, and (independently) that rand
can be 0
.
- The question's code fragment 3 always has
i%5 != 4
on output, thus is unfit. The problem is that the output i
is constructed as 10+5+2+1
with some term(s) removed.
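The behaviour of the first two expressions quoted from the question can be reproduced with a short sketch. The function names are illustrative; the question's code fragment 3 is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* rand/UINT64_MAX * 20: the integer division yields 1 only when
   rand == UINT64_MAX, and 0 otherwise. */
uint64_t question_expr1(uint64_t rand)
{
    return rand / UINT64_MAX * 20;
}

/* 20/(UINT64_MAX/rand): divides by zero when rand == 0, so the
   sketch rejects that input explicitly. */
uint64_t question_expr2(uint64_t rand)
{
    assert(rand != 0);
    return 20 / (UINT64_MAX / rand);
}
```

For instance, `question_expr1(UINT64_MAX - 1)` is 0 while `question_expr1(UINT64_MAX)` is 20, and `question_expr2(UINT64_MAX / 2)` is 10 because the inner quotient is 2.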