How to draw words randomly from a physical dictionary, using dices?
Assuming we know the total number of pages $p$ in the dictionary, and can estimate some $w$ so that no page has more than $w$ words on it, we can use rejection sampling for exact equidistribution:
- find the smallest $k$ with $6^k\ge p$, and the largest $d\in\{1,2,3\}$ with $6^k\ge d\,p$
- find the smallest $\ell$ with $6^\ell\ge w$, and the largest $e\in\{1,2,3\}$ with $6^\ell\ge e\,w$
- for each of the 4 words to choose
- repeat
- $i:=0$
- repeat $k$ times
- draw a dice value $v$ in $[1,6]$
- $i:=6i+v-1$
- $i:=\lfloor i/d\rfloor+1$, which is uniformly random in $[1,6^k/d]$
- if page $i$ exists in the dictionary and contains at least one word
- $j:=0$
- repeat $\ell$ times
- draw a dice value $v$ in $[1,6]$
- $j:=6j+v-1$
- $j:=\lfloor j/e\rfloor+1$, which is uniformly random in $[1,6^\ell/e]$
- if there are at least $j$ words on page $i$
- pick the $j^\text{th}$ word of page $i$ and exit the repeat loop
We can get away with $w$ perhaps a little too small, e.g. $w$ at least $2W/p$, where $W$ is the approximate number of words in the dictionary, as long as words past index $w$ in their page (which can't be chosen) are only a small fraction of the words.
on a dictionary with 20.000 words is it OK for me to get only 4 truly random words to use as a additional passphrase to my bitcoin seed?
This gives $4\log_2(20000)\approx57$ bit of entropy. That's sufficient, or not, to deter brute force search, depending of the key stretching used to change the 4 words into a key.
It's been cited BIP39, which uses PBKDF2 with $2^{11}$ iterations and HMAC-SHA-512. The cost of searching all the keys would be dominated by $2^{57+11+1}=2^{69}$ SHA-512 hashes, which is uncomfortably few (I don't want to go as far as estimating how that would be best done with AWS, or worse extrapolating that in 5 years). I suggest using Argon2 instead of PBKDF2 HMAC-SHA-512, and bumping the cost parameters to 10 seconds of calculation, and then that's plenty safe enough.