Score:1

How to draw words randomly from a physical dictionary?


Suppose I have a real physical dictionary, and I want to draw words from it randomly, using dice. How should I do it?

Maybe a coin would be easier to work with, since binary converts readily to decimal, but with dice I can still generate a base-6 number from a few throws.

The problem is that the dictionary does not have an index number for each word, so I think that picking a random page number and then a random word on that page does not give a uniform distribution: if some pages have more words than others, the words on the sparser pages are more likely to be picked.

By the way, with a dictionary of 20,000 words, is it OK to draw only 4 truly random words to use as an additional passphrase for my Bitcoin seed? I just want it to withstand an attacker willing to spend up to $100,000 in total on AWS machines over the next 5 years.

I don't want to use the words selected by BIP39 because they are not in my main language.

kelalaka
I don't see any relation to cryptography. This is pure algorithm design. Would it be better asked on CS?
kelalaka
The easy way to generate randomness is to use `/dev/urandom`: select a random page, then draw another random number for the word on that page (determine the ranges first). If there are not enough words on the page, discard that number and draw another. Note that the words themselves are not really important; what matters is the size of the set you draw from, as in Diceware and BIP39. It is better to choose the words in the set deliberately, so that there is no absurd word that you cannot connect to the others.
Rafaelo
@fgrieu It would be used as a passphrase in Bitcoin, following BIP39 and related documents, which specify how to convert the seed + passphrase into a key.
Score:1

How to draw words randomly from a physical dictionary, using dice?

Assuming we know the total number of pages $p$ in the dictionary, and can estimate some $w$ so that no page has more than $w$ words on it, we can use rejection sampling for exact equidistribution:

  • find the smallest $k$ with $6^k\ge p$, and the largest $d\in\{1,2,3\}$ with $6^k\ge d\,p$
  • find the smallest $\ell$ with $6^\ell\ge w$, and the largest $e\in\{1,2,3\}$ with $6^\ell\ge e\,w$
  • for each of the 4 words to choose
    • repeat
      • $i:=0$
      • repeat $k$ times
        • roll a die to get a value $v$ in $[1,6]$
        • $i:=6i+v-1$
      • $i:=\lfloor i/d\rfloor+1$, which is uniformly random in $[1,6^k/d]$
      • if page $i$ exists in the dictionary and contains at least one word
        • $j:=0$
        • repeat $\ell$ times
          • roll a die to get a value $v$ in $[1,6]$
          • $j:=6j+v-1$
        • $j:=\lfloor j/e\rfloor+1$, which is uniformly random in $[1,6^\ell/e]$
        • if there are at least $j$ words on page $i$
          • pick the $j^\text{th}$ word of page $i$ and exit the repeat loop
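
The procedure above can be sketched in Python to check the bookkeeping before rolling real dice. This is only a sketch under assumptions, not part of the original method: the names `plan`, `draw`, and `pick_word` are made up for illustration, the dictionary is modeled as a list of per-page word lists, and `secrets.randbelow` stands in for a physical die.

```python
import secrets
from typing import Callable, List, Sequence, Tuple


def die_roll() -> int:
    """Stand-in for one throw of a physical die (assumption): value in 1..6."""
    return secrets.randbelow(6) + 1


def plan(n: int) -> Tuple[int, int]:
    """Return (k, d): smallest k with 6**k >= n, largest d in {1,2,3} with 6**k >= d*n."""
    k = 0
    while 6 ** k < n:
        k += 1
    d = max(c for c in (1, 2, 3) if 6 ** k >= c * n)
    return k, d


def draw(k: int, d: int, roll: Callable[[], int] = die_roll) -> int:
    """Read k die throws as a base-6 number, then fold by d: uniform in 1..6**k // d."""
    x = 0
    for _ in range(k):
        x = 6 * x + roll() - 1
    return x // d + 1


def pick_word(pages: Sequence[List[str]], w: int,
              roll: Callable[[], int] = die_roll) -> str:
    """Pick one word by rejection sampling.

    pages[i - 1] holds the words of page i (possibly empty);
    w is the assumed upper bound on the number of words per page.
    """
    if w < 1 or not any(pages):
        raise ValueError("need at least one word and a positive bound w")
    k, d = plan(len(pages))
    l, e = plan(w)
    while True:
        i = draw(k, d, roll)
        if i > len(pages) or not pages[i - 1]:
            continue  # page does not exist or has no words: start over
        j = draw(l, e, roll)
        if j <= len(pages[i - 1]):
            return pages[i - 1][j - 1]
        # fewer than j words on page i: start over from the page draw
```

For a passphrase, something like `[pick_word(pages, w) for _ in range(4)]` would draw the 4 words. Note that a rejection at the word step restarts from the page draw; redrawing only the word index would favor words on sparse pages.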

We can get away with a $w$ that is a little too small, e.g. any $w$ of at least $2W/p$, where $W$ is the approximate number of words in the dictionary, as long as the words past index $w$ on their page (which can never be chosen) are only a small fraction of all words.


with a dictionary of 20,000 words, is it OK for me to get only 4 truly random words to use as an additional passphrase to my Bitcoin seed?

This gives $4\log_2(20000)\approx57$ bits of entropy. Whether that is sufficient to deter brute-force search depends on the key stretching used to turn the 4 words into a key.

BIP39 has been cited; it uses PBKDF2 with $2^{11}$ iterations and HMAC-SHA-512. The cost of searching all the keys would be dominated by $2^{57+11+1}=2^{69}$ SHA-512 hashes, which is uncomfortably few (I don't want to go as far as estimating how that would best be done with AWS, or worse, extrapolating that over 5 years). I suggest using Argon2 instead of PBKDF2 with HMAC-SHA-512, and bumping the cost parameters to 10 seconds of calculation; that would be plenty safe.
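
The arithmetic behind these figures can be reproduced with a few lines of Python. This is a rough sketch, not an attack cost model: the function names are made up for illustration, and the factor of two SHA-512 computations per HMAC is an assumed reading of the `+1` in the exponent.

```python
import math

DICTIONARY_WORDS = 20_000    # dictionary size from the question
PBKDF2_ITERATIONS = 2 ** 11  # BIP39: 2048 iterations of HMAC-SHA-512
HASHES_PER_HMAC = 2          # assumption: accounts for the "+1" in the exponent


def entropy_bits(words: int, vocabulary: int = DICTIONARY_WORDS) -> float:
    """Entropy in bits of `words` independent uniform draws from the vocabulary."""
    return words * math.log2(vocabulary)


def log2_hashes(words: int) -> float:
    """log2 of the SHA-512 hash count needed to try every passphrase."""
    return entropy_bits(words) + math.log2(PBKDF2_ITERATIONS * HASHES_PER_HMAC)


for n in (4, 5):
    print(f"{n} words: {entropy_bits(n):.1f} bits, about 2^{log2_hashes(n):.1f} hashes")
```

Each additional word adds $\log_2(20000)\approx14.3$ bits, so going from 4 to 5 words multiplies the search cost by 20,000.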

Rafaelo
What about 5 words, thus $2^{71+11+1}$ hashes? I'm not comfortable with changing the key stretching algorithm; I want to use a hardware wallet without any modifications. I don't want to go higher than 5 words, which would already be too hard to remember.
fgrieu
@Rafaelo: I don't want to go too applied because: 1) It's about cryptocurrency, which I regard as a dangerous thing both for participants in the cult and for others. 2) The question assumes an attack model (AWS) that ignores the possibility of attack with specialized hardware, when cryptocurrency uses that routinely. I can't endorse that model.
Paul Uszak
Any chance that this isn't off topic because a moderator is answering it? Doesn't really matter any more given this minute's news.