Score:5

Generating X ids on Y offline machines in a short time period without collision

bh flag

In theory this is what I have :

  • Around 10 000 offline machines
  • Each machine will generate around 10 000 ids
  • I can program the machine in any way I want, but I prefer a low memory and low cpu
  • They all will be generated in a short timeframe (1 day)
  • It should not have id collision
  • It should not be possible to determine when and which machine has generated it.

How could I do that ?

knaccc avatar
es flag
The easiest way to do this is to find a library that will generate a type 4 UUID. This is essentially a random 128-bit number, which is so huge that you won't get any collisions. Alternatively, just use a secure random number generator to generate a 128-bit number, and encode it as an integer, or as hex, or in any format that suits your needs.
Score:7
my flag

If you want to have smaller identifiers than the 128 bit ones suggested in the comment, here is one approach:

  • Give each offline machine the same secret key (see size discussion below)

  • Give each offline machine a unique 16 bit identifier

  • Have each offline machine use the Speck block cipher algorithm with a 32 bit block size, with the block input being the machine's identifier concatenated with a counter (0000 for the first identifier generated by this device, 0001 for the second identifier, etc); the output of the Speck algorithm is the identifier.

You can then take those 32 bits, and turn them into a format usable as an identifier.

This supports all your requirements:

  • It supports 64k offline machines; well above your "about 10,000" request

  • Each machine can generate 64k id's; well above your "about 10,000" request

  • Low memory and CPU - this uses a small fixed storage on each machine and not much CPU

  • Generate within a day - actually, unless your offline machines are seriously wimpy, generating 10,000 ids should take less than a second

  • No id collisions (because Speck is a permutation, and each 32 bit input to Speck is unique across the entire id generation process)

  • It is not possible to determine where/when an id was generated, assuming that the secret key cannot be recovered.

This last bit about the secret key - Speck with a 32 bit block size uses a 64 bit key. Now, depending on your adversary, a 64 bit key may not be big enough. If it is not, then generate a 3 64 bit secret keys, and for each id generation, run Speck 3 times (using each key in sequence); that is, we have $id = Speck_{k3}(Speck_{k2}(Speck_{k1}( machineid || seqnum )))$

That will be secure against any plausible adversary.

Now, a more general solution would be to use a Format Preserving Encryption Method; that can hande an arbitrary sized block of data, not just the 32 bits we have here. I didn't suggest it, because FPE algorithms are not quite as easy to describe (and the sizes just happened to be about right for Speck). On the other hand, if you asked for a solution that worked with 100,000 offline machines each generating 100,000 ids, well, 32 bit Speck wouldn't work - an FPE solution could.

y.petremann avatar
bh flag
The idea is interesting, there is some points I should have mentionned. For the "It should not be possible to determine when and which machine has generated it." My problem is that:
y.petremann avatar
bh flag
1) One machine can be stolen, if the machine is retro-engineered, the attacker shouldn't access a shared key that could at least make it possible to determine which machine generated which id.
y.petremann avatar
bh flag
2) I consider that the machines would be used by a government, which put them in place, I consider this government can be the attacker.
Score:2
ph flag
jpa

Poncho's solution is great if the shared secret key is ok.

Here is a solution that adds the following extra security properties:

  • No shared key: None of the offline machines can determine the creator or order of IDs created by other machines.
  • Forward secrecy: Discovery of machine-specific key will not leak order or creator of IDs created by that machine in the past.

It does require a trusted machine to calculate and distribute initial seeds for each machine.

The algorithm for the machines is simple:

  1. Keep 256 bit long internal seed value.
  2. To generate ID, take the bottom 64 bits of the state. Apply SHA-256 to the whole state to get the next seed.

Normally this would have a small chance (about 1 in 64 for 100 milion IDs) of random collisions. To guarantee lack of collisions, the seed generating machine should calculate all possible ID values for each machine ahead of time.

If there is a collision, just choose a different random seed for that machine. Computing 100 million SHA-256 hashes takes less than a minute. Even a less wide and lower CPU cost hash algorithm would be sufficient for the purpose without significantly affecting security.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.