Score:5

Deterministic shuffling algorithm

Is there a well-known (cryptographically secure) algorithm to shuffle a vector of values deterministically, without using any randomness, such that it is hard (or as hard as possible) to guess its initial configuration?

The goal is to achieve an effect similar to what a hash function achieves, but instead of producing a digest, it should simply shuffle the data.

I know I can hash the data, use the hash value as the seed for a pseudo-random number generator, and then apply a standard shuffling algorithm driven by that generator. But I wonder if there are more established ways of doing it instead of just adopting an ad-hoc solution.

Do you want it to be possible to un-shuffle the vector afterwards? And, in particular, is it OK if there's a (small) possibility for two different input vectors to produce the same shuffled output? (The generic method you describe does have that risk, although it should be no more likely than with a truly random shuffle, at least assuming that your hash, PRG and shuffling algorithm behave like their ideal counterparts.)
raugfer avatar
@IlmariKaronen No need to be able to recover the initial configuration. And it is no problem if the same shuffling applies to two inputs; that could certainly happen if the vector has very few elements, but in general I assume it should happen with as low a probability as possible. The important part is that it should be practically impossible to figure out the initial configuration for a large enough vector, similar to the properties of hashing.
Score:4

What about sorting the vector? It's then demonstrably impossible to figure out anything about the original order.

If for some reason an ordered vector is undesirable, we can sort the vector according to the hash of each vector value. The order will look random for a party without knowledge of the hash, and it's still demonstrably impossible to figure out anything about the original order.
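
A minimal Python sketch of the "sort per hash of each entry" idea, for illustration only. The function name `sort_by_hash`, the use of `repr()` as the item encoding, and the choice of SHA-256 (or HMAC-SHA256 when a secret `key` is supplied) are assumptions of this sketch, not part of the answer above.

```python
import hashlib
import hmac

def sort_by_hash(values, key=None):
    """Return the items ordered by a hash of each item's encoding."""
    def digest(item):
        # Assumption: repr() gives a stable, unambiguous encoding for these items.
        data = repr(item).encode("utf-8")
        if key is None:
            return hashlib.sha256(data).digest()
        # Keyed variant: HMAC-SHA256 under a secret key (illustrative choice).
        return hmac.new(key, data, hashlib.sha256).digest()
    return sorted(values, key=digest)
```

The output depends only on which values are present, not on the order in which they arrive, so `sort_by_hash([3, 1, 2])` and `sort_by_hash([2, 3, 1])` return the same list.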

Madhav Malhotra avatar
What do you call this algorithm? How does it compare (in security and computational cost) to an algorithm that computes the hash value of the entire array's byte representation and uses that to implement the [Fisher-Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)?
fgrieu avatar
@MadhavMalhotra: The first algorithm is a "sort". The second would be a "sort per hash of each entry". An algorithm that computes the hash value of the entire array's byte representation and uses that to implement the Fisher-Yates shuffle would be insecure (unless the hash is keyed by a secret key), because an adversary can test a guess of the initial order by performing the algorithm and checking its output.
Madhav Malhotra avatar
RE: "Unless the hash is keyed by a secret key" - so as an example, you could take the byte representation of an array, generate a SHA-256 hash, run the SHA-256 hash's byte representation through AES, and then use the output of the AES encryption to seed a random number generator for the Fisher-Yates shuffle? If this is secure, it seems more computationally efficient than hashing each array value for long arrays?
fgrieu avatar
@MadhavMalhotra: yes that is secure against a bounded adversary, and computationally more efficient for short vector elements. And better than the second suggestion in the answer, with a keyed hash.
Score:2

I think the generic solution you describe (hash the input vector, seed an RNG with the hash, use the RNG to shuffle the array) is basically the best you can do.

Assuming I've understood your requirements correctly, the idealized version of your shuffling algorithm would be a random oracle defined to take as its input a vector $v$ and to return a uniformly chosen random element of the set of all permutations of $v$.

Now, the defining property of a random oracle is that it returns a random result for newly queried inputs, but will always return the same result if queried with the same input again. To implement something that behaves like that in practice, we need to make sure that all the "random" choices made by the oracle are actually determined based on the input (and possibly some constant key permanently associated with a particular instance of the oracle).

In particular, an algorithm such as the Fisher–Yates shuffle can be used to deterministically shuffle an $n$-element vector into a uniformly random permutation of itself, given a random integer $0 ≤ r < n!$ (or, equivalently, an $n$-tuple of random integers $(r_1, \dots, r_n)$ where $0 ≤ r_i < i$ for each $i$) as an additional input. Meanwhile, rejection sampling can be used to obtain (with probability exponentially approaching 1 over time) a uniformly distributed integer $r$ (or each of the $r_i$ separately, if you prefer) from a stream of random bits of unbounded length. And a CSPRNG can be used to turn a random seed of finite length into an unbounded bitstream that, while not truly random, is (presumed to be) computationally indistinguishable from a truly random bitstream. And, finally, a key derivation function (typically constructed based on some cryptographic hash function) can be used to convert an arbitrary input bitstring, such as a canonical binary encoding of the input vector $v$, into a pseudorandom seed suitable for initializing the CSPRNG.
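
To make the two building blocks concrete, here is a rough Python sketch. The names `uniform_below`, `shuffle_with_index` and the `next_bits(k)` callable (which is assumed to return a `k`-bit integer from whatever bitstream is in use) are hypothetical helpers for this illustration.

```python
import math

def uniform_below(bound, next_bits):
    """Uniform integer in [0, bound) by rejection sampling.

    next_bits(k) is a hypothetical callable returning a k-bit integer
    taken from the (pseudo)random bitstream.
    """
    k = (bound - 1).bit_length()
    if k == 0:
        return 0  # bound == 1: only one possible value, no bits needed
    while True:
        candidate = next_bits(k)
        if candidate < bound:  # accept; otherwise discard and redraw
            return candidate

def shuffle_with_index(v, r):
    """Fisher-Yates shuffle driven by a single integer 0 <= r < n!."""
    n = len(v)
    if not 0 <= r < math.factorial(n):
        raise ValueError("r must satisfy 0 <= r < n!")
    out = list(v)
    for i in range(n, 1, -1):
        # Peel off the next mixed-radix digit r_i, with 0 <= r_i < i.
        r, r_i = divmod(r, i)
        out[i - 1], out[r_i] = out[r_i], out[i - 1]
    return out
```

Running `shuffle_with_index` over every `r` in `range(math.factorial(n))` yields each permutation of an $n$-element vector with distinct entries exactly once, which is the one-to-one property the shuffle relies on.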

Of course, you can also choose to combine some of these steps, e.g. by feeding the input vector (canonically encoded into a bitstring) into a cryptographic sponge such as SHAKE128 and then squeezing the arbitrary-length bitstream directly out of it, but the general principle remains the same: generate a pseudorandom bitstream determined by the input vector and then use it to deterministically shuffle the same vector. And since the shuffling part can be done perfectly (the Fisher–Yates shuffle algorithm implements a one-to-one map from an $n!$ element set of integers (or tuples of integers) to permutations of $n$ element vectors), the only part where we need to rely on cryptographic assumptions about computational indistinguishability is generating the pseudorandom input to the shuffle.
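
For what it's worth, the combined SHAKE128 variant could look roughly like the following Python sketch. The encoding in `encode_vector` (an element count followed by length-prefixed `repr()` strings) is just one hypothetical choice of canonical encoding, and `deterministic_shuffle` is a made-up name. Since `hashlib` only exposes `digest(length)` for SHAKE, the sketch re-squeezes with a larger length whenever it runs out of bytes.

```python
import hashlib

def encode_vector(v):
    """Hypothetical canonical encoding: count, then length-prefixed repr() of each item."""
    out = [len(v).to_bytes(8, "big")]
    for item in v:
        data = repr(item).encode("utf-8")
        out.append(len(data).to_bytes(8, "big") + data)
    return b"".join(out)

def deterministic_shuffle(v):
    """Shuffle v using a SHAKE128 stream derived from v itself."""
    encoded = encode_vector(v)
    size = 64
    stream = hashlib.shake_128(encoded).digest(size)
    pos = 0

    def next_bits(k):
        nonlocal size, stream, pos
        nbytes = (k + 7) // 8
        if pos + nbytes > size:
            # A longer XOF output extends the shorter one, so re-squeezing
            # with a larger length keeps the bytes already consumed.
            while pos + nbytes > size:
                size *= 2
            stream = hashlib.shake_128(encoded).digest(size)
        chunk = stream[pos:pos + nbytes]
        pos += nbytes
        return int.from_bytes(chunk, "big") & ((1 << k) - 1)

    out = list(v)
    for i in range(len(out) - 1, 0, -1):
        bound = i + 1
        k = i.bit_length()  # bits needed to represent bound - 1
        while True:  # rejection sampling for a uniform index in [0, bound)
            j = next_bits(k)
            if j < bound:
                break
        out[i], out[j] = out[j], out[i]
    return out
```

Calling `deterministic_shuffle` twice on the same vector gives the same output. Note that, as pointed out in the comments on another answer, an unkeyed construction like this lets anyone test a guess of the original order by rerunning it; mixing a secret key into the hashed input would be needed to prevent that.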
