TL;DR: represent the range as a list and shuffle it with a Fisher-Yates shuffle driven by a seed; the hash value is the seed.
First of all, MD5 produces 16 bytes (128 bits) of output that are generally well distributed. It doesn't generate hexadecimals; hexadecimal is just a common way to represent those bytes in textual form.
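To make the point about hexadecimals concrete, here is a small sketch using Python's `hashlib`. The hex string is just another view of the same 16 bytes, and those bytes can be read as one 128-bit integer. Big-endian byte order is an assumption here: it is a common choice, but any fixed order works.

```python
from hashlib import md5

h = md5("Hello World".encode("utf-8"))
digest = h.digest()        # the actual output: 16 raw bytes
hex_text = h.hexdigest()   # the same bytes rendered as 32 hexadecimal characters

# Reading the 16 bytes as a big-endian integer gives a value in [0, 2**128)
seed = int.from_bytes(digest, byteorder="big")

print(len(digest), len(hex_text))   # 16 32
print(seed == int(hex_text, 16))    # True
```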
You can generate the values of a range in a random order using the Fisher-Yates shuffle. However, this normally requires you to first draw a number in the range $[0, n)$, then one in $[0, n - 1)$, and so on until $[0, 2)$. Generally, getting unbiased numbers in a range from a fixed string of bits is tricky, because most algorithms (such as rejection sampling) consume a non-deterministic number of bits.
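For comparison, here is a sketch of the "classic" approach: one separate draw per step, each made unbiased by rejection sampling on a stream of bits taken from the hash. `BitSource`, `draw_below` and `fisher_yates_rejection` are hypothetical names for illustration only. The point is that the number of bits consumed depends on the input. A fixed 128-bit hash could therefore, in principle, run out.

```python
import hashlib


class BitSource:
    """Hypothetical helper: hands out bits of a fixed byte string, most significant first."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8
        self.used = 0

    def take(self, k: int) -> int:
        # Fail loudly if the fixed-size input has been exhausted
        if k > self.remaining:
            raise ValueError("ran out of seed bits")
        self.remaining -= k
        self.used += k
        return (self.value >> self.remaining) & ((1 << k) - 1)


def draw_below(bits: BitSource, m: int) -> int:
    """Unbiased value in [0, m) via rejection sampling; bit consumption varies."""
    k = (m - 1).bit_length()  # smallest k with 2**k >= m
    while True:
        candidate = bits.take(k)
        if candidate < m:  # reject candidates outside the range and try again
            return candidate


def fisher_yates_rejection(n: int, data: bytes):
    """Classic Fisher-Yates: one separate draw per step. Returns (permutation, bits used)."""
    arr = list(range(n))
    bits = BitSource(data)
    for i in range(n - 1, 0, -1):
        j = draw_below(bits, i + 1)  # j in [0, i + 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr, bits.used


for text in ("Hello World", "Hello World!"):
    arr, used = fisher_yates_rejection(8, hashlib.md5(text.encode("utf-8")).digest())
    print(text, arr, "bits used:", used)
```

Running this for different inputs generally shows the `bits used` count changing from input to input, which is exactly what makes this approach awkward with a fixed-size hash.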
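In this case you can instead compute $n!$ and scale the seed to a single number in $[0, n!)$. Read as a mixed-radix number (base $n$, then base $n - 1$, down to base $2$), its digits are exactly the numbers required for each step of the shuffle.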
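The factorial trick works because each mixed-radix digit of $x \in [0, n!)$ is exactly one Fisher-Yates draw, and the mapping from $x$ to permutations is one-to-one. Here is a minimal sketch that checks this exhaustively for $n = 4$. The function name `shuffle_from_index` is hypothetical.

```python
import math
from itertools import permutations


def shuffle_from_index(n: int, x: int) -> list:
    """Fisher-Yates driven by a single number x in [0, n!), read as mixed-radix digits."""
    if not 0 <= x < math.factorial(n):
        raise ValueError("x must be in [0, n!)")
    arr = list(range(n))
    for i in range(n - 1, 0, -1):
        x, j = divmod(x, i + 1)  # j is the digit in base (i + 1), so j is in [0, i]
        arr[i], arr[j] = arr[j], arr[i]
    return arr


# Every x in [0, 4!) yields a different permutation of [0, 1, 2, 3]
results = {tuple(shuffle_from_index(4, x)) for x in range(math.factorial(4))}
print(len(results))                            # 24
print(results == set(permutations(range(4)))) # True
```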
So you'll get something like the following Python code, which implements the shuffle based on a 128-bit seed:
```python
from hashlib import md5
import math

def fisher_yates_shuffle(n, seed):
    # Guard clause: the seed must be an integer in [0, 2**128)
    # (an MD5 value can have leading zero bits, so bit_length() may be < 128)
    if not isinstance(seed, int) or not 0 <= seed < 2**128:
        raise ValueError("Seed must be a 128-bit unsigned integer.")
    # Initialize the array from 0 to n-1
    arr = list(range(n))
    # Calculate n! = 1 * 2 * ... * n, the number of possible permutations
    x = math.factorial(n)
    # Scale the seed into [0, n!) with exact integer arithmetic:
    # current_vector = floor(seed * n! / 2**128)
    current_vector = (seed * x) >> 128
    for i in range(n - 1, 0, -1):
        # Take the next mixed-radix digit (base i + 1), a value in [0, i]
        current_vector, i_value = divmod(current_vector, i + 1)
        # Perform the shuffle step
        arr[i], arr[i_value] = arr[i_value], arr[i]
    return arr

# Derive a 128-bit seed from the MD5 hash of "Hello World"
seed_str = "Hello World"
md5_hash = md5(seed_str.encode('utf-8')).digest()
seed = int.from_bytes(md5_hash, byteorder='big')

# Number of elements in the array (the values 0 to n - 1);
# for n > 34, n! exceeds 2**128 and not every permutation is reachable
n = 16

# Perform the Fisher-Yates shuffle
result = fisher_yates_shuffle(n, seed)
print("Shuffled array:", result)
```
Written with the help of ChatGPT, but altered & logically validated.
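Note that some bias is introduced, because $2^{128}$ is not a multiple of $n!$: some permutations are reachable from one more seed value than others. For 16 values, $16! \approx 2^{44.25}$, so the relative bias is roughly $16!/2^{128} \approx 2^{-84}$. Usually that's considered negligible, but it pays to be careful when it comes to cryptography.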
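The size of that bias can be estimated directly. Each permutation is hit by either $\lfloor 2^{128}/n! \rfloor$ or $\lceil 2^{128}/n! \rceil$ seed values, so the relative difference in probability is about $n!/2^{128}$. Here is a rough sketch of that estimate. It also finds the largest $n$ for which a 128-bit seed can reach every permutation at all. The function names are hypothetical.

```python
import math

SEED_BITS = 128


def bias_exponent(n: int, bits: int = SEED_BITS) -> float:
    """Approximate e such that the relative bias is about 2**-e, i.e. e = bits - log2(n!)."""
    return bits - math.log2(math.factorial(n))


def max_full_coverage_n(bits: int = SEED_BITS) -> int:
    """Largest n with n! <= 2**bits, so every permutation is reachable from some seed."""
    n = 1
    while math.factorial(n + 1) <= 2 ** bits:
        n += 1
    return n


print(bias_exponent(16))       # about 83.75
print(max_full_coverage_n())   # 34
```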