Score:0

RFC: Approach to CSPRNG

Amo

I've been experimenting in Python with different approaches to cryptographically secure pseudorandom number generators, comparing them using the NIST test suite implemented by https://github.com/InsaneMonster/NistRng/tree/master. My requirements are:

  • must be cryptographically secure, so that its output is suitable as an encryption key
  • only a relatively low number of outputs need to be produced (possibly up to 10,000)
  • the outputs must be reproducible, past and future, given the number of steps
  • it should be as secure as conceivable (I don't care about 'overkill')
  • performance doesn't matter
  • I know and actively ignore the "don't brew your own solution" advice.

The solution I came up with is the following:

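import random
from hashlib import sha3_512
from itertools import cycle


def xor(a: bytes, b: bytes) -> bytes:
    # Assumed helper (not shown in the original post): XOR the two inputs,
    # repeating the shorter one up to the length of the longer one.
    if len(a) < len(b):
        a, b = b, a
    return bytes(x ^ y for x, y in zip(a, cycle(b)))


def gen_key_at_given_step_of_mersenne(
    password: str, password_file: bytes, step: int
) -> bytes:
    """Generate a key at a given step of the Mersenne Twister.

    The password XORed with the password file seeds the generator; the
    output is a SHA3-512 digest.

    Args:
        password (str): Classical password string.
        password_file (bytes): Random bytes from a password file.
        step (int): The step into the Mersenne Twister sequence.

    Returns:
        bytes: SHA3-512 digest used as the key.
    """
    seed = xor(password.encode("utf-8"), password_file)
    random.seed(seed)
    randbits = random.getrandbits(256)
    for _ in range(step):
        randbits = random.getrandbits(256)
    # .digest() returns the raw bytes; sha3_512(...) alone is a hash object.
    return sha3_512(
        randbits.to_bytes(64, "big") + seed + step.to_bytes(32, "big")
    ).digest()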

My idea was that hashing the output of the Mersenne Twister, using the step as salt together with the secret seed, should defeat rainbow tables. It should also make attacks against the Mersenne Twister itself impossible, so predicting past and future outputs should be impossible.

Running the NIST test suite on a concatenated array of length 1022025 (2000 steps into the twister) gives me results comparable to running it against np.fromiter((secrets.randbits(1) for _ in range(1017600)), dtype=int), indicating that it should be cryptographically secure.

Is there any obvious problem with this approach? Any way to prove that it indeed works as designed?

poncho
"Running the NIST test suite ... indicat[es] that it should be cryptographically secure" - nope, the NIST test suite does not test for that (nor can any other automated test).
Maarten Bodewes
In addition to the answer(s) given: using a random number generator for *deriving* keys is very dangerous. Even superficial changes (reversing the bytes in an internal word with random bits) will result in a different key. There have been problems with the Java "SHA1PRNG" having multiple slightly different (and actually completely different on Android) implementations, and using it would result in a different key during encryption / decryption.
Score:3

In general, I want to encourage experimentation like this, as long as you aren't using it in a production environment. Everyone needs to start somewhere, and playing with these sorts of toy constructs is a great way to learn. I don't want to discourage that.

My biggest problem with your approach is that it doesn't appear to be well defined.

Your implementation appears to depend on Python's pseudorandom number generator, but relying on a platform's implementation of a pseudorandom number generator (especially a non-cryptographic one) is generally discouraged, because it might not have enough internal state to be secure and its implementation might change. It also makes your algorithm dependent on the platform.

Your mechanism takes both a password and random data (a salt?) as input, but it is not described as a key-stretching or key-derivation mechanism. It also produces pseudorandom strings of only one fixed length. What if you need something larger than 512 bits, say for calculating an RSA key?

Another potential problem is that it combines the password and the salt in a way that could allow, under certain adversarial conditions, the entropy of your system to be obliterated (for example, passing the same value for both password and password_file would cause the password to be effectively cancelled out and the generator would return the same result regardless of the value of the password).
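To make the cancellation concrete, here is a small sketch. The `xor_bytes` helper is a hypothetical stand-in for the question's undefined `xor()`, and it assumes equal-length inputs; the passwords are made up.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Hypothetical stand-in for the question's xor(); equal lengths assumed.
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# If password_file happens to equal the password bytes, every password
# collapses to the same all-zero seed.
seed_one = xor_bytes(b"hunter2!", b"hunter2!")
seed_two = xor_bytes(b"passw0rd", b"passw0rd")
print(seed_one == seed_two)
```

Feeding both inputs into a hash or KDF as separate, unambiguous fields would avoid this kind of cancellation.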

As mentioned in an earlier comment, you cannot rely on the performance of any random number test suite to determine suitability for cryptographic purposes. That's not what they are for. They are there to help identify gross problems and biases in generators.
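As an illustration of why statistical quality says nothing about predictability, the sketch below clones the raw Mersenne Twister in Python's `random` module from 624 consecutive 32-bit outputs, then predicts what comes next. This targets the bare generator, not the hashed construction in the question. The helper names are mine, and the state layout passed to `setstate` is a CPython implementation detail that I am assuming here.

```python
import random


def _undo_right(x: int, shift: int) -> int:
    # Invert y ^= y >> shift by fixed-point iteration.
    t = x
    for _ in range(32 // shift + 1):
        t = x ^ (t >> shift)
    return t


def _undo_left(x: int, shift: int, mask: int) -> int:
    # Invert y ^= (y << shift) & mask by fixed-point iteration.
    t = x
    for _ in range(32 // shift + 1):
        t = x ^ ((t << shift) & mask)
    return t & 0xFFFFFFFF


def untemper(y: int) -> int:
    # Reverse the four MT19937 tempering steps in opposite order.
    y = _undo_right(y, 18)
    y = _undo_left(y, 15, 0xEFC60000)
    y = _undo_left(y, 7, 0x9D2C5680)
    return _undo_right(y, 11)


victim = random.Random(12345)  # arbitrary seed the attacker never sees
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the internal state; index 624 means "regenerate on the next call".
clone = random.Random()
clone.setstate((3, tuple(untemper(v) for v in observed) + (624,), None))

predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
print(predicted == actual)
```

A generator can therefore produce well-distributed output and still be completely predictable, which is exactly the property a statistical suite cannot detect.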

I would recommend that you study the design of HKDF, since it is an example of a well-defined and secure pseudo-random key derivation mechanism.
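For reference, here is a minimal sketch of HKDF's two stages as described in RFC 5869, using only Python's `hmac` and `hashlib`. The function names are mine, and using the step number as the `info` input is my assumption about how it could map onto the "key at step n" requirement, not something HKDF prescribes.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha512") -> bytes:
    # An empty salt defaults to a string of zero bytes of digest length.
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha512") -> bytes:
    digest_size = hashlib.new(hash_name).digest_size
    if length > 255 * digest_size:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


secret = bytes(range(32))  # placeholder only; use real random bytes in practice
prk = hkdf_extract(salt=b"", ikm=secret)
key_at_step_7 = hkdf_expand(prk, info=(7).to_bytes(4, "big"), length=64)
key_at_step_8 = hkdf_expand(prk, info=(8).to_bytes(4, "big"), length=64)
```

Each step's key is computed directly from the step number, so there is no sequence to walk through, and the `length` argument covers the case where more than 512 bits are needed.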

If you really want to use human-memorable passwords, then you should also use a proper key-stretching algorithm (like Argon2) instead of HKDF_extract (although you should absolutely still use HKDF_expand).
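Argon2 is not in Python's standard library (the third-party argon2-cffi package is the usual route), so the sketch below swaps in scrypt via `hashlib.scrypt` purely to show the shape of the stretching step. The function name and the cost parameters are illustrative assumptions, not tuned recommendations, and `hashlib.scrypt` is only available when Python is built against a suitable OpenSSL.

```python
import hashlib


def stretch_password(password: str, password_file: bytes) -> bytes:
    # scrypt is a stand-in for Argon2 here. The password file is used as the
    # salt, so the two inputs are combined without XOR.
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=password_file,
        n=2**14,  # CPU/memory cost (illustrative)
        r=8,      # block size (illustrative)
        p=1,      # parallelism (illustrative)
        dklen=64,
    )


demo_file = bytes(range(32))  # placeholder for the random password file
stretched = stretch_password("correct horse battery staple", demo_file)
```

The 64-byte result would take the place of the HKDF_extract output and be fed into HKDF_expand.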

Amo
Hi and thanks for the helpful comments and answer! https://github.com/python/cpython/blob/3.11/Lib/random.py doesn't appear to be platform-dependent, so I don't see a problem on that front. The random file would be a simple `secrets.randbits(256)`, cryptographically strong os-dependent bits. The password is just to have 2 factors, but it doesn't depend on password strength solely, and there is no chance to have password == password_file. I will definitely look into Argon2, thanks for the suggestion. Sorry for the edits, first time..
darco
It is platform-dependent in the sense that Python is a platform. That library is defined by Python, and the Python authors might change it one day, similar to how `arc4random()` doesn't use RC4 anymore. Ideally, if you are defining a cryptographic algorithm, it should be defined independently of any given platform. Sure, your implementation can use platform implementations, but your documentation and algorithm definition should spell out every detail.
darco
Also, if you wanted to use a PRNG based on a mersenne twister (not recommended, btw), then I would only rely on a PRNG that has the dedicated purpose of implementing the mersenne twister (or implement it yourself), for example.
Amo
Okay, makes sense. In this case, it would be prudent to look at the code as if it was pseudo-code and ignore the language specific bits.