Score:2

Why is there a slight mismatch most of the time in X25519 private keys depending on functions used, but public keys always match (same seed for both)?

cn flag
mkl

I'm trying to wrap my head around going from a seed to a SigningKey, and also obtaining a PrivateKey (encryption key). I'm using NaCl / libsodium.

I created the code below and the results are interesting. Turns out pk1.private_key and pk2.private_key match roughly 3% of the time. However public key matches 100%, all are generated starting with the same seed. What is going on here?

  • I obtain pk1 by using PrivateKey.from_seed(seed)

  • I obtain pk2 by using SigningKey(seed).to_curve25519_private_key()

Examples of mismatch (they're close but aren't equal):

# 1st byte mismatch by 0x01
seed:  f8d9e54a23971beebf2552c1a50ade6150cd051321398394f515e8d4b1ba0404
priv1: c1fd4612ee8ef24d295210a277e196e6bb4a9ae6b93f98d93f197860fe5dc048
priv2: c0fd4612ee8ef24d295210a277e196e6bb4a9ae6b93f98d93f197860fe5dc048

# 1st byte mismatch by 0x03
seed:  d612a66f92ee2f42ab1f7ea9a712a47c815843d21fc988b1d202459f235b6410
priv1: f33d5e80bb556333e2961c9868b1dc7e548836ee56808689ca022f1a19fe86bb
priv2: f03d5e80bb556333e2961c9868b1dc7e548836ee56808689ca022f1a19fe867b

# 1st byte mismatch by 0x01, last byte mismatch by 0x40
seed:  10b7e1c66cf08005a22289158a088e028160f892dc6c20d43025be4690aaed85
priv1: 194898f65d117579d50e80a9b7e07bd048bfd1300d55561dac9dfaed4ef02109
priv2: 184898f65d117579d50e80a9b7e07bd048bfd1300d55561dac9dfaed4ef02149
from nacl.signing import SigningKey
from nacl.public import PrivateKey, PublicKey, Box, SealedBox
from nacl.bindings import crypto_sign_SEEDBYTES
from nacl.utils import StringFixer, random

def run(debug=False):
    seed = random(crypto_sign_SEEDBYTES)
    pk1 = PrivateKey.from_seed(seed)
    pk2 = SigningKey(seed).to_curve25519_private_key()
    if debug:
        print('seed:  ', seed.hex()) 
        print('priv1: ', pk1._private_key.hex())
        print('priv2: ', pk2._private_key.hex())
        print('pub1:  ', bytes(pk1.public_key).hex())
        print('pub2:  ', bytes(pk2.public_key).hex())
    return seed, pk1, pk2

runs = 10000
private_key_match = 0
public_key_match = 0
both_match = 0

for i in range(runs):
    if i % 500 == 0:
        print(i, 'of', runs)
    seed, pk1, pk2 = run()
    x = pk1._private_key == pk2._private_key
    y = bytes(pk1.public_key) == bytes(pk2.public_key)
    if x:
        private_key_match += 1
    if y:
        public_key_match += 1
    if x and y:
        both_match += 1

print('private key match:', private_key_match)
print('public key match: ', public_key_match)
print('both match:       ', both_match)
Maarten Bodewes avatar
in flag
The private key should have a few bits set to specific values, if they have already been set by chance then you get what you get.
cn flag
mkl
@MaartenBodewes, I'm not sure I follow, does the private key not represent an integer number used in the computation of a public key as well as decryption. Why would several bits be inconsequential? Is there a spec I'm missing?
Maarten Bodewes avatar
in flag
Ugh, I'd have to look up the equations for that, however I wonder if the difference isn't when the bit masks are applied. I was hoping for some help from my fellow cryptographers here.
Score:2
cn flag

'curve25519' (as Bernstein originally called it, now renamed X25519 or XDH25519 for clarity because the same curve in different form is used for Ed25519[ph]) requires the privatekey (multiplier) to be a multiple of 8 to avoid small-subgroup attacks, and conventionally is represented in little-endian so the low 3 bits of the first byte are forced to 0.

X25519 also requires the high bit be 0 (because its underlying field is only about $2^{255}$) and Bernstein specifies the next-to-high bit be 1 (to block an 'optimization' that would allow timing attacks); these affect the last byte in little-endian form, which you apparently didn't notice in your examples also change bb - 7b and 09 - 49.

These two (or three) clampings taken together will change a randomly-chosen value all but (100/32)% of the time, which is close enough to 3% for government work.

Ed25519 actually has the same need for its private multiplier to be a multiple of 8, but it doesn't use the privatekey directly for the multiplier; instead it runs the privatekey through a hash to expand and use for both the multiplier and a secret value that blinds the message, and internally clamps only the multiplier part, without modifying what the user sees as the private key.

Dupe Why are the lower 3 bits of curve25519/ed25519 secret keys cleared during creation?
and Curve25519 key structure
and see/compare https://datatracker.ietf.org/doc/html/rfc7748#page-8
vs https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.5 .

knaccc avatar
es flag
Note that there is not necessarily a need for private keys on either variant of the 25519 curve to be a multiple of 8. Certain signature or DH implementations may do this, but that is only one way to prevent a small subgroup attack. You can also prevent the attack by multiplying the EC point by the group order, and checking that the result is the point at infinity. This will also verify that the EC point is in the correct larger subgroup, which prevents other types of attacks such as a linkable ring signature key image uniqueness attack.
cn flag
mkl
@dave_thompson_085, that's super helpful thank you. So then it looks like one of the functions is implemented incompletely? Even though either private key produces the same public key (I presume so some check / clean up happens on pub key generation)
dave_thompson_085 avatar
cn flag
@miketery Nothing is incomplete. As I said, the schemes are defined differently; X25519 requires the clamping on the private key that is seen externally, while Ed25519 applies it internally so the private key that is seen isn't clamped but the elliptic-curve multiplication internally is.
Score:2
cn flag

Mathematically, a Curve25519 private key is an element of $\mathbb{F}_{2^{255}-19}$, i.e. an integer modulo $2^{255}-19$. This can naturally be represented as an integer between $0$ and $2^{255}-19-1$, which itself can be represented using 255 bits. Since practical computers work with 8-bit bytes, the smallest practical representation is as a 32-byte string.

There is a standard format for representing Curve25519 private keys, which is described in RFC 7748 §5. This format encodes the integer between $0$ and $2^{255}-19-1$ as a 32-byte (256-bit) string in little-endian order.

A 32-byte string can represent numbers in the range $[0,2^{256}-1]$, which is larger than $[0,2^{255}-19-1]$. Also, for various reasons, not all numbers in the range $[0,2^{255}-19-1]$ are good (see Curve25519 key structure and “May the Fourth Be With You: A Microarchitectural Side Channel Attack on Several Real-World Applications of Curve25519” by Genkin et al). The constraints are:

  • The number must be between $2^{254}$ and $2^{255}-19$, so the most significant two bits of the 256-bit number must be 0 and 1.
  • The number must be a multiple of 8, so the least significant three bits must be 0.

RFC 7748 §5 specifies that when generating a private key, one must take a random (or pseudorandom, as the case may be) 32-byte string and force the five bits mentioned above to their mandatory value. An implementation of Curve25519 that takes a private key as input in the form of a 32-byte string must apply this masking before it starts doing computations on the number that the key represents.

nacl.public.PrivateKey.from_seed returns the raw 32-byte string. nacl.signing.SigningKey(seed) performs the masking, so to_curve25519_private_key exports the value in its canonical, masked form.

The masking concerns 5 bits, so given a random seed, there is a $1/2^5 = 1/32 \approx 3\%$ chance that it already has the correct value for those 5 bits.

cn flag
mkl
Excellent answer, thank you!
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.