Score:4

Optimize the speed of a safe prime finder in C

co flag

I am trying to implement Schnorr's identification protocol in C. I need a safe prime in order to find a generator of the cyclic group efficiently. The problem is that my program takes too long to find the safe prime. I am using libsodium for generating random numbers and GMP for arbitrary-precision arithmetic:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <sodium.h>
#include <gmp.h>

void bytes_to_mpz(mpz_t result, const unsigned char* array, size_t size) {
    mpz_import(result, size, 1, sizeof(unsigned char), 0, 0, array);
}

/**
 * Computes a cryptographically secure random number.
 *
 * @param value Sets this to the random value.
 * @param bits The length of the random value in bits.
 */
void get_random(mpz_t value, size_t bits) {
    // Generate random bytes
    unsigned char random_bytes[bits/8 + 1];
    randombytes_buf(random_bytes, sizeof(random_bytes));

    // Convert the unsigned char array (bytes) to an mpz_t variable
    bytes_to_mpz(value, random_bytes, sizeof(random_bytes));

    // Obtain only the desired number of bits
    mpz_t aux;
    mpz_init(aux);
    mpz_set_ui(aux, 2); // aux = 2
    mpz_pow_ui(aux, aux, bits); // aux = 2^b
    mpz_sub_ui(aux, aux, 1); // aux = 2^b - 1
    mpz_and(value, value, aux); // value = value AND aux

    // Clear the aux variable
    mpz_clear(aux);
}

/**
 * Computes a random prime p.
 *
 * @param p Sets this to the prime.
 * @param bits The length of the prime in bits.
 */
void get_random_prime(mpz_t p, size_t bits) {
    int is_prime = 0;
    // Generate random values until one of them is a prime number
    while(!is_prime) {
        get_random(p, bits);
        is_prime = mpz_probab_prime_p(p, 40);
    }
}

/**
 * Computes a random safe prime p and its Sophie Germain prime q.
 *
 * @param p Sets this to the safe prime.
 * @param q Sets this to the Sophie Germain prime.
 * @param bits The length of the safe prime in bits.
 */
void get_random_safe_prime(mpz_t p, mpz_t q, size_t bits) {
    int is_safe_prime = 0;
    while(!is_safe_prime) {
        // q's length is bits-1, so 2q+1 has length bits
        get_random_prime(q, bits - 1);
        mpz_mul_ui(p, q, 2); // p = 2q
        mpz_add_ui(p, p, 1); // p = 2q + 1
        is_safe_prime = mpz_probab_prime_p(p, 50);
    }
}

int main() {
    // Initialize libsodium
    if (sodium_init() < 0) {
        return 1; // Initialization failed, handle the error
    }

    mpz_t p, q;
    mpz_inits(p, q, NULL);

    get_random_safe_prime(p, q, 2048);

    printf("Safe prime: ");
    mpz_out_str(stdout, 10, p);
    printf("\n");

    printf("Sophie Germain prime: ");
    mpz_out_str(stdout, 10, q);
    printf("\n");

    return 0;
}

Maybe the random number generator is the bottleneck, or could I use multiple threads to improve the speed?

Paul Uszak avatar
cn flag
If you scrap the CSPRNG and go with something like an Xoshiro PRNG, you'll see a 10x throughput in randomness generation.
poncho avatar
my flag
"I need a safe prime in order to be able to find a generator of the cyclic group efficiently."; actually, there are more efficient ways than using a safe prime. For example: find a 256 bit prime $q$, find a 2048 bit prime $p$ of the form $p = kq+1$ for some integer $k$, pick an arbitrary value $h$ and compute $g = h^{(p-1)/q} \bmod p$; if $g \ne 1$, then $g$ is a generator of the cyclic group of order $q$ (which is what you asked for). And these steps are a lot cheaper than searching for a safe prime (even with my suggested optimizations)
poncho avatar
my flag
Or, an even easier way: go to https://datatracker.ietf.org/doc/rfc3526/ and use the safe prime listed there...
Prankster2k avatar
co flag
@poncho Is it a good practice to reutilize the public parameters?
poncho avatar
my flag
@Prankster2k: sure it is - doing Schnorr in a group doesn't result in any weakness against that group.
Prankster2k avatar
co flag
@poncho Do you know if `openssl dhparam` uses safe primes? I am just curious about this because I have studied many cryptosystems but I do not really have knowledge about how to generate the parameters correctly. If you know any source or book to learn more about this topic I would be very grateful.
dave_thompson_085 avatar
cn flag
By default `openssl dhparam` does do 'safe' prime (2q+1) and is slow(er) but if you add `-dsaparam` it does Schnorr (kq+1) with q sizes chosen (but not always well IME) from those in DSA i.e. FIPS186; see nearby https://security.stackexchange.com/questions/95178/diffie-hellman-parameters-still-calculating-after-24-hours (@poncho) If you use `openssl genpkey -genparam` in 3.0.0 up it has more options; see the manpage.
cn flag
You should take a look at the preprint [Safe Prime Generation with a Combined Sieve](https://eprint.iacr.org/2003/186) by Michael J. Wiener.
Score:6
my flag

The first obvious thing to do is quickly reject those values where either $q$ or $2q+1$ are obviously not prime.

For example, suppose your code spends the time to pick a prime $q$ where $q \equiv 1 \pmod 3$ (which happens about half the time). You spent all that time, but when you compute $2q+1$, well, $2q+1 \equiv 0 \pmod 3$, and so you just wasted the time you spent proving that $q$ was prime.

The obvious thing to do is, when you pick a random value, before you test it for primality, you check the values $q$ and $2q+1$ for small factors.

That is, for each of the small primes $r$, you compute `z = q % r` and reject the candidate if $z = 0$ or $z = (r-1)/2$. That should save a considerable amount of time.

Of course, you'd need to select a range of small primes; all the primes < 1000 might be a good start.

Other ideas:

  • When you pick a random value, make sure it's odd by setting the lower bit; even values are hardly ever prime

  • When you initially test $q$ for primality, use an iteration count of 1; if that passes, then test $2q+1$. If that passes, then go back and retest $q$ with an iteration count of 39 (which will almost certainly pass). Here's why: `mpz_probab_prime_p` will hardly ever fail on random values (even with an iteration count of 1); however, when you're doing the initial test of $q$, you'll fail later on quite a bit (because $2q+1$ will turn out to be composite); hence it makes sense to do a quick initial test (which is almost always accurate), and then follow up with a more stringent one.

  • As for the primality test of $2q+1$, well, if $q$ is also prime, it turns out to be a lot easier. If we just compute $g^q \bmod 2q+1$ (for any $1 < g < 2q$) and if it's either 1 or $2q$, then $2q+1$ is prime (!). If you don't feel comfortable with doing that explicitly, just call `mpz_probab_prime_p` with an iteration count of 1, and it'll essentially do that for you. Note that if you do follow the previous suggestion, you really ought to follow it up with a retest of $q$ (because this suggestion depends on the primality of $q$)

  • It turns out doing a linear scan for safe primes is rather more efficient than a random scan (which you are doing). If you feel more comfortable with a random scan, that's fine - however, with a linear scan, you can be a lot more aggressive in eliminating $q$ and $2q+1$ values with small factors (that is, eliminating a larger range of small factors, which reduces the number of times you call `mpz_probab_prime_p` with a composite)

Score:4
ng flag

Preliminary note: in practice, one often uses one of the safe primes given in RFC 3526. Or/and some additional criteria are often wanted (e.g. $2^{(p-1)/2}\bmod p=p-1$, some number of high or/and low bits set in $p$). We do not consider that. We strive to generate a random safe prime (and matching Sophie Germain prime) of specified size `bits`.

Local improvements

The question's code wastes most of its time testing whether $q$ is prime to a high certainty, only to find later that most of the time (>99.8% for 2048-bit safe primes) $p=2q+1$ is composite.

Here's what I once used, which differs by a preliminary check that $p$ could be a safe prime (step 1 below), and is probably prime (step 2).

  1. All safe primes $p>7$ verify $p\bmod12=11$. Thus we restrict to candidate $p$ with $p\bmod12=11$, or equivalently to candidate $q$ with $q\bmod6=5$. That test can be inserted after get_random(p, bits), which actually generates $q$. This alone will more than halve the time spent in mpz_probab_prime_p.

    Better, we can replace that screening of a candidate $q$ by the computation $q\gets q-(q+1\bmod 6)$ and a check that the possibly reduced $q$ remains of bit size bits-1. That reduces the calls to get_random by a factor of $6$, for the same screening.

    We can screen more thoroughly by using that $p\bmod s>1$ for some small primes $s$ with $5\le s<p/2$. Better, we can use a sieve. As noted in comments, that's a huge saving. I expand on that in the second section of this answer.

  2. Verify that $2^p\bmod p=2$. That's a Fermat test to base $2$, which will quickly rule out overwhelmingly most composite $p$. It is a huge saving because it eliminates so much prime-testing effort in step 3 that would only get invalidated at step 4.

  3. Only then check that $q=(p-1)/2$ is prime. The question's code does that with mpz_probab_prime_p(p, 40). That's fine, but 40 is overkill, especially for large bits if we trust our random generator.

  4. Then check that $p$ is indeed prime. The question's code does that with mpz_probab_prime_p(p, 50). Again, that's fine but 50 is overkill. Thanks to step 2, that test will almost always succeed, instead of failing with very high probability.

Here is illustrative code along these lines. Besides the above optimizations, some changes compared to the original:

  • The safe prime generated by get_random_safe_prime has exactly (rather than at most) the specified size bits.
  • The entropy source is /dev/urandom, rather than libsodium.
  • Runtime errors are handled by exit(1).
  • The only functions get_random_safe_prime calls are library functions.

Generation of a 2048-bit safe prime takes on the order of a minute, with a geometric distribution of the runtime, which thus varies wildly.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

/**
 * Finds a safe prime p uniformly at random among those of specified size bits,
 * and its Sophie Germain prime q. Uses /dev/urandom as the source of entropy.
 * Calls exit(1) on error.
 *
 * @param p     Sets this (already initialized) to the safe prime.
 * @param q     Sets this (already initialized) to the Sophie Germain prime.
 * @param bits  The length of the safe prime in bits. Must be at least 3.
 */
void get_random_safe_prime(mpz_t p, mpz_t q, size_t bits) {
    FILE *f;                         // file for random source
    if (bits < 3 || !(f = fopen("/dev/urandom", "rb"))) exit(1);
    unsigned mask = 1<<((bits+7)&7); // mask for upper bit
    size_t nbytes = (bits+7)>>3;     // number of bytes
    uint8_t buf[nbytes];             // buffer of random bytes
    mpz_t two;                       // the constant 2
    mpz_init_set_ui(two, 2);
 
    for(;;) {
        // get nbytes of random
        if (fread(buf, 1, nbytes, f) != nbytes) exit(1);
        buf[0] = (buf[0] & (mask-1)) | mask;    // ensure buf has size exactly bits
        // Convert buf to an mpz_t variable. Third parameter order=1 means big-endian.
        mpz_import(p, nbytes, 1, 1, 0, 0, buf); //  most significant word first
        // here p is a random integer of size exactly bits
        if (bits > 4) { // that is p > 15
            mpz_sub_ui(p, p, ((unsigned)mpz_fdiv_ui(p, 12)+1u)%12u);
            // here p mod 12 = 11
            if (bits > 9) {  // that is p > 511
                // test p modulo small primes s in [5,73], eliminating >87% of p
                unsigned long r;
                // first pass includes s = 37 instead of s = 29
                r = mpz_fdiv_ui(p,  5ul* 7ul*11ul*13ul*17ul*19ul*23ul*37ul);
                if ((r% 5u)>>1==0||(r% 7u)>>1==0||(r%11u)>>1==0||(r%13u)>>1==0||
                    (r%17u)>>1==0||(r%19u)>>1==0||(r%23u)>>1==0||(r%37u)>>1==0)
                    continue; // p can't be a safe prime
                // includes s = 29 rather than s = 37 so that the modulus fits 32 bits
                r = mpz_fdiv_ui(p, 29ul*31ul*41ul*43ul*47ul*53ul);
                if ((r%29u)>>1==0||(r%31u)>>1==0||(r%41u)>>1==0||(r%43u)>>1==0||
                    (r%47u)>>1==0||(r%53u)>>1==0)
                    continue; // p can't be a safe prime
                r = mpz_fdiv_ui(p, 59ul*61ul*67ul*71ul*73ul);
                if ((r%59u)>>1==0||(r%61u)>>1==0||(r%67u)>>1==0||(r%71u)>>1==0||
                    (r%73u)>>1==0)
                    continue; // p can't be a safe prime
            }
            mpz_powm(q, two, p, p); // Fermat test to base 2 for p, result in q
            if (mpz_cmp(q,two)!=0)
                continue; // p failed a Fermat prime test to base 2, thus is not prime
        }
        mpz_fdiv_q_2exp(q,p,1); // q = (p-1)/2
        // final acceptance test
        if (mpz_tstbit(p, bits-1) && mpz_probab_prime_p(q, 40) && mpz_probab_prime_p(p, 50))
            break; // success
    }
    mpz_clear(two);
    fclose(f);
}

// demo of get_random_safe_prime
int main() {
    mpz_t p, q;
    mpz_inits(p, q, NULL);

    get_random_safe_prime(p, q, 2048);

    printf("Safe prime: ");
    mpz_out_str(stdout, 10, p);
    printf("\n");

    printf("Sophie Germain prime: ");
    mpz_out_str(stdout, 10, q);
    printf("\n");

    return 0;
}

Try it online! Note bits is preset to 512 rather than 2048 in order to avoid overtaxing that marvelous free resource.


Dual sieving

For the fastest possible generation (and a huge reduction of the input entropy needed), we want to integrate into a dual sieve the generation of candidates $p$ or $q$, and the extended version of the pre-test of step 1 checking that $p\bmod s>1$ for small primes $s$. That's as in the Sieve of Eratosthenes, but for each small prime $s$ we rule out the candidates $p$ in the sieve that are one more than a multiple of $s$, in addition to those that are a multiple of $s$ as in the classic/single sieve.

A reasonable width of the sieve is an interval of about $b^2$ candidates $p$ of $b$ bits, which by some heuristic is expected to contain $\approx4.1$ safe primes on average. Using one sieve bit for each $p$ with $p\bmod12=11$, such a sieve uses $\lceil b^2/96\rceil$ bytes.

One theoretical issue with that basic dual sieve is that the safe primes generated are not uniformly distributed among safe primes of size $b$: if we sieve by increasing $p$, the probability that a safe prime $p$ gets selected is proportional to its distance to the next lower safe prime (with some exception for the smallest safe prime of size $b$), and that distance varies a lot. That issue is mitigated to a degree if we widen the sieve and randomize the order used to test (per 2/3/4) the candidates that survived sieving. For small $b$, the sieve can cover the whole interval of interest, and the mitigation is perfect.

For large $b$, another (alternate or additional) mitigation strategy is to select some medium random secret prime $t$ other than the $s$ used for sieving, then some random secret $t'\in[2,t)$, and restrict sieving to candidates $p$ with $p\bmod t=t'$. This widens the sieving interval, but increases neither the RAM requirement for the sieve nor the sieving effort. This works because heuristically, $p\bmod t=t'$ performs a random-like selection among the safe primes $p$ in the overall interval of interest. Should we find no safe prime in the sieved interval, it's best from a uniformity standpoint to pick another $t'$, or even another $t$ and $t'$.

poncho avatar
my flag
Actually, I believe that doing a sieve for small primes would considerably increase performance (testing $p \bmod r$ for small $r$ is cheap; far cheaper than computing $2^p \bmod p$, and immediately rejects $2/r$ values as composite)
fgrieu avatar
ng flag
@poncho: True, that's a further and large _relative_ improvement in time compared to what I propose. But my steps 1/2 are enough to save by far most of the _absolute_ time that can be saved compared to what the question does.
poncho avatar
my flag
Actually, I don't believe that step 2 is that much of an optimization; I believe `mpz_probab_prime_p` does Miller-Rabin, and MR is essentially a Fermat test with a few more (cheap) intermediate tests. The major performance delta would be computing $2^{p-1} \bmod p$ vs $x^{p-1} \bmod p$ (for random $x$); depending on the details of the modexp implementation, this delta may be quite modest.
fgrieu avatar
ng flag
@poncho: yes `mpz_probab_prime_p` rules out composites roughly as fast as my step 2 can, and even faster on average, as it first does trial division by small primes. But in the question it's applied to $q$ and with parameter 40, so when it hits a prime $q$, it makes Baillie-PSW and 40 Miller-Rabin tests on $q$, and with probability >99.8% the extra 39 tests are a waste of time because $p=2q+1$ turns out composite. Thus even if my step 2 were 15 times slower than `mpz_probab_prime_p` on average, it would still gain a factor of more than two overall.
cn flag
@poncho and fgrieu-onstrike: You can sieve for both primes at once, see [Safe Prime Generation with a Combined Sieve](https://ia.cr/2003/186). (Best followed by one or two Fermat tests to base 2)
Prankster2k avatar
co flag
Maybe a good idea is to ensure that the randomly generated $p$ is an odd number before doing any test, isn't it? Moreover, you say that `mpz_probab_prime_p(p, 40)` and `mpz_probab_prime_p(p, 50)` are overkill. Could you suggest better values for the parameters?
fgrieu avatar
ng flag
@Prankster2k: for `bits` over 4, $q$ and $p$ are turned odd by $q\gets q-(q+1\bmod6)$, equivalently $p\gets p-(p+1\bmod12)$, implemented as `mpz_sub_ui(p, p, ((unsigned)mpz_fdiv_ui(p, 12)+1u)%12u);`. Per [FIPS 186-5 table B.1](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-5.pdf#page=166) and its appendix C, for a known-randomly-chosen integer of 2048 bits (or more), 4 rounds of Miller-Rabin are enough. Since `mpz_probab_prime_p` is a superset of that, we can replace 40 and 50 by 4 _when `bits` is 2048 or more_. But this won't make things much faster on average.