Score:13

What can make an implementation of a large integer library unsafe for cryptography

vn flag

Unfortunately, I don't have any references, but I remember people mentioning that some large integer libraries can be unsafe to use for writing cryptographic algorithms such as the RSA.

That made me curious if anyone knows of any examples of vulnerabilities that would be introduced by an unsuitable implementation of a large integer library.

To me, it is somewhat difficult to imagine how that would work, because it seems to me that a library either produces the right arithmetic operations or it doesn't. It can maybe calculate slowly, but I don't see how that could cause any security issues.

Jim avatar
mt flag
Jim
I'm not a crypto expert so I'll let others answer you question. My concern, though, is that you seemed to phrase your question backwards. The important point to understand is what makes a library _safe_ for crypto, with the default assumption that it isn't. If you have a list of A, B, and C items that make a library unsafe, it doesn't mean that another library without these is safe.
Score:23
ng flag

Some large integer libraries can be unsafe to use for writing cryptographic algorithms

Yes. Leaving aside plain bugs (buffer overflows, incorrect result in edge cases), there is the issue of side channels:

  1. Timing dependent on secret data being processed, revealing some information about such secret data.
  2. Power consumption variations amenable to power analysis (differential or not)
  3. Cache-related information leaks
  4. Intermediary secret values left in the stack or heap that can be leveraged using bugs in the rest of the software.

Most large integer arithmetic libraries not designed for crypto do not even try to resist any of this. It requires attention to processor-dependent details that is antagonist to their goals: ease of use, performance, and code portability to different CPUs.

Often, the very design of the interface to a standard arithmetic library makes timing attack (1 above) next to unavoidable. For example, imagine we want to re-implement RSA decryption of a ciphertext $C$ with a 2048-bit key, per RSAES-PKCS1-v1_5 or RSAES-OAEP, on top of java.math.BigInteger. We will:

The natural way to turn $m$ into $\mathsf{EM}$ is with toByteArray(), which goes:

public byte[] toByteArray() {
    int byteLen = bitLength()/8 + 1;
    byte[] byteArray = new byte[byteLen];

    for (int i=byteLen-1, bytesCopied=4, nextInt=0, intIndex=0; i >= 0; i--) {
        if (bytesCopied == 4) {
            nextInt = getInt(intIndex++);
            bytesCopied = 1;
        } else {
            nextInt >>>= 8;
            bytesCopied++;
        }
        byteArray[i] = (byte)nextInt;
    }
    return byteArray;
}

Notice that $m$ is in range $[0,n)$, thus in range $[0,2^{2048})$. For $m$ in $[2^{2047},2^{2048})$ the loop is executed 257 times with 65 invocations of getInt, whereas it is executed at most 256 times with 64 invocations of getInt if the result is in $[0,2^{2047})$. An adversary in a position to time execution of the code can use that to get information about the high-order bit of $\mathsf{EM}$, in a situation where that's supposed to be secret!

Granted, that information is lost among a lot of noise, but with precise timing measurement (e.g. timing code running on the same physical machine) and by averaging the result of multiple queries with different $C$, that can be enough to decipher one arbitrary ciphertext. There has been many claims of attacks along this line, some at least dangerously close to practical.

It's essentially impossible to completely avoid this issue when using java.math.BigInteger, because the number of 32-bit words in the internal representation of an integer changes according to the value of the integer. Libraries intended for cryptographic use typically avoid this issue by having fixed rather than automatically-variable size of internal arrays and arguments. That's less easy to use, but makes constant-time code possible (if not easy, see this).

jp flag
And of course, Java doesn't even guarantee this particular implementation. It only guarantees the interface. Other implementations and even other versions of the same implementation may use different ones.
id flag
A point I didn't see explicitly mentioned is that libraries not intended for crypto not only "don't attempt to resist any of this", but many of them go out of their way to exploit situations where some calculations may be performed more quickly than others. For most purposes, given a choice between a calculation which takes 1ms for any input values, or one which would take 0.9ms for 25% of input values and 1ms for the rest, the latter would be purely *better*. Indeed, it may even be better for cryptographic purposes *in environments that aren't exposed to anything untrustworthy*, but...
id flag
...many real-world applications require that programs be run on machines that are also running untrustworthy programs, or have real-time connections to machines operated by untrustworthy people.
Peter Cordes avatar
us flag
*ease of use and code portability to different CPUs.* - another goal of non-hardened libraries is usually performance, including for libraries such as the widely used [GMP](https://gmplib.org/) which has hand-written asm for many architectures. An early-out branch might be a win in some cases, like if small numbers are common (so high limbs being zero is common). Or as you say for Java, a variable number of limbs according to MSB position.
Score:3
vi flag

As you note, there is not generally an issue with producing incorrect results that are specific to cryptography. Results will either be correct or not. Wrong results are bad for both cryptographic and non-cryptographic applications. Since they're used more, non-cryptographic libraries are probably more likely to produce correct result.

This answer already told you about side channel timing attacks. This is a serious concern, but I won't speak more to it. It also mentions but does not explore the possibility of data being left on the stack or heap.

If an attacker can make a program do a core dump and there are variables in it like $original_password. Then obviously that can leak the original password. For this reason, cryptography often overwrites these values when it's done using them. So that a memory dump will just get garbage or null bytes.

Now, a math library is unlikely to have something called $original_password in it. But it might have a function called hash that takes a byte array as an input. And that byte array may live on the stack or heap. And a cryptography library that uses the math library might have a call hash(to_byte_array($original_password)). From which you should be able to see that the $input byte array to hash would contain a deterministic transformation of the $original_password. So an attacker who knows that the only time that the program calls hash is to hash the original password just needs to reverse the transformation. Now the attacker has the original password.

If the attacker can cross reference the password with the username, then that pair is compromised.

Even if the hash function is called multiple times, that just makes things harder, not impossible. Now there are (e.g.) fifteen instances of $input on the stack. Transform all of them into strings. Now you have fifteen candidates for the password.

This kind of stuff is what people mean when they talk about writing specifically for cryptography. It's not that the mathematical calculations are better in any way. Actually, because of timing attacks, they are in some ways worse. Cryptography deliberately writes its algorithms to be constant time. And constant time in this application means that the algorithms always have to be as slow as the slowest possible path. So cryptographic math libraries are deliberately slower and more wasteful of memory than non-cryptographic libraries.

They also do things like pass in the original data without transforming it first. So the cryptographic hash function might take a string ($original_password) rather than an array of integers.

Cryptographic libraries also tend not to be general purpose. First, because why would you use a slow cryptographic library when you could use a faster non-cryptographic library? Second, because their signatures are different. For a math library, you would often pass the input itself on the stack. Because that's easier. But for a cryptographic library, you would rather pass a reference. Because you don't want the password to appear more than once in memory.

In fact, as soon as possible, you want it to appear zero times. So overwrite the original password input with its hash. Rather than returning the hash from the function, modify the original variable. That might be counterproductive if you are taking the hash for serialization or comparison purposes (general purpose uses of a hashing function). But for the purpose of cryptography, it makes sense. You want both the original password and the hash to spend no time on the stack and as little time in memory as possible. So replace the original password with the hash. Eventually overwrite the hash with garbage. Never make copies.

Also consider the handling of randomness. Is the random generator's seed a security hole? In a non-cryptographic library, no. Because you don't particularly care if the crashed program reveals what the seed had been. It's crashed. You won't be continuing it. But for cryptographic purposes, you care what the historical choices were. An attacker who knows the seed, has a much more limited set of possibilities for what the random choices were. The attacker can try all the permutations of that set and still save time relative to trying all possibilities.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.