Score:20

How can I understand whether my C implementation is constant-time or not (i.e. resistant to timing attacks)

jp flag

I have code for polynomial multiplication written in C. I heard that whether a particular instruction is "constant time" can vary by architecture and by processor model, and there isn't any official documentation for this behavior. How can I tell whether my code is constant time or not?

Note: By "constant time" I mean software, or a piece of code, that is resistant to timing attacks. I use Ubuntu on a 10th-generation i7 PC

kelalaka avatar
in flag
Compile and see. Different compilers behave differently, and they differ in the target architecture, too. You can use the [Compiler Explorer](https://godbolt.org/) to see the ASM. Usually such code is written in ASM. [Watch Thomas Pornin's talk on T1](https://www.youtube.com/watch?v=IHbtK5Kwt6A).
kelalaka avatar
in flag
Just concentrate on the key-dependent parts. Note that your code can be reviewed on codereview.se
esra avatar
jp flag
@kelalaka does that mean I must inspect the assembly code to make sure the code is constant time? Is it necessary to analyze the assembly, or can I do it on the C code on my PC?
esra avatar
jp flag
@kelalaka my code is not a full encryption protocol; it is just polynomial multiplication code. There are no keys etc. Can I measure whether my multiplication code alone is constant time or not?
kelalaka avatar
in flag
Yes, and the problem is that you must make sure of this for every compilation. Compiler Explorer is a great tool for this.
kelalaka avatar
in flag
The question in a timing attack is what you leak. If information about the keys is not leaked, why have constant time?
esra avatar
jp flag
@kelalaka sorry, my mistake; I use a piece of a key in the multiplication. You are right
kelalaka avatar
in flag
[1](https://crypto.stackexchange.com/q/30958/18298), [2](https://crypto.stackexchange.com/q/82095/18298). [3](https://crypto.stackexchange.com/q/33252/18298), [4](https://crypto.stackexchange.com/q/80492/18298), [5](https://crypto.stackexchange.com/q/75408/18298) and many others... see tags [tag:timing-attack], [tag:side-channel-attack]
kelalaka avatar
in flag
A cross-site duplicate from infosec : [How can I prevent side-channel attacks against authentication?](https://security.stackexchange.com/q/220446/86735)
nr flag
I don't get why on earth anyone would want to even try to achieve constant-time code. As already mentioned by others, it's impossible to guarantee, even if you fix the assembly code, because what one CPU does today may not be what the next generation of that CPU does, and there is absolutely no way of predicting. Instead of trying to do something impossible, simply add a random delay that is an order of magnitude bigger than the time your code takes. Yes, it cannot prevent timing attacks altogether, but it is far better than wasting time and energy trying to make CPU execution times equal...
kelalaka avatar
in flag
@user21820 since researchers have exhibited the possibility of [remote timing attacks](http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf), there is a huge concern that timing attacks can expose secret keys. A simple explanation is [in this answer](https://crypto.stackexchange.com/q/75408/18298). We are not talking about securing everything; we are talking about securing the key-dependent parts. This is a serious attack vector, from smart cards to shared cloud computers. This is cryptography, where the devil is in the details, and $cryptography \neq cryptocurrency$
nr flag
@kelalaka: Wait, I was not responding to your comment, but the general idea that one can look at assembly code to try to achieve constant-time execution. And I don't see why you seem to assume that I don't know what timing attacks are. Moreover, even the comments on your own linked answer repeat the point I made here.
kelalaka avatar
in flag
@user21820 that was the whole point of my first comment: ASM. Keep in mind that Thomas Pornin built a compiler that is safe under some assumptions, of course, and he is now trying to extend it to be Turing complete, though that is not necessary.
nr flag
@kelalaka: Eh?? Didn't I explicitly make it clear that assembly is **not enough** because the physical CPU design does **not** guarantee anything about execution time? It's pointless to keep talking about merely abstract models when security is about the **real world**.
kelalaka avatar
in flag
@user21820 it is not write once, compile for every target; I never said that. One has to review all changes in the CPU to be sure that even the same series stays secure... Most of the attacks occur in the real world. Of course there are academic attacks that envision the possibility; however, the real-world application may come much later, as with the padding oracle attack, 2003 to 2013. In this science, sometimes the attack comes first and then the model, and sometimes the other way around. Don't we model the real world in order to analyze it?
nr flag
@kelalaka: If you're saying that you repeatedly conduct an end-to-end analysis for every single target CPU architecture on which you run your code, then yes I agree that works. But is all that effort really worth it? There are many more cost-efficient ways of preventing timing attacks than bothering about the assembly code.
marshal craft avatar
de flag
Constant time means it runs in a time interval bounded by a constant: the amount of time to run the function with any possible input is always less than a constant. So if, say, the function always returns within 20 seconds no matter what the input is, it runs in constant time.
marshal craft avatar
de flag
Also, the abstract algorithm of polynomial multiplication is not constant time: the time grows as you add factors or input higher-degree polynomials. You really haven't asked a clear question, because there are two cases. The function accepts two inputs, which are polynomials of various degrees; basically, each is a list of coefficients whose size is the degree of the polynomial input.
marshal craft avatar
de flag
In this simple case of just accepting two inputs, the number of multiplications and additions grows as a function of the degree, or the number of coefficients, so it is not bounded by a constant in general. In reality, however, if your implementation places a maximum bound on the input, then the implementation is, trivially, constant time: the constant is the worst case for the maximum input, i.e., two polynomials of maximum degree with maximum-valued coefficients. That is only because your implementation uses finite resources, defined by the computer architecture and software.
Score:30
ph flag

Unfortunately in the absence of documentation from the CPU vendor you can't be 100% sure what algorithms will or will not be constant time. That said, there are certainly rules of thumb that can be used to reduce the risk of problems.

  1. Anything that involves branching control flow or conditional execution of code is obviously a problem.
  2. Comparison operations may be a problem even if they are not directly used to affect control flow. The problem is that comparison operators usually store their results in a flags register, and the easiest way to get an individual result out of a flags register is often branching or conditional execution.
  3. Booleans can be a problem because a compiler may choose to replace an operation involving a boolean with conditional execution or branching.
  4. The timing for memory access may vary depending on the address that is accessed and the state of the cache. This means that code involving lookup tables has a high risk of not being constant time.
  5. Multiplication and division may be implemented using algorithms that take a variable number of cycles to complete. Multiplication is generally safe on modern "applications" processors but may not be on microcontrollers. I would not consider division safe on any processor.
  6. Bit shifts and rotations by a variable number of positions can be a problem on some CPUs as they may translate into loops (in either software or hardware).
  7. Addition, subtraction, bitwise operations and bit shifts by a constant are generally OK.

You might want to take a look at https://www.bearssl.org/constanttime.html which discusses these issues in more detail.

kelalaka avatar
in flag
Note that there is a great, extensive answer by Squeamish Ossifrage on Infosec: [How can I prevent side-channel attacks against authentication?](https://security.stackexchange.com/q/220446/86735). It covers all of this in much more detail.
cn flag
On an 8-bit platform (smart card) I've seen a C compiler implement incrementing a 16-bit number (i.e., adding 1) with the three instructions (1) increment the lower byte, (2) conditionally jump over the next instruction if the result is non-zero, and (3) increment the higher byte.
Score:12
in flag

Note that this excellent answer belongs to Squeamish Ossifrage, who has stopped contributing! I copied and pasted it and made it community wiki. Upvoting this answer doesn't earn me anything. If you can, vote for them on Infosec.


From the code example provided, it is possible to determine the correct password by timing the code when given various inputs.

First, you shouldn't actually examine the password directly! At the very least, you should hash the password with a password hash like Argon2id first, and compare the password hash of the input with the password hash you stored during user registration (or when the user last changed their password).

Even better, you should use a password-authenticated key agreement protocol like OPAQUE, but these may be beyond your pay grade at the moment until they see more widespread adoption and implementation.

What can I do to ensure that my code is not vulnerable to such timing attacks?

The best way to start is to use a library routine or primitive that someone else has already written and has a reason to maintain. For example, in NaCl/libsodium, you can use crypto_verify_32 to compare two 32-byte strings, like two Argon2id hashes, or two HMAC-SHA256 message authentication codes. Then the effort to answer this question can be focused on a single place that will get a lot of attention and scrutiny and will keep up with advances.

But let's say you don't have crypto_verify_32, or you want to implement it yourself. What can you do?

To start, you need to understand what operations have side channels. It's tempting to say—as other answers did—that the side channel arises only because of an early abort. But that's not the whole story. In general, there are many operations (here written in C for illustration) that may take an amount of time which depends on the values of the inputs—we call these operations variable-time operations, in contrast to constant-time*:

  • for (i = 0; i < n; i++) if (x[i] != y[i]) return EFAIL; takes fewer loop iterations the earlier the first mismatch occurs, so it is practically guaranteed to run in variable time depending on the secret values of x[i] and y[i].

  • A mere secret-dependent conditional for (i = 0; i < n; i++) if (x[i]) bad++;, if x[i] is secret, may run in variable time too even if the loop doesn't abort early. Why?

    • Here's a coarse approximation. The machine instructions that the CPU may execute look something like this:

      0:      tmp := x[i]
              branch to 1 if tmp is zero
              bad := bad + 1
      1:      i := i + 1
              branch to 0 if i < n
      

      The number of instructions executed depends on what the value of x[i] is at each iteration: we skip bad := bad + 1 on some iterations. This is a good model for how early timing attacks on, e.g., RSA worked as in Kocher's seminal paper on timing attacks: the main modular exponentiation loop computes a (say) 2048-bit modular squaring unconditionally, but computes a 2048-bit modular multiplication conditionally depending on the value of the secret exponent. Skipping the multiplication substantially changes the time taken by the whole operation.

    • There's another reason, though, and it has to do with branch prediction, a key design element in what makes modern CPUs run so fast on many workloads—even if you write the same amount of code (say, same number of machine instructions, and somehow you guarantee that they take the same number of cycles to compute) in each branch of a conditional, the time it takes to execute may depend on which way the condition went.

    • In general, CPUs are bad at keeping which instructions got executed secret, so don't make the choice of instructions depend on secrets.

  • Table/array lookups might take a different amount of time depending on what memory has been cached in the CPU cache. Consequently, if the location in the array that you're reading from depends on a secret, the time it takes might depend on the secret, which has been exploited to recover AES keys by cache timing.

    (This makes AES a rather questionable design in retrospect, with its intentional use of key-dependent table lookups! NIST's published rationale (§3.6.2, Attacks on Implementations: The Role of Operations) curiously claims table lookups are ‘not vulnerable to timing attacks’ despite the numerous such attacks that have been reported since then.)

  • Variable-distance shifting like x = y << z may take more time on some CPUs if z is larger and less time if it is smaller.

    (This makes RC5 and the AES finalist RC6 rather questionable designs in retrospect, with their intentional use of key-dependent rotation distances!)

  • On some CPUs, multiplication may run faster or slower depending on whether the upper half of the inputs is zero or not.

  • 64-bit integer addition on 32-bit CPUs in principle might take more time depending on whether there is a carry. This is because, when x, y, and z are 64-bit integers, the logic x = y + z might look something more like:

    int carry = 0;
    x[0] = y[0] + z[0];
    if (the previous addition overflowed)
      carry = 1;
    x[1] = y[1] + z[1] + carry;
    

    Consequently, the time it takes may depend on whether there is a carry from the sum of the low 32-bit halves to the sum of the high 32-bit halves. (In practice, this is usually only a concern on exotic CPUs or for other types of side channels like power analysis that are relevant more to smart cards than to laptops and phones.)

This might sound a little overwhelming. What can we do?

There are some operations that generally do run in constant time on most CPUs. They are:

  • Bitwise operations: x & y, x | y, x ^ y, ~x, and other ones that don't appear in C like AND-with-complement.
  • Constant-distance shifts and rotations like the shift x << 3 or the rotation x <<< 3 (not standard C but common in cryptography; it means (x << 3) | (x >> (32 - 3)), if x is 32-bit).
  • Often integer addition and subtraction: x + y, x - y, when x and y are (say) unsigned 32-bit integers on a 32-bit CPU, and often even 64-bit integers on a 32-bit CPU with the help of ADD-with-carry instructions.
  • Sometimes integer multiplication, but the story about multiplication is complicated, which is unfortunate for cryptography because multiplication mixes bits around quite nicely and has useful algebraic properties.

To be clear: I do not mean that a C compiler guarantees these operations run in constant-time if you use them in a C program; I am merely using C notation for the operations that CPUs generally execute in constant time. (More on this caveat in a moment.)

‘But wait,’ you protest, ‘how can I possibly write a useful program out of these operations? No conditionals? No loops? No arrays?’

First, you don't have to eschew conditionals, loops, or arrays altogether. They just can't depend on secrets. For example, for (i = 0; i < 32; i++) ... x[i] ... is fine. But for (i = 0; i < m[0]; i++) ... is not fine if m[0] is supposed to be secret, and for (i = 0; i < m[0]; i++) ... tab[x[i]] ... is not fine if x[i] is supposed to be secret.

Second, you can still build these things! It's just a little trickier. For example, suppose b is a uint32_t that is either 0 or 1. Then b - 1 is either -1 = 0xffffffff or 0, respectively, so

x = ((b - 1) & z) | (~(b - 1) & y);

causes x = y if b is 1, or x = z if b is 0—much like x = (b ? y : z), but without a branch. Obviously, this requires computing both y and z, so there is some performance impact! Similarly, you can look up an element of a table by looking up all elements of the table and selecting the one you want with bitwise operations like this. Not as fast as x[i], but not as leaky either.

In general, you can convert a program with conditionals into a logic circuit with no conditionals, even if you don't want to for performance reasons. There are various other similar tricks you can do. You might draft a constant-time memory equality routine such as crypto_verify_32 like this, assuming x and y are uint8_t arrays:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
return ((result - 1) >> 8) & 1;

Exercise: Does this return 0 for equal and 1 for inequal, or 0 for inequal and 1 for equal?

Writing programs like this—and adopting cryptosystems such as X25519 that encourage implementations that look like this, instead of cryptosystems such as RSA or AES that encourage implementations involving secret-dependent branches or secret-dependent table lookups—is a good start for plugging timing side channels.

But, there's a catch! Remember when I said that the C compiler doesn't guarantee constant time? A smart C compiler like Clang/LLVM might recognize that the clever crypto_verify_32 loop above can be executed more efficiently by making it abort early, and might undo the hard work you did to rewrite it as a logic circuit that runs in constant time. (In other circumstances, it might help you out, for example by converting x = (b ? y : z); into a conditional move instruction, CMOV, without branches, but usually you can't rely on the C compiler's good will.)

There are some tricks you can do to thwart this, like an inline assembly fragment that causes the compiler to drop roughly all assumptions for optimization:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
asm volatile ("" ::: "memory");
return ((result - 1) >> 8) & 1;

This may or may not work with your compiler. To be confident, you really have to examine the compiler's generated machine code—and even then, a compiler might perform just-in-time optimizations that rewrite the machine code according to profiling analysis, especially in higher-level languages like Java. So you might really want to write the logic in assembly (or in a programming language like qhasm that can generate the fine-tuned assembly more reliably than a C compiler), and just call it from C.

Maybe some day C compilers will adopt a secret qualifier, like const or volatile, which forces the compiler to generate only machine instructions that are known—in some model of the CPU!—to run in constant time when operating on the object, and prevents the compiler from taking shortcuts like secret-dependent early aborts from a loop. But that day is not here yet.

There's also a matter of which machine instructions actually run in constant time on a CPU, which is sometimes documented and sometimes reliable. So in addition to doing engineering to build your programs out of logic circuits, you also need to do science to figure out which operations are actually safe to use on the CPU.

This brings us back to the original point: You really want to focus the effort of maintaining this into a library routine, so that each programmer doesn't have to keep track of vagaries of compilers (and CPU designs!) in generated code and timing on their own, and can instead leave it to our friendly neighborhood bear.


Are there other countermeasures than constant-time logic? Sometimes, yes.

  • You could inject random noise into your logic, in the hope that it will confuse the attacker's measurements. But there's already noise in their measurements, like scheduling in the operating system, so they just have to take more samples—and it turns out noise is not a very effective side channel countermeasure.

    Specifically, artificial noise raises the attacker's costs by at most about the square of the ratio of artificial noise to true noise, which is far below what is usually considered an acceptable gap for security in cryptography. So it mostly costs you a lot of time doing nothing.

  • You could use algebraic properties of the cryptosystem to randomize it, sometimes called ‘blinding’. For example, instead of computing y^d mod n where d is a secret exponent in RSA, you could pick r at random, compute s := r^e mod n where e*d ≡ 1 (mod λ(n)), multiply y by s to get (y * r^e) mod n, compute (y * r^e)^d mod n = (r * y^d) mod n, and then divide off r.

    Many implementations, such as OpenSSL, use this approach because it's an easy way to retrofit an existing implementation of a cryptosystem like RSA that has the necessary algebraic structure. It's not a bad idea like random noise is, but it does have costs: you have to do the extra work for randomization, you have to have modular division or inversion logic—and side channels might still leak information about r and d. For example, even blinded modular exponentiation will leak the Hamming weight of d unless you take additional countermeasures like adding a random multiple of λ(n) to d first—which may expose additional side channels, etc.

  • For the specific case of comparing two byte strings for equality (for example, two message authentication codes), one reasonable option is to hash them with a pseudorandom function family like HMAC-SHA256 under a one-time secret key k, and check whether HMAC-SHA256_k(x) == HMAC-SHA256_k(y).

    The probability of a false positive is 1/2^256, which is a smaller probability than you ever have to worry about. You can safely use variable-time equality for the HMAC because if x is not equal to y, then the amount of time even in the naivest byte string equality routine (assuming it doesn't bail out at the first zero byte or something stupid like that!) will be independent of the values of x and y: there's a 255/256 probability it'll stop after one iteration, a 65535/65536 probability after two iterations, etc.

    Of course, this really helps only if you can implement HMAC-SHA256 in constant time! Fortunately SHA-256 is designed to be easily implemented as a constant-time logic circuit, so C implementations tend to be reasonably resistant to side channels—but, say, Python will get you into trouble because of the small integer cache if nothing else.


* The terminology is unfortunately a little confusing. Here constant-time means that the amount of time does not vary depending on inputs, and is not the same as the asymptotic notion of ‘constant-time’ in computer science, often written O(1), which just means the amount of time may vary depending on inputs but is bounded by a constant. I'm sorry. I didn't invent the terminology. I might have chosen ‘fixed-time’ vs. ‘variable-time’ but it's too late now—‘constant-time’ is entrenched in the literature.

Score:7
id flag

Here's my two cents:

A timing-attack uses the time that it takes to execute an algorithm based on different inputs.

Take a simpler problem, such as finding whether a single character exists in a secret string. In a traditional algorithm, you would iterate over the string and return a boolean once you find the character. This would take more time the further along the character is, thus leaking some information about the secret string.

If we wanted to make this constant time, we would make finding the character take the same time whether it is at the start or the end, e.g. iterating the whole string and returning once you end the iteration.

Another thing that affects timing is branches. A branch is a piece of code that might or might not get executed, i.e. an if statement.

Example taken from a GF(2^8) multiplication function:

with branch:

if ((a >> 7) & 1)
    a = (a << 1) ^ polynomial;
else
    a <<= 1;

branchless:

const uint8_t mask = -((a >> 7) & 1);  // 0xff if highest bit is set, else 0x00
a = (a << 1) ^ (polynomial & mask);    // only applies polynomial if mask is 0xff

As you can see, the code with the branch can execute one more operation in one of the cases, while the branchless code always executes the same operations.

Score:6
de flag

What you've heard is correct.

It's an uphill struggle. The compiler and hardware are not designed to help you, and they are deliberately designed to do things that confound what you're trying to do.

  • The C standard says nothing about timing. Compilers do not try to ensure that, if your C code looks like a constant-time algorithm, it always compiles to constant-time machine code.

    One approach is to learn in great detail what the compiler can do, and how to avoid confounding optimizations. Marking functions as noinline, for example, might be important.

    Another approach is to eliminate the compiler from the picture altogether. You can compile your timing-sensitive C code to assembly once, examine the assembly to make sure it's still a constant-time algorithm, check in the assembly code, and use that, not the original C, as source when building your program.

  • At the machine code level there is a similar problem. CPUs do not try to ensure that a constant sequence of instructions always runs in constant time. Many mechanisms in the hardware (caches, in particular) try to speed things up, but don't always succeed. Code can run slower in very specific special cases where the speed-up didn't work out. This can leak information.

    I don't know how to address this except by knowing the hardware extremely well and measuring.

user7761803 avatar
vn flag
"CPUs do not try to ensure that a constant sequence of instructions always runs in constant time" no, but some CPU architectures allow for data-independent timing of individual instructions, e.g. in Armv8.4A - https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/DIT--Data-Independent-Timing
Score:4
gd flag

My humble 2 cents: bitslicing techniques if affordable in your case: https://timtaubert.de/blog/2018/08/bitslicing-an-introduction/

EDIT

I don't want to duplicate (and worsen) the words you can read on the linked page, but I have been asked for something more than a link. Essentially, bitslicing simulates a hardware implementation in software: the entire algorithm is represented as a sequence of atomic Boolean operations, which are then executed in constant time. I'm sorry I cannot provide further details; I only recently learned about this technique during a seminar on applied-cryptography best practices. I haven't gone deeper, but it seems to be what you need: since you want to make only a very specific part of your code time-resistant, the implementation overhead may be affordable in your case (of course, it would be a painful nightmare for wide-scope code).

Score:3
kz flag

Here is the good news: your algorithm doesn't need to be "constant time". It just needs to be independent of the input data. It is fine if there are random variations; they are actually helpful if the random variations are bigger than the data-dependent variations.

Memory access notoriously runs in variable time, so if you have data-dependent access patterns, that's bad. An example was given: table[x[i]]. Accessing x[i] in a loop is not constant time, but the time doesn't depend on the data. Accessing table[x[i]] is data-dependent, so it may be possible to get information about the data.

Multiply and divide operations may be data dependent, so that is something you need to test.

poncho avatar
my flag
"It just needs to be independent on the input data"; actually, it needs to be independent of secret data, that is, anything that we need to assume the adversary has no access to. It is not a problem if, say, the time your public key signature operation takes is dependent on the message, as we assume the message is public anyways.
Score:2
nc flag

If you can accept "probably constant-time" as an answer (rather than "definitely constant-time"), then you could evaluate experimentally whether your code is constant-time. To do so, I recommend the tool dudect. To quote the GitHub Readme:

How does this work? In a nutshell, this code takes two different inputs, runs the piece of code many times for each input and measures how much time it takes. If there is a statistical difference on the (average) time it takes to run with different inputs, the implementation is deemed not time constant. For details, read our paper or have a look at the source.

And:

My code passes this tests. Does it mean it is really time constant? Absolutely not. [...]

A similar tool is ct-fuzz, which uses coverage-based greybox fuzzy testing to detect timing leaks. More on the methodology in the paper ct-fuzz: Fuzzing for Timing Leaks.

With both dudect and ct-fuzz, you can't be 100% sure that your C code is constant-time, but if there is major timing leakage, you'll find it. And if the tools cannot find any timing leakage, then you can be "reasonably confident" that there is none.


A few other tools can be used to determine statically if a piece of C code is constant-time:

  • ctgrind, built on top of valgrind, tracks every bit of secret memory and checks that conditionals do not depend on secret bits.

  • CacheAudit, which, according to its GitHub Readme:

CacheAudit is a static analyzer of cache side-channels. CacheAudit takes as input a 32-bit x86 program binary and a cache configuration, and it derives formal, quantitative security guarantees for a comprehensive set of cache adversaries, namely those based on observing cache states, traces of hits and misses, and execution times.

Because those tools use static analysis, you can be more confident that there is no timing leakage if the tool couldn't find any. However, none of these tools has a precise hardware model of what leaks and what doesn't. This means that they assume that, for instance, multiplication is constant-time. If multiplication is actually not constant-time, then these tools will not detect the potential timing leaks. See Peter Green's answer for more on which operations could leak but are probably assumed constant-time by CacheAudit and ctgrind.

Score:1
br flag

You can do precise timing measurements using the intrinsic function __rdtscp(), then do statistics on the results.

This function exposes an internal 64-bit register that is incremented on each clock cycle, the clock frequency being the official processor frequency as seen in /proc/cpuinfo after the @ sign.

10th-generation processors should have the constant_tsc flag, as seen in /proc/cpuinfo.
