Score:4

# Average- and worst-case complexity

The terms "average-case", "worst-case" hardness are quite confusing.

1. What do they mean when they say certain problems (like lattices) have an average-case to worst-case relationship? Do they mean there is a polynomial reduction between two case of the problem? Because polynomial reductions are of our interest as they are practical.
2. If we call an instance of a certain problem has average-case hardness, and then it's equal to worst-case hardness, isn't it somehow ironical? If worst-case is equivalent to average-case, so why do we call it worst-case?!
3. So if we accept that for some problems there is a relationship between their average-case and worst-case, why don't we consider best-case or simplest-case? Why there is no attention to simple-case of a problem and its dangers? Because it's obvious that we don't work with simple-case of a problem in cryptography?

I'm not familiar with complexity theory very much. I appreciate it if anyone explains these questions with their basic ideas behind them.

***Update: This question generally comes from lattice-based cryptography papers and specifically papers like:
["New Lattice Based Cryptographic Constructions",Oded Regev'STOC2003]
["On Lattices, Learning with Errors, Random Linear Codes, and Cryptography",Oded Regev'STOC2005]
["Worst-case to Average-case Reductions based on Gaussian Measures",Daniele Micciancio,Oded Regev'FOCS2004].

***Update: About the 2nd question, please look at the comment of this question too.
((About the 2nd question, equivalence of average- and worst-case hardness of a problem, statements like: "If you solve an instance of the (lattice) problem in average-case, then you are able to solve any instances of the problem that includes worst-case hardness too. This equivalence means so many instances of the problem are worst-case. So we don't be worried that the adversary faces small search space as we select worst-case of a problem to make our scheme more secure. In other words, worst-case instances space of the problem stretches to its average-case. Is that correct??))

About the **2nd question**, equivalence of average- and worst-case hardness of a problem, statements like: "If you solve an instance of the (lattice) problem in average-case, then you are able to solve _any_ instances of the problem that includes worst-case hardness too. This equivalence means _so many_ instances of the problem are worst-case. So we don't be worried that the adversary faces small search space as we select worst-case of a problem to make our scheme more secure. In other words, _worst-case instances space of the problem stretches to its average-average-case_. Is that correct??
Score:3

1. What do they mean when they say certain problems (like lattices) have an average-case to worst-case relationship? Do they mean there is a polynomial reduction between two case of the problem? Because polynomial reductions are of our interest as they are practical.

Sort of, but the answer to this problem is really "for lattices people mean something else different". Also, for essentially all problems in cryptography, when you read "reduction" you should interpret it as "efficient" (typically "polynomial-time") reduction. If authors mean something else it is on them to clarify this.

Many cryptographic problems $$P$$ have the following property

Any algorithm breaking $$P$$ on average can be used to with good probability break any instance of $$P$$.

The high-level idea of this statement is that, given an arbitrary instance of $$P$$, we can map it to a uniformly random instance (which I will call $$P_\\\$$) such that solving $$P_\\\$$ solves $$P$$. The formal term for this property is "randomized self-reduction". Lattices have this (though it is not what people mean by worst-case to average-case reductions when talking about lattices), but so do things like the discrete log problem. Note that factoring does not have this property, i.e. not all cryptographic problems have randomized self-reductions.

The property lattices have is instead the following.

Any algorithm breaking $$P$$ on average can be used to with good probability break any instance of $$Q$$ (a different problem).

Depending on what different problem $$Q$$ you use, this is boring/uninteresting. With lattices, $$Q$$ is a problem which is well-studied (roughly approximate forms of the closest vector problem) and independently interesting, which is the entire reason the reductions are interesting.

1. If we call an instance of a certain problem has average-case hardness, and then it's equal to worst-case hardness, isn't it somehow ironical? If worst-case is equivalent to average-case, so why do we call it worst-case?!

Because they are not the same for all problems. So problems that they are the same are somewhat nicer. As a basic example, the (decision form of) the factoring problem is something along the lines of

On input $$N$$, a natural number, and $$B$$, a size bound, does $$N$$ have a factor of size less than $$B$$?

We include the size bound as it allows one to convert a solution to this decision problem to a factoring algorithm (I believe, haven't checked details lately). Anyway, in terms of this problem, factoring has some plausible worst-case complexity (RSA seems hard!), but writing down the right average-case hard variant of it is not straightforward. The obvious thing to do (sample $$N$$ uniformly randomly in some range) fails miserably --- it will be even with probability 1/2, and more generally will contain many small factors, and therefore be practically solvable using ECM. While given enough time we have pinned down a plausibly average-case hard distribution (namely, what is used in RSA), this was via trial-and-error, and we have no theoretical guarantees this distribution over inputs leads to hard problems, even if factoring itself is hard!

This is different than other problems (discrete log, lattices, etc), for which there is an equivalence between worst-case and average-case hardness.

1. So if we accept that for some problems there is a relationship between their average-case and worst-case, why don't we consider best-case or simplest-case? Why there is no attention to simple-case of a problem and its dangers? Because it's obvious that we don't work with simple-case of a problem in cryptography?

We do somewhat. Arguably "best-case complexity" is what is studied in things like

• weak key attacks
• various attacks on leakage

and other things of this form. In this setting we are assuming the keys come from sets for which we have much better algorithms, and then apply those algorithms. Again, the best-case complexity of problems can vary wildly. For example, RSA becomes easy if a certain proportion of the bits of the secret key leak. By contrast, LWE is fairly resiliant to bits of the secret key leaking (the problem gets easier of course, but gradually, vs "all at once" at some threshold like RSA). It seems fair to classify this sort of research as perhaps not the "best-case complexity" of the problems, but "better than average-case complexity" of the problems.

Your answer to the **3rd question**, considering best- or simplest-case of a problem in cryptography, is different from the thing that I have in my mind, quite informative and useful, and it convinces me. And as you've written well, we should pay attention to reduction between two case of a _single_ problem and between two _distinct_ problems. About the **2nd question**, equivalence between average- and worst-case hardness, mentioning random self-reducibility (e.g. RSA, discrete logarithm, LWE, ...) may be helpful. I want to write in comment of the questions what I heard that may be correct.
Score:2

Forget about cryptography and anything to do with secrets. Let's talk about sorting, a classic problem where we compare the efficiency between algorithms.

A problem is a specification of how each input maps to an output. For sorting, given a finite list of integers, return the sorted permutation of it. A problem doesn't specify how to compute the answer step by step. We generally don't associate an asymptotic complexity with a problem, except perhaps a lower bound.

An algorithm is a mechanical, unambiguous procedure to solve a problem. A problem can have multiple algorithms as alternatives. For example, sorting can be solved by bubble sort, heap sort, quick sort, etc.

Given an algorithm and an input, we can quantify how much of some resource it uses to compute the output. Most of the time, we care about the running time or the number of steps taken. Sometimes we care about how much extra memory the algorithm uses. Rarely, we care about things like network round trips, disk seeks, etc.

But simply assigning a quantity (or "cost") to each input isn't very interesting. We want to characterize broad patterns about how the algorithm behaves on many, many inputs.

One way to characterize inputs is to assign a "size" to each one. For sorting, the natural metric is the length of the list. The classical problem of sorting only uses comparisons between elements and doesn't look into the deeper structure of values (e.g. bits of an integer). Because of this, for a given list length $$n$$, there are at most $$3^n$$ possible lists that are distinguishable by comparisons. At length 0 there is at most 1 list, at length 1 there are at most 3 lists, at length 2 there are at most 9 lists, at length 3 there are at most 27 lists, etc.

Because each input size $$n$$ has many possible values, we usually can't just say an algorithm like quick sort takes $$k$$ steps on any value of size $$n$$. Instead, we have to summarize these values in a thoughtful way. If we choose to look at worst-case time complexity, then we look at all lists of length 0 and report the running time of the worst one, then we look at all lists of length 1 and report the running time of the worst one, then we look at all lists of length 2 and report the running time of the worst one, etc. This is a well-defined function that maps each input size $$n$$ to a single number $$k$$. The average case requires a probability distribution, which is usually taken to be uniform.

Why might we care about the best-case running time? We might do a computation on one CPU core and issue some other computation on another core that is guaranteed to finish in less than the first algorithm's best-case time. Why we care about worst-case time? We might set a watchdog timer to reset the CPU if it takes longer than the analyzed worst-case time to process an input of a given size.

The running time of an algorithm is generally not given in absolute units like nanoseconds or steps. We could say that quick sort uses $$5n^2 - 8n + 37$$ instructions for a list of length $$n$$ in the worst case, but it's not helpful. This is because it is intimately sensitive to details like CPU speed, amount of work done per instruction (e.g. multiply then add vs. fused multiply-add), and various simple compiler transformations. So we summarize this using the big-O family of notations which deletes constant multipliers (e.g. $$O(5n^3) = O(n^3)$$) and deletes lower-order terms (e.g. $$O(n^2 + 2n + 4) = O(n^2)$$).

From the problem of sorting, we can see some interesting runtime behaviors of different algorithms.

• The time complexity of bubble sort is $$O(n^2)$$ at best, $$O(n^2)$$ on average, and $$O(n^2)$$ at worst.

• The time complexity of insertion sort is $$O(n)$$ at best, $$O(n^2)$$ on average, and $$O(n^2)$$ at worst.

• The time complexity of heap sort is $$O(n \log n)$$ at best, $$O(n \log n)$$ on average, and $$O(n \log n)$$ at worst.

• The time complexity of quick sort is $$O(n \log n)$$ at best, $$O(n \log n)$$ on average, and $$O(n^2)$$ at worst.

We can see that sometimes the average case matches the best case and sometimes it matches the worst case. I think there are some NP-complete problems where the best currently known algorithm has a polynomial average-case time complexity and necessarily exponential worst-case time complexity (because we haven't proven that P=NP).

Circling back to cryptography, we don't want to use problems where the best known algorithm has an average-case time complexity is significantly less than its worst-case time complexity.

There are no polynomial reductions between average-case and worst-case time complexities. This part of the question sounds like a word babble. Please clarify by quoting text from a source instead of paraphrasing it in your own words.

Those examples that have the same complexity in average- and worst-case in big O notation, were very good examples. I've updated the question with references to some papers.
@user1035648: These examples are from Wikipedia, from the link in the comment of "EugeneStyer.". Haven't you read it?
@Nayuki I saw them. With your explanation, those examples make more sense to me. Thank you. But unfortunately, I should study more to find out the answers.
Score:2

The most questions are answered by the link in the comment of @EugeneStyer. But one question is not addressed well there.

why don't we consider best-case or simplest-case? Why there is no attention to simple-case of a problem and its dangers?

It has to do with probability.

What is the simplest-case for the attacker to guess a 256-bit encryption key? It is the case that the first attempt will be successful. But we consider the probability as very low and expect that this will not happen in reality.

The simplest-case has no practical value. It does not show us, if our scheme is sufficient for our goal, or if it should be improved, of it is stronger than needed for our goal and we can weaken it to some extent (e.g. to improve usability).

Where as the average complexity has practical value. It helps us to estimate, how much will the attack on average cost and how feasible it for the attacker is. Based on this we can decide, if we should do anything.

Score:1

For decryption, there is another important measure: What percentage of messages can be decrypted in some fixed time?

We don’t actually care about the cost of decryption, but whether we can do it with acceptable cost. If I tell you “worst case cost of decrypting is 2.7 trillion dollars, average case is 9.3 billion” that doesn’t tell you whether you can decode a particular message.

But if I tell you that 2.35% of messages can be decrypted at a cost of one million each then you might try an attack. And if I told you that for \$10,000 you can verify that a message can be decrypted for a million, that makes it even better for the attacker.

(I think there are decryption schemes that collect little bits of information and with enough little bits crack the encryption. Assume you need 1 million little bits of information, the effort to find one bit of info depends on the key, you spend \$10,000. If you found 10,000 information items then you guess that you can break this key for a million. If you found only 10 items then you guess the decryption cost for this key is a billion. So you can find all the keys that are easy to crack).

I sit in a Tesla and translated this thread with Ai: