Forget about cryptography and anything to do with secrets. Let's talk about sorting, a classic problem for comparing the efficiency of different algorithms.
A problem is a specification of how each input maps to an output. For sorting, given a finite list of integers, return the sorted permutation of it. A problem doesn't specify how to compute the answer step by step. We generally don't associate an asymptotic complexity with a problem, except perhaps a lower bound.
An algorithm is a mechanical, unambiguous procedure to solve a problem. A problem can have multiple algorithms as alternatives. For example, sorting can be solved by bubble sort, heap sort, quick sort, etc.
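To make the distinction concrete, here is a minimal sketch in Python (my own illustration, not from any particular source; the names `is_sorted_permutation` and `bubble_sort` are just labels I chose). The problem is a yes/no check on an (input, output) pair; an algorithm is one particular procedure that produces a correct output.

```python
from collections import Counter

# The *problem*: a specification of what counts as a correct output.
def is_sorted_permutation(inp: list[int], out: list[int]) -> bool:
    """True if 'out' is non-decreasing and is a permutation of 'inp'."""
    non_decreasing = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    same_multiset = Counter(inp) == Counter(out)
    return non_decreasing and same_multiset

# One *algorithm* among many that solves the problem: bubble sort.
def bubble_sort(xs: list[int]) -> list[int]:
    xs = list(xs)                      # sort a copy, leave the input alone
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            if xs[i] > xs[i + 1]:      # compare adjacent elements
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

assert is_sorted_permutation([3, 1, 2], bubble_sort([3, 1, 2]))
```

Any other sorting algorithm (heap sort, quick sort, ...) would satisfy the same check; the specification doesn't care which procedure produced the output.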
Given an algorithm and an input, we can quantify how much of some resource it uses to compute the output. Most of the time, we care about the running time or the number of steps taken. Sometimes we care about how much extra memory the algorithm uses. Rarely, we care about things like network round trips, disk seeks, etc.
But simply assigning a quantity (or "cost") to each input isn't very interesting. We want to characterize broad patterns about how the algorithm behaves on many, many inputs.
One way to characterize inputs is to assign a "size" to each one. For sorting, the natural metric is the length of the list. The classical problem of sorting only uses comparisons between elements and doesn't look into the deeper structure of values (e.g. bits of an integer). Because of this, only the outcomes of the pairwise comparisons matter: each of the $\binom{n}{2}$ pairs in a list of length $n$ can compare as less than, equal to, or greater than, so there are at most $3^{\binom{n}{2}}$ lists that are distinguishable by comparisons. At length 0 or 1 there is only 1 distinguishable list, at length 2 there are at most 3, at length 3 there are at most 27, etc.
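As a sanity check on that counting argument, here is a brute-force sketch I'm adding (pure Python, nothing library-specific): enumerate all lists of length $n$ over the values $0 \ldots n-1$, which are enough to realize every comparison pattern, and count the distinct patterns.

```python
from itertools import product

def comparison_signature(xs):
    """The outcome (-1, 0, +1) of every pairwise comparison in the list."""
    sign = lambda a, b: (a > b) - (a < b)
    return tuple(sign(xs[i], xs[j])
                 for i in range(len(xs)) for j in range(i + 1, len(xs)))

def count_distinguishable(n):
    """Number of length-n lists that comparisons can tell apart.
    Values 0..n-1 suffice to realize every possible pattern."""
    return len({comparison_signature(xs) for xs in product(range(n), repeat=n)})

print([count_distinguishable(n) for n in range(5)])
# -> [1, 1, 3, 13, 75], never exceeding the crude 3**(n*(n-1)//2) bound
```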
Because each input size $n$ has many possible inputs, we usually can't just say an algorithm like quick sort takes $k$ steps on every input of size $n$. Instead, we have to summarize these values in a thoughtful way. If we choose to look at worst-case time complexity, then we look at all lists of length 0 and report the running time of the worst one, then all lists of length 1 and report the worst one, then all lists of length 2 and report the worst one, etc. This is a well-defined function that maps each input size $n$ to a single number $k$. The average case additionally requires a probability distribution over the inputs of each size, which is usually taken to be uniform.
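Here is a sketch of how such a worst-case function can be obtained by brute force for small sizes (my own illustration; the choice of insertion sort, and of comparison counts as the cost measure, is arbitrary):

```python
from itertools import product

def insertion_sort_comparisons(xs):
    """Run insertion sort on a copy of xs and return how many comparisons it made."""
    xs, comparisons = list(xs), 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return comparisons

def worst_case(n):
    """Maximum comparison count over all inputs of length n
    (values 0..n-1 realize every comparison pattern)."""
    return max(insertion_sort_comparisons(xs) for xs in product(range(n), repeat=n))

print([worst_case(n) for n in range(1, 7)])
# -> [0, 1, 3, 6, 10, 15], i.e. it grows like n*(n-1)/2
```

Replacing `max` with an average over the same inputs would give an (exhaustive, uniform-distribution) average-case function instead.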
Why might we care about the best-case running time? We might run a computation on one CPU core and issue some other computation on another core that is guaranteed to finish in less than the first algorithm's best-case time. Why might we care about worst-case time? We might set a watchdog timer to reset the CPU if it takes longer than the analyzed worst-case time to process an input of a given size.
The running time of an algorithm is generally not given in absolute units like nanoseconds or steps. We could say that quick sort uses $5n^2 - 8n + 37$ instructions for a list of length $n$ in the worst case, but that's not very helpful. Such a formula is highly sensitive to details like CPU speed, the amount of work done per instruction (e.g. multiply then add vs. fused multiply-add), and various simple compiler transformations. So we summarize it using the big-O family of notations, which discards constant multipliers (e.g. $O(5n^3) = O(n^3)$) and lower-order terms (e.g. $O(n^2 + 2n + 4) = O(n^2)$).
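A quick numeric check of why dropping those terms is harmless (again just an illustration I'm adding): the hypothetical exact count $5n^2 - 8n + 37$ settles down to a constant multiple of $n^2$ as $n$ grows, so its growth class is the same.

```python
# The hypothetical exact instruction count from the text.
def exact_cost(n):
    return 5 * n * n - 8 * n + 37

for n in (10, 100, 1000, 10_000):
    print(n, exact_cost(n) / (n * n))
# ratio: 4.57, 4.9237, 4.992037, 4.99920037 -> approaches the constant 5
```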
From the problem of sorting, we can see some interesting runtime behaviors of different algorithms:

- Bubble sort: $O(n^2)$ best case, $O(n^2)$ average case, $O(n^2)$ worst case.
- Insertion sort: $O(n)$ best case, $O(n^2)$ average case, $O(n^2)$ worst case.
- Heap sort: $O(n \log n)$ best case, $O(n \log n)$ average case, $O(n \log n)$ worst case.
- Quick sort: $O(n \log n)$ best case, $O(n \log n)$ average case, $O(n^2)$ worst case.
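To see the gap between quick sort's average and worst cases concretely, here is a rough comparison-counting sketch I'm adding: a textbook quick sort that always picks the first element as its pivot, whose worst case is triggered by already-sorted input. (Practical implementations pick pivots more carefully precisely to avoid this.)

```python
import random
import sys

def quick_sort_comparisons(xs):
    """Comparisons made by a naive quick sort with a first-element pivot."""
    if len(xs) <= 1:
        return 0
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]       # strictly smaller than the pivot
    right = [x for x in rest if x >= pivot]     # everything else
    return len(rest) + quick_sort_comparisons(left) + quick_sort_comparisons(right)

sys.setrecursionlimit(5000)     # sorted input makes the recursion about n levels deep
n = 2000
random_input = random.sample(range(n), n)
sorted_input = list(range(n))

print(quick_sort_comparisons(random_input))   # grows like n*log(n); about 25,000 on average here
print(quick_sort_comparisons(sorted_input))   # exactly n*(n-1)/2 = 1,999,000
```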
We can see that sometimes the average case matches the best case and sometimes it matches the worst case. I think there are some NP-complete problems where the best currently known algorithms run in polynomial time on average (under a natural input distribution) yet take exponential time in the worst case; their worst case can't be polynomial with what we know today, because a polynomial worst-case algorithm for an NP-complete problem would prove P = NP.
Circling back to cryptography, we don't want to build on problems where the best known algorithm's average-case time complexity is significantly lower than its worst-case time complexity. Cryptographic instances (keys, ciphertexts, etc.) are generated at random, so what actually protects them is average-case hardness, not worst-case hardness.
In general, there is no polynomial reduction between a problem's average-case and worst-case time complexities. This part of the question sounds like word salad to me. Please clarify by quoting text from a source instead of paraphrasing it in your own words.