Score:2

Time Complexity of Exhaustive Search Algorithm


I have the sets $S_1=\{2,10,20,6\}$ and $S_2=\{25,26,20\}$ and I want to find which numbers sum to make 32. This is very easy by inspection: 6 and 26. It seems similar to the Knapsack problem, but I am no expert.

However, say I have 1000 sets, each with 500 elements, such that summing one term from each set always gives a unique value. This is much harder to inspect and solve, especially if the sets follow a structure that appears random (they would be constructed from structured sets that have been messed with, and it would be nearly impossible to guess the mixing and reverse the sets).

So, the only way must be an Exhaustive Search Algorithm. Given that my number is 52,485,332, there are $1000^{500}$ possible options to look at. There will indeed be ways to shorten this search (such as ignoring the numbers in a set that are larger than your target value), but otherwise you might still be looking at $750^{500}$ possible choices.

So, what is the time complexity of such a search algorithm? $O(n^{k})$, with $n$ the number of sets and $k$ the number of elements in those sets? The algorithm has to check "all" possible combinations of terms until one matches the given value.

The main questions seem to be "how many sets are there?" and "how large are the sets?". The sets can also be of various sizes; they need not all be uniform.

I am not a cryptography person; my research just flirts with the idea at hand. Any help would be appreciated.
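
For concreteness, here is a minimal Python sketch of the brute-force search I have in mind, using the toy sets from the start of this question (the real instance would of course have far more, and far larger, sets):

```python
from itertools import product

# Toy sets from the example above; in practice there would be
# ~1000 sets of ~500 elements each.
sets = [
    {2, 10, 20, 6},
    {25, 26, 20},
]
target = 32

# Brute force: try every combination of one element per set.
# With n sets of k elements each, this is k^n combinations.
for combo in product(*sets):
    if sum(combo) == target:
        print("found:", combo)
        break
else:
    print("no combination sums to", target)
```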

poncho
"So, the only way must be an Exhaustive Search Algorithm" - ummm, no, even without any structure in the values, there are significantly more efficient search methods; one that comes immediately to mind takes $O(nm)$ time (where $m$ is the target sum; with $m=52485332$, that looks to be practically solvable). Are you interested, or is your query specifcally about the time taken by the naïve search algorithm?
MeBadMaths
@poncho thanks for the comment. I wasn't aware there were better methods than a "naïve" search method. I would be interested in the one you suggested - anything you can suggest would be great.
jjj
There are $500^{1000}$ options, not $1000^{500}$.
Score:2
poncho

So, the only way must be an Exhaustive Search Algorithm

As I mentioned in my comment, there are practical methods for values of $m$ that are not huge (and $m=52485332$ is not huge). Here is the outline of one such method (which assumes all the sets consist of nonnegative integers):

  • We have an array $A$ with $n$ rows and $m+1$ columns; each element $A_{a, b}$ records how we can generate the sum $b$ from the first $a$ sets (or $\perp$ if no such way has been found yet).

  • Initialize all elements of $A$ to $\perp$.

  • For $i := 1$ to $n$: search the row $A_{i-1}$ for non-$\perp$ elements (for $i=1$, the element at index $0$ is treated as the only non-$\perp$ element). For each such element $A_{i-1, x}$ and each element $S_{i,j}$ of the set $S_i$, set $A_{i, x + S_{i,j}}$ to $j$; if $x + S_{i,j} > m$, ignore it.

  • Finally, if $A_{n,m} = \perp$, there is no subset that leads to the sum $m$. If it is anything else, we can recover the terms by backtracking through the array $A$.

This should be a practical algorithm for recovering the terms given the parameters you have specified.
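
Here is a rough Python sketch of the above; the layout and variable names are one possible choice, and it records the chosen element itself (rather than its index $j$) to keep the backtracking short:

```python
def find_terms(sets, m):
    """Return one element chosen from each set whose values sum to m,
    or None if no such choice exists.  Assumes non-negative integers."""
    n = len(sets)
    BOTTOM = None                      # stands in for the "empty" marker ⊥
    # A[i][b] records which element of sets[i-1] was used to reach the sum b
    # from the first i sets (BOTTOM if b has not been reached yet).
    A = [[BOTTOM] * (m + 1) for _ in range(n + 1)]
    A[0][0] = 0                        # the empty selection reaches sum 0

    for i in range(1, n + 1):
        for x in range(m + 1):
            if A[i - 1][x] is BOTTOM:  # sum x unreachable from first i-1 sets
                continue
            for s in sets[i - 1]:
                if x + s <= m and A[i][x + s] is BOTTOM:
                    A[i][x + s] = s    # remember the element used

    if A[n][m] is BOTTOM:
        return None                    # no selection sums to m

    # Backtrack through A to recover the chosen elements.
    terms, b = [], m
    for i in range(n, 0, -1):
        s = A[i][b]
        terms.append(s)
        b -= s
    return terms[::-1]


# The toy example from the question:
print(find_terms([[2, 10, 20, 6], [25, 26, 20]], 32))   # -> [6, 26]
```

The work is bounded by the number of (row, sum, element) triples visited, i.e. roughly $n \cdot m \cdot k$ steps for sets of size $k$, rather than $k^n$.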

jjj
In the question it was stated that summing always gives a unique value. This means there are $500^{1000}$ different results, so we must assume we're talking about an $m$ of this magnitude as well. This algorithm will "never" finish then
poncho
@jjj: the question also states "Given my number is 52,485,332"; I had assumed they were listing their $m$ value; if not, what is "my number"?
jjj
I guess the author of the question is not aware that the uniqueness means most numbers in the sets will, on their own, already be much larger than the target sum. Removing them would drastically reduce the problem
MeBadMaths
Thank you both for your answers and comments. I am not versed in the details of cryptography, so sorry for the confusion my post has caused. The fact that I thought 52,485,332 was "huge" shows my ignorance haha. To clarify: all the sets consist of non-negative integers, and taking a value from each set and summing them will always give a unique value.
MeBadMaths
By "my number" I meant the term I encoded (something I didn't state - sorry!). Say I have got a message and encoded it through these sets. The resulting number is 52,485,332 (or maybe even bigger!). The Search Algorithm's job is to find which unique values across the 1000 sets sum to make this number. From there you can decode the message. The 1000 sets act as a public key. So, if it's easy to crack that's an issue. Hope that helps. @jjj could you explain your last comment more? I don't think I follow, sorry
MeBadMaths
@poncho Thanks for the algorithm. I don't think I understand what the upside down T stands for - my notation knowledge of Cryptography is basic at best. I'll have a closer look and see if I can understand the rest of it
poncho
@MeBadMaths: $\perp$ means "empty", that is, a symbol that is distinguishable from any valid value. In C terms, think of it as a NULL pointer...
MeBadMaths
@poncho: thank you for the clarification. I feel like that algorithm is similar to a naïve search algorithm in the sense that you are iterating through the sets and compiling a list of the sums that are reachable so far. If any sum ever gets too big, you stop looking at it, so eventually you are only tracking the sums that are equal to or smaller than the designated one. I suppose one advantage of your algorithm is that you're not checking the values until the very end - except to compare whether they are too large?
MeBadMaths
A question I have is: how many sets (of various sizes) would you need for this algorithm to just take too long? As @jjj suggested, though, you can stop looking at a fair chunk of the sets, since eventually a lot of the values will be too big. If, for example, your encoded number was in the very middle of all possible summed numbers, the first step might only look at 250 numbers, but each of those then adds 250 from the second set, and then 250 more on each of those for the third set. This seems like it's going to get HUGE real soon? Though, again, I might be underestimating "huge" haha
poncho
@MeBadMaths: actually, the real advantage of my algorithm (compared to brute force search) is if there are multiple ways to reach an intermediate sum. Suppose there were 1000 different ways to reach the value 314159 after 42 steps; brute force will repeat the computation starting from (314159,42) 1000 times; my algorithm will do it only once.
poncho
@MeBadMaths: as for ways to make the problem infeasible for my algorithm, the most obvious one is to make $m$ huge. For unbounded $m$, this problem actually is NP-hard; however, to get into the really difficult range, $m$ must be enormous...
MeBadMaths
@poncho I believe that there would only ever be one way to reach 314159 after the first 42 steps. If there were more ways, then there wouldn't be a unique combination for each sum. But I can see how your algorithm does what you state - it's a really great and well-thought-out algorithm! Far better than anything I could do haha. Thanks for the clarification on $m$ too. I truly underestimated what "huge" meant.
Score:1
jjj

This is more an answer to why the uniqueness of the sums affects the size so much that the case of $52485332$ becomes trivial (it's way too long for a comment).

When all sums must be unique, they must result in different integers. Because there are $500^{1000}$ possible sums, there are also $500^{1000}$ different integer results; the lowest possible case would be all integers from $0$ to $500^{1000}-1$.

For example,

$S_1 = \{0, 1, 2, ..., 499\}$

$S_2 = \{0, 500, 1000, ..., 249500\}$

$S_3 = \{0, 250000, 500000, ..., 124750000\}$

...

$S_{1000} = \{0, 500^{999}, 2*500^{999}, ..., 499*500^{999}\}$

would be a way to ensure the uniqueness of the result. As you can see, the numbers get really large really fast.

In this particular example it's easy to find the result (just always choose the largest number that fits, going from the last set to the first). Even most numbers in $S_3$ are greater than $52485332$ and can therefore be ignored.
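
As an illustration, here is a small Python sketch (toy parameters, $n=4$ sets of $k=5$ elements instead of $1000$ sets of $500$) of this construction and of the greedy largest-first decoding described above:

```python
# Toy parameters; the example above uses n = 1000 sets of k = 500 elements.
n, k = 4, 5

# sets[i] holds the multiples of k**i: 0, k**i, 2*k**i, ..., (k-1)*k**i
# (this is S_{i+1} in the notation above).  Every choice of one element per
# set then sums to a distinct number in [0, k**n - 1]; the choice is just
# the base-k representation of that sum.
sets = [[j * k**i for j in range(k)] for i in range(n)]

def decode(target):
    """Greedy decoding: from the last set down to the first, pick the
    largest element that still fits."""
    chosen = []
    for S in reversed(sets):
        pick = max(s for s in S if s <= target)
        chosen.append(pick)
        target -= pick
    assert target == 0, "target was not a valid sum"
    return chosen[::-1]

# Encode by picking one element per set, then recover the picks greedily.
total = 0 + 10 + 75 + 250
print(decode(total))  # -> [0, 10, 75, 250]
```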

You would probably want relatively random-looking values in your sets. In that case the range of the values has to be at least slightly larger.

However, it is highly unlikely that any value is lower than or equal to $52485332$ (when you uniformly choose $500000$ values out of $500^{1000}$).

Dynamic programming, as @poncho suggested, really only works for small numbers, and its performance here is not that much better than exhaustive search (only a linear difference in the number of sets): because the sub-sums that could be reused are unique, the advantage of not re-examining other possibilities is not there. The runtime should be of the same order as exhaustive search. The only improvement is to abort when values are too large or too small to reach the target, but for reasonable targets that is not much of an advantage.

One could easily reduce the subset-sum problem or the knapsack problem to this one by just using the same set as many times as the number you want to sum to. The problem with this is that it is not a polynomial-time reduction and is therefore not sufficient to prove that this problem is NP-hard.

MeBadMaths
Thank you for the response. The example of the sets you gave is one I know well and is the most basic of these sets. But, from there you can "mess" this set up in a way that is entirely unrecoverable without extra information (the private key). The resulting "messed up" sets will be seemingly random and will generate a vveerryy large range of sets. I agree that the given 52485332 is wwaayy too small - didn't really think that through haha. But, say my value was of a sufficient size that tracked properly with the "random" sets - I assume that makes this problem harder to solve?
jjj
@MeBadMaths The number of ways to mess this up is relatively limited (if it is possible at all). For example, when you split {0, 1, 2, 3, 0, 4, 8, 12} (a smaller example with the same structure) into two sets of size 4, there is only one way to do it (ignoring the order of the sets and the order inside the sets)
MeBadMaths
It is possible to "mess" these sets up, but it gets more and more huge as you get to larger and larger values. Let $N=\prod_{j=1}^n |S_{j}|$, then take $d>N$. Take $r$ such that $r$ and $d$ are coprime, and take $0\leq t_1,t_2,\dots,t_n<N$. Now define the sets $T_j=r(S_j+t_j) \pmod d$. As long as these conditions are met, summing a value from each $T_j$ will result in a unique value. You can further "mess" these sets up in a similar way. The private key would contain $d,r,t_j$ and the $S_j$ sets, while the public key would have only the $T_j$ - I swear the sets would be suitably random looking
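
For illustration, here is a toy-sized Python sketch of this disguise; the particular $d$, $r$ and $t_j$ below are made up and far too small to hide anything, they just show the transformation:

```python
from itertools import product
from math import gcd, prod

# Toy "structured" sets (same shape as the small example above).
S = [[0, 1, 2, 3], [0, 4, 8, 12]]
N = prod(len(s) for s in S)   # 16 possible selections

d = 17                        # any d > N
r = 5                         # any r coprime to d
t = [3, 7]                    # shifts with 0 <= t_j < N
assert d > N and gcd(r, d) == 1 and all(0 <= tj < N for tj in t)

# Public sets: T_j = r * (S_j + t_j) mod d, applied element-wise.
T = [[r * (x + tj) % d for x in s] for s, tj in zip(S, t)]
print(T)

# Check that sums of one element per disguised set are still all distinct.
sums = [sum(c) for c in product(*T)]
assert len(set(sums)) == len(sums)
```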
jjj
@MeBadMaths Ok, yes, you're right. I thought you just wanted to swap the elements around, not modify them. I verified that your formula holds and is indeed not distinguishable from random values. I'll adjust the last part of my answer
MeBadMaths
thank you for all your comments and answers - and sorry for my poor explanation. I've not used these forums before to such an extent, so I am still new to how to format questions etc. I really appreciate your support. Definitely not looking to prove NP-hardness haha. Your input has helped nevertheless. My last question would be: is the order of exhaustive search $O(n^k)$, as I assumed in my post, or $O(k^n)$? Or is it something else? Thank you again
jjj
I wanted to say that it's not enough to prove that this problem is NP-hard ^^. Exhaustive search would be $O(k^n)$ (each of the $k$ elements in set 1 can be combined with each of the $k$ elements in set 2, and so on)
MeBadMaths
Yes that makes sense, more so than $n^k$. I've been reading the wiki page on time complexity, and I can't tell whether $O(k^n)$ is polynomial time or exponential time (or neither!). I swear this is the last question ahaha, thanks for all your help @jjj