Score:0

Technical feasibility of a theoretical compression mechanism

cn flag

Considering that quantum computers can (in theory) break/reverse some one-way hashes much faster than conventional computers: is it technically feasible, in any theoretical (but realistic) scenario, to store data "compressed" in the form of a one-way hash and reverse/restore/"decompress" it with the help of a quantum system on demand? If so, could that make it an ideal way of storing data in the future? And, imagining such a theoretical case, how could something like this be possible in any precisely described way?

PS. "Is it technically feasible" - knowing that there are many issues with one-way-functions such as collision-resistance and that there are a lot of limmitation as regards to this day's state-of-the-art quantum computing systems [...]

Thanks in advance for any cool, in-depth scenario, and have a beautiful day.

Score:0
ng flag

Is it technically feasible in any theoretical (but realistic) scenario, to store data "compressed" in the form of a one-way hash and reverse/restore/"decompress" it with the help of a quantum system on demand?

No. Several messages correspond to a given hash, and even an infinitely powerful computer has no way to tell which is the right one for the decompression procedure.

Further, even ideal quantum computers aren't expected to be infinitely powerful and break the preimage resistance of an arbitrarily good hash. A $2n$-bit hash is expected to be (way) more resistant to a quantum computer than a similar $n$-bit hash is to a classical computer.
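
One way to make that concrete, assuming Grover's algorithm (the best known generic quantum preimage attack) is essentially optimal: a quantum preimage search on a $2n$-bit hash needs on the order of
$$\sqrt{2^{2n}} = 2^{n}$$
hash evaluations, which is the same order as a classical brute-force preimage search on an $n$-bit hash.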

Score:0
ae flag

It is not even theoretically feasible.

$hash()$ functions take arbitrary data (for our argument, let us imagine all data is 1024 bits) and output a 128-bit hash.

This means that, mathematically, each 128-bit output $y$ is guaranteed to have multiple 1024-bit preimages $x_1, \dots, x_n$, where each $x_i$ hashes to $y$. This is a fact that is independent of any sort of computation.
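
To put numbers on it, a pigeonhole count (an average over all outputs, not a property of any particular hash function) gives
$$\frac{2^{1024}\ \text{possible inputs}}{2^{128}\ \text{possible outputs}} = 2^{896}\ \text{preimages per output, on average}.$$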

However, even the best reversible computer, given $y$, can only return to you one of the $x_i$s, with no way to tell which one was the original.

This becomes significant in password-based authentication scenarios. Let us say a user logs in with password $x_1$, and you store $hash(x_1) = y$ in your database. The database leaks; an attacker uses $y$, the knowledge of which hash function you used, and a reversible computer, and gets back, say, $x_3$. They can still get access to the account, as the server only compares the hashes, and $hash(x_1) = y = hash(x_3)$.

In compression, however, it wouldn't be useful if you compressed $x_1$ and, on decompression, got $x_3$ back.
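
Here is a small Python sketch of that comparison (not from the answer itself; the 8-bit truncated hash is a deliberately weak stand-in so a second preimage can actually be found by brute force). Any preimage of the stored digest authenticates, yet the recovered value is generally not the original data, which is exactly why "decompression" by hash inversion cannot work:

```python
import hashlib
import os

def toy_hash(data: bytes) -> bytes:
    # Deliberately weak: truncate SHA-256 to one byte so collisions are easy to find.
    return hashlib.sha256(data).digest()[:1]

# "Registration": the server stores only the hash of the password.
password = b"correct horse"
stored = toy_hash(password)

def find_some_preimage(target: bytes) -> bytes:
    # Brute-force ANY input hashing to the target, not necessarily the original.
    while True:
        guess = os.urandom(8)
        if toy_hash(guess) == target:
            return guess

other = find_some_preimage(stored)

# Authentication succeeds with the colliding input...
assert toy_hash(other) == stored
# ...but the recovered bytes are not the original password.
print(other, password, other == password)
```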

Regarding your Reddit comment: it is also not possible to encode which $x$ the hash should decompress into, as then you would need to know what exactly $x$ is, which would mean your compression program only works for one value and is thus not a useful one.
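
Continuing the toy numbers above: even storing the hash together with an index saying which preimage was meant saves nothing, since on average it costs
$$\underbrace{128}_{\text{hash}} \;+\; \underbrace{\log_2 2^{896}}_{\text{which preimage}} \;=\; 1024\ \text{bits},$$
exactly the size of the original data.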

Thought process

It is useful to have a basic idea of information theory.

Compression is a way to exploit the fact that the 1024-bit data in question may not actually hold 1024 bits of information. For example, consider the compression scheme "run-length encoding".

The string $1\dots1$ (1024 ones) will be compressed to $1^{1024}$, and the string $1\dots1\,0\dots0$ (512 ones followed by 512 zeros) will be compressed to $1^{512}0^{512}$. Notice that the latter compressed value is larger.

In the first one, the "information" stored in the string is simply "how many times 1 repeats". In the second case, however, there is more information: how many times 1 repeats, and how many times 0 repeats.

You can never recover lost information, so in order for decompression to work, the compression scheme must keep all of it; when the input holds more information (as in the second string), the compressed output must accordingly contain more bits.
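
A minimal Python sketch of the run-length scheme described above (the output format, pairs of (symbol, run length), is my own choice; it mirrors $1^{1024}$ versus $1^{512}0^{512}$ as one pair versus two):

```python
from itertools import groupby

def rle_compress(s: str) -> list[tuple[str, int]]:
    # Collapse each run of identical symbols into a (symbol, run length) pair.
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decompress(pairs: list[tuple[str, int]]) -> str:
    # Expand each pair back into its run; no information was lost.
    return "".join(ch * n for ch, n in pairs)

all_ones  = "1" * 1024
half_half = "1" * 512 + "0" * 512

print(rle_compress(all_ones))   # [('1', 1024)]             -> one pair
print(rle_compress(half_half))  # [('1', 512), ('0', 512)]  -> two pairs, i.e. larger
assert rle_decompress(rle_compress(half_half)) == half_half
```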

As a side note that may be interesting to you, machine learning works exactly like this. The dataset $(1, 2), (4, 8), \dots$ may be huge, but the information contained is simply the linear equation $y = 2x$. Linear regression is one algorithm that is able to compress ("learn") this dataset and output the (much tinier) linear equation.
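
A tiny sketch of that side note, using `numpy.polyfit` as one possible way to do the regression (the dataset and library choice are just for illustration):

```python
import numpy as np

# A "huge" dataset generated by y = 2x: (1, 2), (2, 4), ..., (10000, 20000).
x = np.arange(1, 10_001)
y = 2 * x

# Linear regression "compresses" the whole dataset into two numbers.
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6), round(intercept, 6))  # ~2.0 and ~0.0
```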

In hashing, the constant-size output means it is an orthogonal concept to compression in these regards: a hash discards information by design, whereas compression must preserve it.

cn flag
It's not technically true that all $y$ need to have multiple preimages. It is only true that at least one $y$ needs to have multiple preimages.