Does compressed data expose information about non-compressed data when encrypted together?

Aaron

7/17/24, 6:06 PM

I know that compressing data before encrypting it can enable a compression oracle attack, as in CRIME and BREACH. But suppose only part of the data is compressed (e.g. the non-user-controlled and/or non-sensitive data), the user-controlled/sensitive data is left uncompressed, and the two are then encrypted together. Does the compressed portion affect the security of the uncompressed portion?

encryption(compression("mix of user controlled and sensitive data")) = vulnerable
encryption(compression("only non-user controlled and/or non-sensitive data") + "user controlled and sensitive data") = ?


side-channel-attack

Score:1

Crypto

Richard Thiessen

7/17/24, 9:18 PM

No. The side channel exists because data like "username=alice&password=swordfish&evil_value=sword" compresses better as a whole: the attacker-controlled value sword matches part of the secret value swordfish. If the attacker then guesses swordf, the compressed data stays the same size; if they guess swordg, it gets larger. In this way they can recover the secret through repeated calls to the compression oracle.
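A minimal sketch of that oracle, assuming Python's zlib as the compressor. The secret value, the wrong guess, and the helper name `oracle_len` are made up for illustration; the secret is longer than in the example above so that the size gap is easy to see. Single-character differences on short inputs can disappear into byte rounding, so this only compares a fully correct guess with a fully wrong one of the same length.

```python
import zlib

# Hypothetical secret, chosen only for this demonstration.
SECRET = "swordfish-7Qx2Lm9Kp4Zt8Vb3"


def oracle_len(guess: str) -> int:
    """Length of compress(secret || attacker data).

    A stream cipher applied afterwards would preserve this length,
    so this is what the attacker observes on the wire.
    """
    plaintext = f"username=alice&password={SECRET}&evil_value={guess}"
    return len(zlib.compress(plaintext.encode()))


# Two guesses of equal length: one equals the secret, one shares nothing with it.
right = oracle_len(SECRET)
wrong = oracle_len("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

print("correct guess:", right, "bytes")
print("wrong guess:  ", wrong, "bytes")
```

The correct guess is encoded as one back-reference to the secret, while the wrong guess has to be emitted as literals, so the first length is smaller.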

Note that the attacker cares only about the length of the compressed data, which encryption leaves unchanged (padding can complicate this, but stream ciphers don't change the length). So encryption isn't the problem; what matters is whether length(plaintext) changes.

length(compress(secret || untrusted)) leaks information about secret

length(compress(secret) || compress(untrusted)) does not, since it uses two independent calls to compress().

Your second example, length(compress(untrusted) || secret), also does not, since the call to compress() does not include any secret data.
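The three constructions can be compared side by side. This is a rough sketch assuming zlib; the function names, the sample secret, and the `leak` helper are hypothetical. `leak` measures how much the total length changes between two guesses beyond what the attacker could already compute from the guesses alone.

```python
import zlib


def joint(secret: bytes, untrusted: bytes) -> int:
    # length(compress(secret || untrusted)): one call sees both inputs
    return len(zlib.compress(secret + untrusted))


def separate(secret: bytes, untrusted: bytes) -> int:
    # length(compress(secret) || compress(untrusted)): two independent calls
    return len(zlib.compress(secret)) + len(zlib.compress(untrusted))


def secret_raw(secret: bytes, untrusted: bytes) -> int:
    # length(compress(untrusted) || secret): the secret never reaches compress()
    return len(zlib.compress(untrusted)) + len(secret)


def leak(construction, secret: bytes, good: bytes, bad: bytes) -> int:
    """Length change between two guesses that is not explained by the guesses themselves."""
    observed = construction(secret, good) - construction(secret, bad)
    predictable = len(zlib.compress(good)) - len(zlib.compress(bad))
    return observed - predictable


# Hypothetical values: `good` repeats the secret, `bad` shares nothing with it.
secret = b"password=swordfish-7Qx2Lm9Kp4Zt8Vb3"
good = b"&evil_value=swordfish-7Qx2Lm9Kp4Zt8Vb3"
bad = b"&evil_value=ABCDEFGHIJKLMNOPQRSTUVWXYZ"

for construction in (joint, separate, secret_raw):
    print(construction.__name__, leak(construction, secret, good, bad))
```

Only `joint` should show a nonzero value here. For the other two, the total length is a sum of terms that each depend on one input only, so the guess cannot interact with the secret.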

Security boundary aware compression

One neat way to solve these sorts of security problems would be to have a security-aware compression algorithm. Choose a special flag string like "compbound" that marks a security boundary in the data, and have the compression algorithm look for these boundaries and make no matches across them. It's also useful to have that string survive compression intact so that repeated compression doesn't leak information.
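A stock compressor cannot be told to avoid matches across a marker, but the effect can be approximated by compressing each boundary-delimited segment independently. The following is a sketch under that assumption, using zlib. The function names and the 4-byte length-prefix framing are invented for illustration, and unlike the scheme described above, the flag string does not survive intact in the output (it is replaced by the framing and restored on decompression).

```python
import struct
import zlib

# Flag string from the text; assumed never to occur in real data.
BOUNDARY = b"compbound"


def compress_bounded(data: bytes) -> bytes:
    """Compress each BOUNDARY-delimited segment on its own, so no match crosses a boundary."""
    out = []
    for segment in data.split(BOUNDARY):
        blob = zlib.compress(segment)
        # Hypothetical framing: 4-byte big-endian length, then the compressed segment.
        out.append(struct.pack(">I", len(blob)) + blob)
    return b"".join(out)


def decompress_bounded(blob: bytes) -> bytes:
    """Inverse of compress_bounded: decompress each frame and re-insert the boundaries."""
    segments = []
    pos = 0
    while pos < len(blob):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        segments.append(zlib.decompress(blob[pos:pos + size]))
        pos += size
    return BOUNDARY.join(segments)


secret_part = b"username=alice&password=swordfish"
attacker_part = b"&evil_value=swordfish"
packed = compress_bounded(secret_part + BOUNDARY + attacker_part)
```

The cost is that every segment pays its own compression overhead, so this only makes sense when boundaries are rare.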

There are other variants, such as an open/close tag pair that protects a small string. This is useful for protecting things like CSRF tokens embedded in HTML without hurting compression of the surrounding HTML: any secret value can be bracketed by these tags to prevent compression oracle attacks from leaking it.
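One way the tag variant might look, again as a sketch on top of zlib rather than a modified compressor. The tag names `<protect>`/`</protect>`, the function names, and the header layout are all hypothetical. Protected spans are lifted out and stored uncompressed, and the remaining public text is compressed as a single stream, so the surrounding HTML still compresses well.

```python
import re
import struct
import zlib

OPEN, CLOSE = b"<protect>", b"</protect>"  # hypothetical tag names
_SPAN = re.compile(re.escape(OPEN) + b"(.*?)" + re.escape(CLOSE), re.DOTALL)


def compress_protected(data: bytes) -> bytes:
    """Store tagged secrets raw; compress everything else as one stream."""
    public = bytearray()
    secrets = []  # (offset into the public text, raw secret bytes)
    last = 0
    for m in _SPAN.finditer(data):
        public += data[last:m.start()]
        secrets.append((len(public), m.group(1)))
        last = m.end()
    public += data[last:]
    # Hypothetical header: count, then (offset, length, raw bytes) per secret.
    header = struct.pack(">I", len(secrets))
    for offset, secret in secrets:
        header += struct.pack(">II", offset, len(secret)) + secret
    return header + zlib.compress(bytes(public))


def decompress_protected(blob: bytes) -> bytes:
    """Inverse of compress_protected: put each secret, with its tags, back in place."""
    (count,) = struct.unpack_from(">I", blob, 0)
    pos = 4
    secrets = []
    for _ in range(count):
        offset, size = struct.unpack_from(">II", blob, pos)
        pos += 8
        secrets.append((offset, blob[pos:pos + size]))
        pos += size
    public = zlib.decompress(blob[pos:])
    out = bytearray()
    last = 0
    for offset, secret in secrets:
        out += public[last:offset] + OPEN + secret + CLOSE
        last = offset
    out += public[last:]
    return bytes(out)


def page(token: bytes, reflected: bytes) -> bytes:
    # Hypothetical HTML page with a protected CSRF token and attacker-reflected text.
    return (b"<form><input name=csrf value=" + OPEN + token + CLOSE
            + b"></form><p>You searched for: " + reflected + b"</p>")
```

With this layout the output length depends on the token's length but not on its content, even when the reflected text happens to contain a copy of the token.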


