Score:4

Why does "hex encoding of the plaintext before encryption", "harm security"?

cn flag

This arose from a recent question's comment. Why is translating to Welsh less secure than encrypting the King's English?

Joshua avatar
cn flag
I'm trying to figure out what Welsh and King's English have going on here.
R.. GitHub STOP HELPING ICE avatar
cn flag
The comment is specific to the broken "crypto" the OP asked for review of, and does not apply to actual cryptographic ciphers.
DannyNiu avatar
vu flag
Umm, naive implementation of hex conversion has side channel? (i.e. `nibble < 10 ? nibble + '0' : nibble + 'A' - 10`)
Paul Uszak avatar
cn flag
@Joshua They're linguistic encodings, much as hexadecimal is an 'encoding'.
Score:4
ng flag

Hex encoding of the plaintext doubles it's size, while preserving entropy. Hence entropy per bit is halved.

For weak encryption algorithms (only), this typically makes attack easier. For example: the original context was XOR with a repeated key. For uniformly random key the size of a single plaintext, that is perfectly secure. But it's no longer after hex encoding.


Detailed example following comment: The cipher is XOR with a 5-bytes uniformly random key repeated as necessary to match plaintext length. Without hex encoding, given the ciphertext for a single 5-byte plaintext, we can't deduce anything about the plaintext, e.g. decide if it's for ALICE or OSCAR (in ASCII). But with (uppercase) hex encoding these become 414C494345 or 4F53434152 (in ASCII). These are 10-byte each, so that the same key byte must be reused for the first and 6th characters. These are 4 and 9 (34h and 39h) for ALICE, 4 and 3 (34h and 33h) for OSCAR. Thus we need only XOR the first and 6th bytes of ciphertext: we'll get 0Dh (34h XOR 39h) for the plaintext ALICE, versus 07h (34h XOR 33h) for OSCAR. More generally, XOR of jth and 5+jth bytes is independent of key, and characteristic enough of plaintext that's a serious issue.

Rebuttal of this comment: in the above example, the addition of hex encoding does NOT change the size of the effective plaintext, which remains the same 5 characters. That example does illustrate that all things being equal, encoding the plaintext in hex tends to weaken an already weak cipher. At the expense of a much more complex example, it would be possible to come up with a cipher with fixed-size key/password that's practically secure for arbitrarily large plaintext with the usual level of redundancy in English, but not for the increased redundancy of hex-encoded text.

Titanlord avatar
tl flag
Is this still true if the (secure) encryption scheme is randomized?
fgrieu avatar
ng flag
@Titanlord: CPA-secure encryption algorithms remain secure when their plaintext is hex-encoded, hence the "(only)" in the answer. But it's not enough to make an encryption algorithm randomized (that is with IV) to make it CPA-secure, and we can craft a randomized algorithm that is CPA secure for short random plaintext but not after this same plaintext is hex-encoded, e.g. XOR with a repeated key obtained by block encryption of a random IV.
cn flag
This is a case of "if yous encryption scheme is already horrifically broken, this will exhibit the fact that it's horrifically broken". I very much disagree with the claim that the security of any reasonable encryption scheme can be impacted by transcoding the plaintext before encryption.
fgrieu avatar
ng flag
@Maeher: I hope I'm not culprit of making such claim when I wrote in a comment : _"this is the several-times-pad, which is well-known to be extremely weak; plus hex encoding of the plaintext before encryption, which rather harms security, and is impractical."_ My own assertion (or at least intent) was that in the several-times-pad, adding hex encoding of the plaintext rather harms security.
Paul Uszak avatar
cn flag
With respect fgieu, you’re repeating yourself without details. _"this typically makes attack easier..."_ doesn’t really explain why encoded/translated plain texts are less secure in some instances. Are you suggesting that there is a relationship between information density (entropy) and the cipher algorithm? This would suggest the possibility of a differentiator for encrypted languages, and pose the seemingly confounding question “which plain text language is the most secure?” Can you present some numbers? I ask as I'm a fan of hexadecimal.
Maarten Bodewes avatar
in flag
Simply said, if the cipher is analyzed easily then having more information available to verify correct guesses of the key makes it easier to get to the actual key or plaintext. This is not just true for classical ciphers but also e.g. ciphers / schemes that allow for plaintext oracles to happen. For instance, XML encryption using CBC can be broken by analyzing the XML structure instead of the padding, so even having a decrypt that is invulnerable to padding oracle attacks won't help the relying parties if they leak information about the validity of the XML.
jp flag
your attack also works if you know the message is ALICEWASHERE or OSCARWASNTME or any other messages longer than 5 bytes, and you don't hex-encode them. Or if you use a 1-byte key and don't hex-encode them.
Joshua avatar
cn flag
@MaartenBodewes: If it helps; I wrote a hex decoder awhile back that isn't a validating hex decoder. Every even number of bytes turns into something by bitbashing the unhex (with some out-of-range values).
ilkkachu avatar
ws flag
Then again, you mentioned a key _the size of a single plaintext_, and if that held while the plaintext was that that 10 byte hex-encoded string, it would still be ok. Switching from a one-time-pad to a... umm, a two-time-pad is the issue here, not the hex encoding.
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.