Score:4

# Why does "hex encoding of the plaintext before encryption", "harm security"?

This arose from a recent question's comment. Why is translating to Welsh less secure than encrypting the King's English?

I'm trying to figure out what Welsh and King's English have going on here.
The comment is specific to the broken "crypto" the OP asked for review of, and does not apply to actual cryptographic ciphers.
Umm, a naive implementation of hex conversion has a side channel? (i.e. `nibble < 10 ? nibble + '0' : nibble + 'A' - 10`)
@Joshua They're linguistic encodings, much as hexadecimal is an 'encoding'.
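As an aside to the side-channel comment above, here is a minimal sketch (not from the original thread) of the standard branchless idiom for nibble-to-hex conversion, which avoids the data-dependent branch in `nibble < 10 ? ... : ...`. Python itself gives no constant-time guarantees; the point is only to illustrate the arithmetic trick used in C implementations:

```python
def hex_digit_branchless(nibble: int) -> str:
    """Map a nibble in 0..15 to its uppercase hex digit without a branch.

    (9 - nibble) >> 8 is 0 for nibble <= 9 and -1 (all ones) for nibble > 9,
    so the mask adds ord('A') - ord('0') - 10 == 7 exactly when needed.
    """
    return chr(nibble + ord('0') + (((9 - nibble) >> 8) & 7))

# Example: 0 -> '0', 9 -> '9', 10 -> 'A', 15 -> 'F'
print(''.join(hex_digit_branchless(n) for n in range(16)))
```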
Score:4

Hex encoding of the plaintext doubles its size while preserving entropy. Hence entropy per bit is halved.

For weak encryption algorithms (only), this typically makes attack easier. For example: the original context was XOR with a repeated key. For a uniformly random key the size of a single plaintext, that is perfectly secure. But it is no longer secure after hex encoding.

Detailed example following a comment: The cipher is XOR with a 5-byte uniformly random key, repeated as necessary to match the plaintext length. Without hex encoding, given the ciphertext for a single 5-byte plaintext, we can't deduce anything about the plaintext, e.g. decide if it's for `ALICE` or `OSCAR` (in ASCII). But with (uppercase) hex encoding these become `414C494345` or `4F53434152` (in ASCII). These are 10 bytes each, so the same key byte must be reused for the first and sixth characters. These are `4` and `9` (34h and 39h) for `ALICE`, `4` and `3` (34h and 33h) for `OSCAR`. Thus we need only XOR the first and sixth bytes of the ciphertext: we'll get 0Dh (34h XOR 39h) for the plaintext `ALICE`, versus 07h (34h XOR 33h) for `OSCAR`. More generally, the XOR of the jth and (5+j)th bytes is independent of the key, and characteristic enough of the plaintext that this is a serious issue.
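The example above can be sketched in a few lines of Python. This assumes the scheme described in the answer (XOR with a repeated 5-byte random key, plaintext hex-encoded first) and shows that the XOR of ciphertext bytes 0 and 5 cancels the key, leaving a value that depends only on the plaintext:

```python
import os

def xor_repeated_key(data: bytes, key: bytes) -> bytes:
    """XOR data with the key repeated as necessary (the weak cipher discussed)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(5)  # 5-byte uniformly random key

for name in (b"ALICE", b"OSCAR"):
    hexed = name.hex().upper().encode()      # 10 ASCII bytes, e.g. b'414C494345'
    ct = xor_repeated_key(hexed, key)
    # Bytes 0 and 5 were XORed with the same key byte, so XORing them
    # cancels the key: 0x0D for ALICE, 0x07 for OSCAR, whatever the key.
    print(name.decode(), hex(ct[0] ^ ct[5]))
```

Running this with any random key always prints `0xd` for `ALICE` and `0x7` for `OSCAR`, matching the 34h XOR 39h and 34h XOR 33h values computed by hand above.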

Rebuttal of this comment: in the above example, the addition of hex encoding does NOT change the size of the effective plaintext, which remains the same 5 characters. That example does illustrate that, all other things being equal, encoding the plaintext in hex tends to weaken an already weak cipher. At the expense of a much more complex example, it would be possible to come up with a cipher with a fixed-size key/password that's practically secure for arbitrarily large plaintext with the usual level of redundancy in English, but not for the increased redundancy of hex-encoded text.

Is this still true if the (secure) encryption scheme is randomized?
@Titanlord: CPA-secure encryption algorithms remain secure when their plaintext is hex-encoded, hence the "(only)" in the answer. But it's not enough to make an encryption algorithm randomized (that is, with an IV) to make it CPA-secure, and we can craft a randomized algorithm that is CPA-secure for short random plaintext but not after this same plaintext is hex-encoded, e.g. XOR with a repeated key obtained by block encryption of a random IV.
This is a case of "if your encryption scheme is already horrifically broken, this will exhibit the fact that it's horrifically broken". I very much disagree with the claim that the security of any reasonable encryption scheme can be impacted by transcoding the plaintext before encryption.
@Maeher: I hope I'm not guilty of making such a claim when I wrote in a comment: _"this is the several-times-pad, which is well-known to be extremely weak; plus hex encoding of the plaintext before encryption, which rather harms security, and is impractical."_ My own assertion (or at least intent) was that in the several-times-pad, adding hex encoding of the plaintext rather harms security.
With respect fgieu, you’re repeating yourself without details. _"this typically makes attack easier..."_ doesn’t really explain why encoded/translated plain texts are less secure in some instances. Are you suggesting that there is a relationship between information density (entropy) and the cipher algorithm? This would suggest the possibility of a differentiator for encrypted languages, and pose the seemingly confounding question “which plain text language is the most secure?” Can you present some numbers? I ask as I'm a fan of hexadecimal.
Simply said, if the cipher is analyzed easily then having more information available to verify correct guesses of the key makes it easier to get to the actual key or plaintext. This is not just true for classical ciphers but also e.g. ciphers / schemes that allow for plaintext oracles to happen. For instance, XML encryption using CBC can be broken by analyzing the XML structure instead of the padding, so even having a decrypt that is invulnerable to padding oracle attacks won't help the relying parties if they leak information about the validity of the XML.
Your attack also works if you know the message is ALICEWASHERE or OSCARWASNTME or any other message longer than 5 bytes, and you don't hex-encode them. Or if you use a 1-byte key and don't hex-encode them.
@MaartenBodewes: If it helps: I wrote a hex decoder a while back that isn't a validating hex decoder. Every even number of bytes turns into something by bit-bashing the unhex (with some out-of-range values).
Then again, you mentioned a key _the size of a single plaintext_, and if that held while the plaintext was that 10-byte hex-encoded string, it would still be OK. Switching from a one-time-pad to a... umm, a two-time-pad is the issue here, not the hex encoding.