Score:3

Asymmetric encryption scheme with the shortest output for encrypting 1 byte of information

cn flag

Imagine that one needs to periodically encrypt very short messages (i.e., a boolean Yes/No, a single byte, or 3-4 bytes in the worst case). We assume that there is no session, and we just need to encrypt under a receiver's public key (i.e., posting an encrypted byte to the blockchain).

I'm aware of ElGamal, Cramer-Shoup, RSA, ECIES encryption etc. and I'm looking for the algorithm with the shortest output when encrypting tiny amount of information.

I want to maintain at least 128bits of security, thus wondering if an ElGamal variant, optimized for short messages, exists. We know that ElGamal over elliptic curves outputs 64 bytes ciphertexts (assuming we encrypt a message with length up to 32 bytes when using a 256bit elliptic curve).

Is there an asymmetric scheme in the literature that could output less bytes than this? ideally 30-32 bytes?

Score:3
ng flag

Out of my head, I propose this EC-ElGamal variation on a standard 256-bit Elliptic Curve, using hash-and-XOR for the encryption part (nearly halving ciphertext size over textbook EC-ElGamal), and a simple work/space trade-off to further squeeze into a 32-byte ciphertext a message $m$ of $b$ bits, with small fixed $b$ (e.g. $b=8$).

  • Use a standard Elliptic Curve over field $\mathbb F_p$ with generator $G$ of order $n$, with $p$ and $n$ 256-bit primes and (conjectured) 128-bit security. secp256r1 (aka NIST P-256) and secp256k1 qualify. I refer to Sec1v2 for ECC math and notation.
  • Key generation:
    • Choose private key $d$ randomly in the interval $[1,n)$
    • Compute public key $Q\gets d\,G$
  • Encryption of $b$-bit plaintext $m$
    • Choose $e$ randomly in the interval $[1,2^{32})$
    • Compute $F\gets e\,G$
    • Choose $k$ randomly in the interval $[1,n)$
    • Compute $R\gets k\,G$
    • While $R_x\not\equiv0\pmod{2^b}$
      • Compute $R\gets R+F$ and $k\gets k+e$, which maintains $R=k\,G$
      • If $k\ge n$ the RNG that chose $k$ is most likely broken; abort or restart encryption
    • If $R_y$ is odd, set $R_y\gets p-R_y$ and $k\gets n-k$, which maintains $R=k\,G$
    • Compute $S\gets k\,Q$ (the shared secret point)
    • Compute $u\gets \operatorname{SHA-256}(S_x)\bmod2^b$, where $S_x$ is expressed as a 32-byte bytestring
    • Compute $v\gets u\oplus m$
    • Output ciphertext of 256-bit, consisting of $256-b$ bits for $R_x$ with the low-order $b$ bits removed (they are zero by construction), and $b$ bits for $v$
  • Decryption:
    • From the ciphertext extract $R_x$ (inserting $b$ low-order bits at zero) and $v$
    • Compute an $R_y$ matching $R_x$ (that's a standard technique to undo point compression, see Sec1v2 §2.3.4, step 2.4.1); if that fails, abort decryption.
    • If $R_y$ is odd, set $R_y\gets p-R_y$
    • Compute $S\gets d\,R$ (the shared secret point)
    • Compute $u\gets \operatorname{SHA-256}(S_x)\bmod2^b$, where $S_x$ is expressed as a 32-byte bytestring
    • Compute and output the deciphered plaintext $m\gets u\oplus v$

I conjecture IND-CCA1 (thus IND-CPA) security to 128-bit level, under plausible assumptions (hardness of a variant of the decisional Diffie-Hellman problem for the Elliptic Curve, and SHA-256 modeled as a random oracle). There is a trivial attack against IND-CCA2 (thus access to a decryption oracle compromises the confidentiality of past messages even if the decryption oracle has blacklisted the original ciphertexts; this is a non-issue in practice).

Beware that the cipher is trivially malleable. This allows an adversary to flip desired bits in the plaintext by altering these bits in the ciphertext. This can be a practical issue. Some level of mitigation of that is possible by replacing $u\oplus\ldots$ by Format Preserving Encryption with a wider $u$ as key; or/and more work or/and space.

The while loop at encryption is a time/space trade-off, searching $k$ by enumeration so that it's $x$ coordinate $R_x$ has $b$ low-order bits at zero that need not be transmitted. The search is optimized to require $\approx2^b$ point additions, but for $b$ starting about $8$ this is going to become a pain. Use of secret increment $e$ and matching $F=e\,G$ is a nice-to-have making the final $k$ more uniform (see note), but I know no attack if we take $e=1$ and thus $F=G$. Secret increment further could help mitigate side-channel attacks.

If we want to avoid some or all of that time/space trade-off, we can

  • trivially, increase the ciphertext size.
  • or/and venture into generating a custom curve, with bit size of $n$ and $p$ reduced by a few bits from 256. For each 1 bit of security we give up, we gain about 2 for $n$ and $p$, thus on ciphertext bits, or 4 times less guesswork in the while loop. And for secp256r1 the best known attacks arguably require $>2^{140}$ bit operations (based on a conservative extrapolation of a similar claim of $>2^{140}$ for Ed25519).

Note: In textbook Diffie-Hellman or ElGamal, we choose $k$ uniformly in $[0,n)$. Our variant restricts to $k$ with $k\,G$ having an $x$ coordinate with it's low-order $b$ bits at zero. By a statistical argument, there are about $n/2^b$ such $k$. We'd want to choose $k$ uniformly in this subset of $[0,n)$. The simplest would be repeating for random $k$, but that has prohibitive computational cost for the point multiplication $R\gets k\,G$. We save on this by moving from one candidate $k$ to the next by a fixed increment $e$, allowing to update $R$ by a single point addition $R\gets R+F$.

If we used fixed $e=1$, then the $k$ selected by the while loop would have a distinguishable property (for one knowing $k$): the probability that a $k$ is selected is proportional to $j$ computable from $k$ as the smallest $j>0$ such that $k-j$ is in the subset (or $k+1$ if there's no such $j$, which occurs for a singe $k$ in the subset). The problem is similar to the non-uniform choice of prime obtained by picking a random starting point and searching for the next prime.

Choosing any public fixed $e$ does not solve the issue, because the defintion of $j$ can be adapted to $k-e\,j$ in the subset. However, choosing a random secret $e$ for each choice of $k$ does. A similar technique is used in random prime generators for RSA. I have no reference, but an argument is that an adversaries not knowing $e$ do not know what $e$ to use for a test; they can test for all $e$ and combine the outcomes, but then their test is overwhelmed with noise.

I picked the upper limit of the interval $[1,2^{32})$ for $e$ so that the computation $F\gets e\,G$ has marginal expected cost compared to $R\gets k\,G$. I am confident (without evidence or reference) this is more than enough to make the bias in the selection of $k$ practically undetectable even for hypothetical adversaries knowing $k$; and actual adversaries could only get $k$ thru a side channel.


Final thought: we could damp any fear that the criteria $R_x\equiv0\pmod{2^b}$ creates a weakness by adjusting that to: the low $b$ bit of $R_x$ are some function (having about uniform distribution) of the other bits of $R_x$. A $b$-bit CRC would do. The receiver can reconstruct the necessary low-order bits of $R_x$ by applying this function.

Kostas Kryptos avatar
cn flag
Thanks for the answer @fgrieu, is there a reason on why you are not retrying k in R = kG until Rx = 0 and you introduce e?
fgrieu avatar
ng flag
@Kostas Chalkias: Indeed, I do not generate a fresh random $k$ when $R_x\not\equiv0\pmod{2^b}$, and instead step $k$ by $e$. That's for performance reasons. I manage to explore a new $R$ with a single point addition, rather than hundreds addition or doubling (for a point multiplications) if I used a new random $k$. Using $(1,G)$ rather than $(e,F)$ would work fine from this standpoint, but that makes the choice of $(k,R)$ measurably non-uniform among those with $R_x\equiv0\pmod{2^b}$, and using $(e,F)$ improves a lot on that.
knaccc avatar
es flag
Please could you clarify what the subscripts x and y mean, e.g. as in
Kostas Kryptos avatar
cn flag
@knaccc: it's and coordinates of the elliptic curve point .
Kostas Kryptos avatar
cn flag
@fgrieu excellent answer, can you elaborate a bit more (even with some reference paper) why (,) is better that (1,) re uniformity? Also, is the range of up to 2^32 ideal or we could get better uniformity results if = [1, n) (assuming we could tolerate the extra cost of 1 big scalar to point mul)?
fgrieu avatar
ng flag
@Kostas Chalkias: I added a note explaining the issue, and an argument about why I think it's solved by $(e,F)$.
Kostas Kryptos avatar
cn flag
@fgrieu: thanks again, PS: malleability is not a problem if submitting the ctx as part of a blockchain transaction (all transactions are signed anyway, thus integrity is guaranteed).
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.