Out of my head, I propose this EC-ElGamal variation on a standard 256-bit Elliptic Curve, using hash-and-XOR for the encryption part (nearly halving ciphertext size over textbook EC-ElGamal), and a simple work/space trade-off to further squeeze into a 32-byte ciphertext a message $m$ of $b$ bits, with small fixed $b$ (e.g. $b=8$).
- Use a standard Elliptic Curve over field $\mathbb F_p$ with generator $G$ of order $n$, with $p$ and $n$ 256-bit primes and (conjectured) 128-bit security. secp256r1 (aka NIST P-256) and secp256k1 qualify. I refer to Sec1v2 for ECC math and notation.
- Key generation:
- Choose private key $d$ randomly in the interval $[1,n)$
- Compute public key $Q\gets d\,G$
- Encryption of $b$-bit plaintext $m$
- Choose $e$ randomly in the interval $[1,2^{32})$
- Compute $F\gets e\,G$
- Choose $k$ randomly in the interval $[1,n)$
- Compute $R\gets k\,G$
- While $R_x\not\equiv0\pmod{2^b}$
- Compute $R\gets R+F$ and $k\gets k+e$, which maintains $R=k\,G$
- If $k\ge n$ the RNG that chose $k$ is most likely broken; abort or restart encryption
- If $R_y$ is odd, set $R_y\gets p-R_y$ and $k\gets n-k$, which maintains $R=k\,G$
- Compute $S\gets k\,Q$ (the shared secret point)
- Compute $u\gets \operatorname{SHA-256}(S_x)\bmod2^b$, where $S_x$ is expressed as a 32-byte bytestring
- Compute $v\gets u\oplus m$
- Output ciphertext of 256-bit, consisting of $256-b$ bits for $R_x$ with the low-order $b$ bits removed (they are zero by construction), and $b$ bits for $v$
- Decryption:
- From the ciphertext extract $R_x$ (inserting $b$ low-order bits at zero) and $v$
- Compute an $R_y$ matching $R_x$ (that's a standard technique to undo point compression, see Sec1v2 §2.3.4, step 2.4.1); if that fails, abort decryption.
- If $R_y$ is odd, set $R_y\gets p-R_y$
- Compute $S\gets d\,R$ (the shared secret point)
- Compute $u\gets \operatorname{SHA-256}(S_x)\bmod2^b$, where $S_x$ is expressed as a 32-byte bytestring
- Compute and output the deciphered plaintext $m\gets u\oplus v$
I conjecture IND-CCA1 (thus IND-CPA) security to 128-bit level, under plausible assumptions (hardness of a variant of the decisional Diffie-Hellman problem for the Elliptic Curve, and SHA-256 modeled as a random oracle). There is a trivial attack against IND-CCA2 (thus access to a decryption oracle compromises the confidentiality of past messages even if the decryption oracle has blacklisted the original ciphertexts; this is a non-issue in practice).
Beware that the cipher is trivially malleable. This allows an adversary to flip desired bits in the plaintext by altering these bits in the ciphertext. This can be a practical issue. Some level of mitigation of that is possible by replacing $u\oplus\ldots$ by Format Preserving Encryption with a wider $u$ as key; or/and more work or/and space.
The while loop at encryption is a time/space trade-off, searching $k$ by enumeration so that it's $x$ coordinate $R_x$ has $b$ low-order bits at zero that need not be transmitted. The search is optimized to require $\approx2^b$ point additions, but for $b$ starting about $8$ this is going to become a pain. Use of secret increment $e$ and matching $F=e\,G$ is a nice-to-have making the final $k$ more uniform (see note), but I know no attack if we take $e=1$ and thus $F=G$. Secret increment further could help mitigate side-channel attacks.
If we want to avoid some or all of that time/space trade-off, we can
- trivially, increase the ciphertext size.
- or/and venture into generating a custom curve, with bit size of $n$ and $p$ reduced by a few bits from 256. For each 1 bit of security we give up, we gain about 2 for $n$ and $p$, thus on ciphertext bits, or 4 times less guesswork in the while loop. And for secp256r1 the best known attacks arguably require $>2^{140}$ bit operations (based on a conservative extrapolation of a similar claim of $>2^{140}$ for Ed25519).
Note: In textbook Diffie-Hellman or ElGamal, we choose $k$ uniformly in $[0,n)$. Our variant restricts to $k$ with $k\,G$ having an $x$ coordinate with it's low-order $b$ bits at zero. By a statistical argument, there are about $n/2^b$ such $k$. We'd want to choose $k$ uniformly in this subset of $[0,n)$. The simplest would be repeating for random $k$, but that has prohibitive computational cost for the point multiplication $R\gets k\,G$. We save on this by moving from one candidate $k$ to the next by a fixed increment $e$, allowing to update $R$ by a single point addition $R\gets R+F$.
If we used fixed $e=1$, then the $k$ selected by the while loop would have a distinguishable property (for one knowing $k$): the probability that a $k$ is selected is proportional to $j$ computable from $k$ as the smallest $j>0$ such that $k-j$ is in the subset (or $k+1$ if there's no such $j$, which occurs for a singe $k$ in the subset). The problem is similar to the non-uniform choice of prime obtained by picking a random starting point and searching for the next prime.
Choosing any public fixed $e$ does not solve the issue, because the defintion of $j$ can be adapted to $k-e\,j$ in the subset. However, choosing a random secret $e$ for each choice of $k$ does. A similar technique is used in random prime generators for RSA. I have no reference, but an argument is that an adversaries not knowing $e$ do not know what $e$ to use for a test; they can test for all $e$ and combine the outcomes, but then their test is overwhelmed with noise.
I picked the upper limit of the interval $[1,2^{32})$ for $e$ so that the computation $F\gets e\,G$ has marginal expected cost compared to $R\gets k\,G$. I am confident (without evidence or reference) this is more than enough to make the bias in the selection of $k$ practically undetectable even for hypothetical adversaries knowing $k$; and actual adversaries could only get $k$ thru a side channel.
Final thought: we could damp any fear that the criteria $R_x\equiv0\pmod{2^b}$ creates a weakness by adjusting that to: the low $b$ bit of $R_x$ are some function (having about uniform distribution) of the other bits of $R_x$. A $b$-bit CRC would do. The receiver can reconstruct the necessary low-order bits of $R_x$ by applying this function.