I've done some reading since posting this question and I now have a better grasp of this, so I'll self-answer.
The confusion arises from the difference between zero-knowledge and honest-verifier zero-knowledge. Consider the verifier $V^0$, which generates a random $m$, encrypts it to obtain a ciphertext $c$, and sends $c$ over the channel to $P$. $P$ (assumed to also be honest) then responds with $m$. Against this verifier the interaction is trivially zero-knowledge. Intuitively, $V^0$ has learned nothing, since it already knew $m$. Formally, we can build a simulator which creates a false transcript of the proof by generating a random $m$ and encrypting it. However, $V^0$ is the honest verifier: a verifier acting according to the protocol.
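As a minimal sketch of the simulator argument above, here is the honest-verifier case in Python. The cryptosystem is my own assumption for illustration: toy textbook RSA with tiny, insecure parameters, and the names `encrypt`, `prover`, `real_transcript` and `simulated_transcript` are hypothetical. The point is only that the simulator never touches the private exponent, yet produces transcripts distributed exactly like real ones.

```python
import random

# Toy textbook RSA (hypothetical, insecure parameters; illustration only).
P_PRIME, Q_PRIME = 61, 53
N = P_PRIME * Q_PRIME                           # public modulus
E = 17                                          # public exponent
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))   # private exponent (Python 3.8+)


def encrypt(m):
    """Public operation: anyone can encrypt."""
    return pow(m, E, N)


def prover(c):
    """Honest prover P: decrypts with the secret exponent D."""
    return pow(c, D, N)


def real_transcript(rng):
    """Honest verifier V^0: pick m, send c = Enc(m), receive P's answer."""
    m = rng.randrange(N)
    c = encrypt(m)
    return (c, prover(c))


def simulated_transcript(rng):
    """Simulator: same distribution as above, but D is never used."""
    m = rng.randrange(N)
    return (encrypt(m), m)
```

Feeding both functions the same randomness gives identical `(c, m)` pairs, which is the sense in which $V^0$ learns nothing.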
Consider the following malicious verifier $V^*$: it does not perform any encryptions and simply always sends a fixed ciphertext, $c^*$. Now something is learned - the decryption of $c^*$, which was not known before, since $c^*$ is not the result of a known encryption but of a specific choice. The definition of zero-knowledge stipulates that for every verifier there must exist a simulator. $V^*$ cannot be simulated without knowledge of the secret: it is easy to check whether a claimed decryption is correct by re-encrypting it, so the simulator must output the correct decryption of $c^*$. Since $c^*$ is arbitrary, the simulator must be able to decrypt arbitrary ciphertexts, which is not possible without access to the secret. Thus the scheme is in fact not zero-knowledge, regardless of the cryptosystem used (unless the cryptosystem is broken and a simulator can therefore decrypt arbitrary messages in polynomial time, in which case the proof is moot). Note that it is essential that $V^*$ sends a fixed ciphertext every time - if $c^*$ is picked at random for each interaction, the transcript can easily be falsified.
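The re-encryption check used in the argument above can be sketched as follows. Again this assumes toy textbook RSA (deterministic, insecure, tiny parameters), and `C_STAR`, `transcript_is_consistent` and `guessing_simulator` are hypothetical names of my own. At this toy size the plaintext could of course be brute-forced; the sketch only shows that exactly one answer passes the check, so a simulator without the secret has nothing better than guessing.

```python
import random

# Toy textbook RSA (hypothetical, insecure): N = 61 * 53, and E * D = 1 mod 3120.
N, E, D = 3233, 17, 2753


def encrypt(m):
    return pow(m, E, N)


def prover(c):
    """Honest prover P, the only party holding D."""
    return pow(c, D, N)


# V* always sends this fixed value; it was not produced by encrypting a known m.
C_STAR = 1234


def transcript_is_consistent(c, m):
    """Distinguisher: re-encrypt the claimed plaintext and compare."""
    return encrypt(m) == c


def guessing_simulator(c, rng):
    """A simulator without D can only guess a plaintext for c."""
    return (c, rng.randrange(N))
```

The real transcript `(C_STAR, prover(C_STAR))` always passes `transcript_is_consistent`, while only one of the `N` possible guesses does.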
It's not a part of the original question, but I think it's important to bring up: this is not a valid proof-of-knowledge, which I intended it to be. In fact (denoting by $S_\mathcal{M}$ this scheme implemented with some cryptosystem $\mathcal{M}$), "$S_\mathcal{M}$ is a proof-of-knowledge" $\implies$ "decryption oracle access to $\mathcal{M}$ allows private-key exfiltration". This is an immediate consequence of the lack of a commitment stage of the kind used in a $\Sigma$-protocol. A proof-of-knowledge requires that we can write an extractor $E$ which can retrieve the secret from $P$ if allowed to rewind its state. However, $P$ as described here is stateless - regardless of which interaction, or when in the interaction, an input is sent, the output is the same. Rewinding therefore gives $E$ nothing beyond a decryption oracle, so the existence of an efficient extractor for $P$ immediately implies that $\mathcal{M}$ is completely broken with decryption oracle access. This is merely an interactive proof that proves (with $\epsilon$, the margin for error, hazily defined) that $P$ can decrypt arbitrary messages - it does not satisfy the requirements to prove private-key knowledge.
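To make the implication concrete, here is a hedged sketch using a cryptosystem where a decryption oracle really does leak the key. I am swapping in a toy Rabin-style scheme purely as an illustration (it is not part of the original question): "decryption" is assumed to return one fixed square root modulo $n = pq$, the primes are tiny, and `make_stateless_prover` and `extract_factor` are hypothetical names. Because the prover is a pure function of its input, the extractor below needs no rewinding at all; it simply queries.

```python
import math
import random


def make_stateless_prover(p, q):
    """Stateless P for toy Rabin: maps c to one fixed square root mod p*q.

    Assumes p and q are primes congruent to 3 mod 4 and c is a square.
    """
    n = p * q

    def respond(c):
        root_p = pow(c, (p + 1) // 4, p)   # square root of c modulo p
        root_q = pow(c, (q + 1) // 4, q)   # square root of c modulo q
        # Combine the two roots with the Chinese remainder theorem.
        return (root_p * q * pow(q, -1, p) + root_q * p * pow(p, -1, q)) % n

    return respond


def extract_factor(oracle, n, rng):
    """Extractor E: recovers a prime factor of n using only oracle queries."""
    while True:
        x = rng.randrange(2, n)
        g = math.gcd(x, n)
        if g != 1:
            return g                       # x already shares a factor with n
        y = oracle(x * x % n)              # some square root of x^2 mod n
        g = math.gcd(x - y, n)
        if 1 < g < n:                      # y is not +x or -x: n splits
            return g


prover = make_stateless_prover(1019, 1031)  # hypothetical toy primes
factor = extract_factor(prover, 1019 * 1031, random.Random(0))
```

Each query succeeds with probability about one half, so the loop ends quickly. For this toy scheme an efficient extractor exists precisely because the scheme falls to a decryption oracle, which is the direction of the implication above; for a cryptosystem that resists chosen-ciphertext attacks, no such extractor can exist.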