How to process a message to be embedded using steganography?

Martin Benes

7/24/24, 8:15 PM

Let's say we embed a text using steganography by modifying an existing cover object. What steps need to be applied to the message?

I can think of

source coding (compression)
channel coding (adding redundancy)
encryption

Am I forgetting anything? What methods would you use for each step? And how does the efficiency of the steps differ when I change the message size?


steganography

Maarten Bodewes

7/24/24, 11:51 PM

That would entirely depend on the method used for steganography. If you split the message into bits and then embed those bits, the message itself may be left unaltered. In short: to hide a message in another object you may not have to change the message itself.
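To illustrate what "split the message into bits and embed the bits" could look like, here is a minimal sketch. It assumes the cover is a plain byte sequence and uses least-significant-bit replacement purely as an example embedding; the function names and the cover values are hypothetical and not taken from any particular tool.

```python
def to_bits(message: bytes) -> list[int]:
    """Split each byte into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Reassemble groups of 8 bits (MSB first) into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_lsb(cover: bytes, bits: list[int]) -> bytes:
    """Overwrite the least significant bit of the first len(bits) cover bytes."""
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bits: int) -> list[int]:
    """Read back the least significant bit of the first n_bits bytes."""
    return [b & 1 for b in stego[:n_bits]]


message = b"hi"
cover = bytes(range(100, 132))  # hypothetical cover samples
stego = embed_lsb(cover, to_bits(message))
recovered = from_bits(extract_lsb(stego, 8 * len(message)))
```

The message bytes come back unchanged, which is the point made above: the embedding alters the cover, not the message.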

Score:1

Crypto

fgrieu

7/25/24, 10:26 AM

Compression is important in steganography: the less data there is, the easier it is to hide its existence. The nature of the compression depends on the kind of data: text (as in the question) could perhaps be preprocessed to remove unnecessary features (e.g. converted to capitals) and should then go through some lossless compression. Not relevant here: there are specialized compression methods for audio (some specialized for speech and achieving very high compression), images, and video.

Compression must always come before encryption: good ciphertext is indistinguishable from random data, which makes later compression impossible.
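A quick way to see why the order matters is the sketch below. It uses `os.urandom` as a stand-in for ciphertext (an assumption: the output of a sound cipher looks like random bytes) and compares how zlib handles it versus repetitive text.

```python
import os
import zlib

# Highly redundant plaintext compresses well.
text = b"the quick brown fox jumps over the lazy dog " * 50

# Stand-in for ciphertext: random bytes of the same length.
random_like = os.urandom(len(text))

compressed_text = zlib.compress(text, 9)
compressed_random = zlib.compress(random_like, 9)

print(len(text), len(compressed_text), len(compressed_random))
```

The random input comes out slightly larger than it went in, because the compressor finds nothing to exploit and still adds its framing.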

"Adding redundancy" can have two purposes:

Detecting alteration of the message, including deliberate alteration. That would be through a digital signature, or symmetric authentication. In the latter case, that's best integrated into the encryption, which would then be authenticated encryption. Examples include AES-GCM and ChaCha20-Poly1305.
Allowing recovery from partial alteration. That's Forward Error Correction. If needed, that would be separate from what's discussed above. It logically comes after encryption (and for many kinds of encryption, including those discussed above, that can work only there).
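The first purpose in the list (detecting alteration with symmetric authentication) can be sketched with the standard library alone. This is not authenticated encryption such as AES-GCM: it only shows the authentication half, using HMAC-SHA256 appended to an already prepared payload. The function names and the key are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def protect(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, blob: bytes) -> bytes:
    """Return the payload if the tag matches, otherwise raise ValueError."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short")
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return payload


key = b"example shared key, hypothetical"
blob = protect(key, b"compressed and encrypted payload")
```

Any single altered bit makes `verify` fail, which is also why error correction, if needed, has to be applied outside this layer and undone before verification.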

The need for FEC and its characteristics depend so heavily on how the data is transmitted (here, on the steganography used) that it would often be considered part of that data transmission or steganography.
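As a toy illustration of Forward Error Correction, here is a minimal sketch of a repetition code with majority-vote decoding. It is chosen only for brevity and is an assumption of this example; a real system would pick a code matched to the error behaviour of the embedding channel.

```python
def fec_encode(bits: list[int], n: int = 3) -> list[int]:
    """Repeat every bit n times (n should be odd)."""
    return [b for b in bits for _ in range(n)]


def fec_decode(coded: list[int], n: int = 3) -> list[int]:
    """Majority vote over each group of n received bits."""
    return [
        1 if sum(coded[i:i + n]) * 2 > n else 0
        for i in range(0, len(coded), n)
    ]


bits = [1, 0, 1, 1, 0]
coded = fec_encode(bits)

# Simulate the channel flipping one bit in each of the first three groups.
received = list(coded)
for position in (0, 4, 8):
    received[position] ^= 1

decoded = fec_decode(received)
```

With n = 3 the code corrects one flipped bit per group at the cost of tripling the payload, which directly trades against the goal of hiding as little data as possible.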

"Efficiency" in terms of CPU usage is typically not an issue for text steganography, especially for the preparatory steps described. CPU usage would typically be roughly proportional to text size, and where it grows faster than that, compression would be the cause.

The hard part is the steganography itself! Public methods can readily achieve that, without knowledge of the key, a transmission with hidden content can't be distinguished from one without, when both transmissions are prepared with said method. But most public methods are brittle when it comes to hiding that the method is in use and is suitable for steganography. Often that's not even attempted: there are countless articles around on LSB steganography methods whose output, at the byte level, is trivially distinguishable from what a mobile phone's camera outputs.

Update per comment:

By efficiency, I meant for instance, e.g., compression does not make sense for very short messages.

Text compression ratio can often be enhanced by using an agreed-upon dictionary (e.g. of common words), and that greatly helps make compression efficient for short messages that match the dictionary well. That's prominently featured in zstd. The venerable zlib does it with deflateSetDictionary. In other libraries it can often be hacked by compressing training text before the actual payload, with a flush in between, and ignoring the output before the flush.
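Python's `zlib` module exposes the preset dictionary through the `zdict` argument, so the effect on a short message can be sketched as follows. The dictionary and message contents are made up for the example; both sides are assumed to have agreed on the dictionary in advance.

```python
import zlib

# Hypothetical agreed-upon dictionary of likely phrases.
shared_dict = b"meet me at the usual place tomorrow at noon bring the documents"
message = b"meet me at the usual place at noon"

# Plain zlib: the framing overhead dominates for a message this short.
plain = zlib.compress(message, 9)

# Same message, compressed against the preset dictionary.
comp = zlib.compressobj(level=9, zdict=shared_dict)
with_dict = comp.compress(message) + comp.flush()

# The receiver must supply the identical dictionary to decompress.
decomp = zlib.decompressobj(zdict=shared_dict)
restored = decomp.decompress(with_dict) + decomp.flush()

print(len(message), len(plain), len(with_dict))
```

Without the dictionary there is almost nothing for the compressor to refer back to in so short an input; with it, most of the message becomes a couple of back-references.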


Martin Benes

7/25/24, 12:58 PM

Thank you for your answer! By efficiency, I meant for instance, e.g., compression does not make sense for very short messages.
