How to compose multiple values for MAC authentication?

If the data I want to authenticate consists of multiple values and I compute a MAC by simply concatenating the values, an adversary can "shift" characters between those values without invalidating the MAC. How is this issue commonly and best addressed?

I have found this existing question about MACing multiple messages, but I feel the proposed solutions do not generalize well for more than two messages.

Consider the following contrived example:

Suppose I have a server that stores authentic log entries for clients. The client writes a log entry, authenticates it using a MAC and sends it to the server. Later, when the client retrieves log entries from the server, it should be able to verify their authenticity.

Let's say log entries have the following structure:

{
  createdAt: "1621012345",
  message: "first entry"
}

Naively, I could create a MAC for a log entry $l$ as $$ mac = \text{HMAC}(K, l.createdAt \| l.message) $$ where $\|$ denotes concatenation and $K$ is the secret key.

If I were to store this log entry and its MAC on a server and retrieve them later, a malicious server could return

{
  createdAt: "1621012",
  message: "345first entry",
  mac: "<the MAC computed above>"
}

Since $\text{"1621012"} \| \text{"345first entry"}$ is the same byte string as $\text{"1621012345"} \| \text{"first entry"}$, I would not notice the manipulation when checking the MAC.
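A short Python sketch of the problem, using only the standard-library `hmac` and `hashlib` modules. The helper name `naive_mac` and the key are hypothetical, for illustration only:

```python
import hashlib
import hmac

def naive_mac(key: bytes, created_at: str, message: str) -> bytes:
    # Plain concatenation: the boundary between the two fields is lost.
    data = (created_at + message).encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key, illustration only

original = naive_mac(key, "1621012345", "first entry")
tampered = naive_mac(key, "1621012", "345first entry")

# Both calls MAC the same byte string b"1621012345first entry".
print(hmac.compare_digest(original, tampered))  # True
```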

Note that in this case I could actually detect the manipulation by validating the length of createdAt. But that only works if the length is fixed, not if I had, say, authorName instead of the timestamp.

I can think of the following ways of dealing with this:

1. Intersperse a delimiter

If I calculated my MAC as $mac = \text{HMAC}(K, l.createdAt \| \text{':'} \| l.message)$ I believe this attack would not be possible anymore. At first glance it seems problematic that the delimiter character can appear in the message. But that only makes it impossible to unambiguously reconstruct the values from the concatenated string, which is irrelevant in this scenario. I cannot think of any way to make the calculation of the MAC ambiguous here. Is this simple solution secure?

2. Hash values before concatenating

I could calculate the MAC, for example, as $mac = \text{HMAC}(K, \text{SHA256}(l.createdAt) \| \text{SHA256}(l.message))$ (or any other cryptographic hash function). This ensures that an adversary cannot meaningfully manipulate the values I concatenate. It also ensures that the concatenated values always have a fixed length. Does the hashing add any value compared to the first idea?
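A minimal sketch of idea 2, assuming SHA-256 for both the per-field hash and the HMAC (the helper name `hashed_fields_mac` and the key are hypothetical). Each field becomes a fixed 32-byte slot, so the shifted entry from above no longer produces the same MAC input:

```python
import hashlib
import hmac

def hashed_fields_mac(key: bytes, *values: str) -> bytes:
    # Replace each field by its 32-byte SHA-256 digest, so every field
    # occupies a fixed-length slot in the MAC input.
    digests = b"".join(hashlib.sha256(v.encode("utf-8")).digest() for v in values)
    return hmac.new(key, digests, hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key

original = hashed_fields_mac(key, "1621012345", "first entry")
shifted = hashed_fields_mac(key, "1621012", "345first entry")
print(hmac.compare_digest(original, shifted))  # False
```

The fixed length is what removes the ambiguity here; the fields are still positional, so their order has to be fixed by convention.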

3. Authenticate the whole structured data

I could also calculate the MAC over the complete JSON object of the log entry. Effectively, this means I have more elaborate, meaningful delimiters (the keys and syntax). Basically, this is what a JWT does. Note that this approach also has some downsides, which mostly boil down to loss of API flexibility and the need to canonicalize the JSON. There's a great blog post about this at https://latacora.micro.blog/2019/07/24/how-not-to.html.
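A rough sketch of idea 3 in Python. The "canonical" form here (sorted keys, no whitespace, ASCII-escaped) is a simplification chosen for illustration, not a standard; the helper `json_mac` and the key are hypothetical:

```python
import hashlib
import hmac
import json

def json_mac(key: bytes, entry: dict) -> bytes:
    # Simplified canonical form: sorted keys, no insignificant whitespace,
    # non-ASCII characters escaped. This is NOT a complete canonicalization
    # scheme such as RFC 8785 (JCS); it is only meant for flat objects
    # with string values, like the log entry above.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key

a = json_mac(key, {"createdAt": "1621012345", "message": "first entry"})
b = json_mac(key, {"message": "first entry", "createdAt": "1621012345"})
c = json_mac(key, {"createdAt": "1621012", "message": "345first entry"})
print(a == b, a == c)  # True False
```

Key order no longer matters, and because JSON quotes and escapes string values, the shifted entry serializes to a different string.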


Am I missing any good solutions for this problem? Is there a recommended way to deal with this?

poncho
One alternative would be to prepend each field with its length, e.g. $\text{HMAC}_K(\text{len}(createdAt) \| createdAt \| \text{len}(message) \| message)$. Of course, you need some canonical way of converting the length into a byte string, e.g. always encoding it as a 4-byte little-endian value...
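A minimal sketch of the length-prefix encoding with the 4-byte little-endian lengths suggested above, using Python's `struct` module (helper names and key are hypothetical):

```python
import hashlib
import hmac
import struct

def encode_fields(*fields: bytes) -> bytes:
    # len(field) as a 4-byte little-endian unsigned integer, then the field.
    # struct.pack raises struct.error for fields of 2**32 bytes or more.
    return b"".join(struct.pack("<I", len(f)) + f for f in fields)

def length_prefixed_mac(key: bytes, created_at: str, message: str) -> bytes:
    data = encode_fields(created_at.encode("utf-8"), message.encode("utf-8"))
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key

print(encode_fields(b"ab", b"c"))  # b'\x02\x00\x00\x00ab\x01\x00\x00\x00c'
print(length_prefixed_mac(key, "1621012345", "first entry")
      == length_prefixed_mac(key, "1621012", "345first entry"))  # False
```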
leftfold
Good point! I forgot this one. I read [that ethereum used to do that](https://medium.com/mycrypto/the-magic-of-digital-signatures-on-ethereum-98fe184dc9c7) and then changed to `32 || Keccak256(message)` (since the hash is always 32 bytes). Unfortunately, the motivation for this change is not clear to me from the article.
poncho
The other thing you need to include in your MAC is the field labels; if the attacker is able to convert "createdAt: time" to "deletedBy: time", well, that's something we'd want to detect...
leftfold
True, although that depends on the use case and how the client does the validation. I imagine that when the client computes the MAC for validation, it looks for `l.createdAt` and uses that value for the MAC. If there's no field `createdAt`, then the object it got from the server is already invalid. (`deletedBy` is then potentially ignored.) That said, if we want to consider the whole JSON as one "document", then we should include the labels. I believe this is very close to canonicalizing and authenticating the whole JSON as outlined in the 3rd idea.
Maarten Bodewes
For log entries I would MAC the canonicalized messages separately, but I would include a counter. You'll want to have centralized logging anyway, and I would not want to include a sub-nanosecond counter or perform tricks such as waiting at least a nanosecond before continuing. If synchronization is an issue, you could include a service ID or something similar. Databases are good at this kind of stuff.
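A sketch of MACing each entry together with a counter. The 8-byte little-endian counter, the length-prefixed fields, the helper `entry_mac` and the key are all assumptions for illustration. Binding the sequence number means an entry's MAC only verifies at its own position, so a client that checks the expected counter values (an assumption about the client) could notice reordered or replayed entries:

```python
import hashlib
import hmac
import struct

def entry_mac(key: bytes, counter: int, created_at: str, message: str) -> bytes:
    # MAC input: 8-byte little-endian counter, then each field length-prefixed.
    data = struct.pack("<Q", counter)
    for field in (created_at, message):
        encoded = field.encode("utf-8")
        data += struct.pack("<I", len(encoded)) + encoded
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key

first = entry_mac(key, 0, "1621012345", "first entry")
replayed_as_second = entry_mac(key, 1, "1621012345", "first entry")
print(first == replayed_as_second)  # False
```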

To partly answer my own question here: idea 1 is in fact not secure.

It does become a problem if the delimiter appears in a value after all. Consider using `,` as a delimiter for the values `Hello,` and `World`. Interspersed with `,`, this gives `Hello,,World`, where the first comma is part of the first value and the second is the delimiter. Unfortunately, the values `Hello` and `,World` yield the same string. So the intuition that the composed message should be parsable seems to hold after all.
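The collision can be reproduced in a few lines of Python; the helper `delimited_mac` and the key are hypothetical, and HMAC-SHA256 is assumed:

```python
import hashlib
import hmac

def delimited_mac(key: bytes, values: list[str], delimiter: str = ",") -> bytes:
    # Idea 1: intersperse a delimiter between the values before MACing.
    data = delimiter.join(values).encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).digest()

key = b"example-key"  # hypothetical secret key

a = delimited_mac(key, ["Hello,", "World"])  # MAC input: "Hello,,World"
b = delimited_mac(key, ["Hello", ",World"])  # MAC input: "Hello,,World"
print(hmac.compare_digest(a, b))  # True: the two value lists collide
```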

As pointed out by @poncho in the comments, this could be fixed by prepending the length of the field to each field, instead of using delimiters.

Personally, I'll probably go with the second version (hashing the values before concatenating). But I'd still be interested to hear whether anyone else has come across this and how you've solved it.
