Format Preserving Hash Function

user3336696

12/31/22, 7:44 AM

We have a use case of tokenising the credit card information and returning a tokenised value after preserving the format.

Ideally this should be one way, and following a FPE might not be the best solution. Pls suggest what best solution can be provided here.

416

1 + 0

hash

format-preserving

Score:9

Crypto

fgrieu

12/31/22, 9:21 AM

A standard credit card number has limited entropy. Per ISO/IEC 7812 the first 6 digits are fixed for a given card issuer, and the simplest prudent assumption is they are known to attackers. The last digit is a public function of the others. For 16-digit credit card numbers, that leaves 9 digits (at best <30 bit of entropy). This can be increased only slightly (like 5 bits) by using the expiry date.

This implies we can't use a fast and public function (as hashes are, unless otherwise specified) and get the desired one-wayness. And that any slow public function will have limited security. Further, the size is too small for public-key cryptography.

I conclude we can only use symmetric, secret-key cryptography. Format Preserving Encryption is a way. We could also use a reformatted Message Authentication Code, but we have to fear collisions, and it will give a slightly less robust integrity assurance than FPE.

Per request in comment, I'll detail a reformatted MAC. I assume the output must be 16 decimal digits, without other constraint. This implies we are willing to tolerate some low probability of collision: for $n$ card numbers that probability will be upper-bounded by $n\,(n-1)/(2\cdot10^{16})$; see this. For example, we must be satisfied with 99% confidence there is no collision among up to 14 million card numbers (chosen without knowledge of the key).

Assume a fixed secret key with at least 128-bit entropy
Compute HMAC-SHA-256 of the credit card number and key, with outputSize = 32 bytes
Consider the output as an integer in $[0,2^{256})$ (per say big-endian convention), reduce it modulo $10^{16}$, and express the outcome as 16 decimal digits (with leading zeroes if ncessary), forming the token.

It is essentially impossible to invert that tokenization without access to the key or to a device with the key; but that's possible with the key and moderate effort, by trial and error on credit card numbers.

If we want to additionally make it somewhat difficult to invert for one with the key, we can replace HMAC-SHA-256 with Argon2 (the credit card number being fed as password, the key as key, fixed salt, tagLength = 32 bytes), parametrized to slow down things as much as tolerable. That does not change the probability of collision, but security is improved: assuming Argon2 is parametrized for 0.1 second of computation time of a certain PC; the attacker has 100 similar PCs; 9 digits of the credit card number are totally random, 6 are known, and the last digit is a customary checksum; then it will require on average a million seconds (11.6 days) to invert a given hash, or 1 second to invert one hash among a million given hashes.

If we can't live with the collision risk, we must use true Format Preserving Encryption, but we loose on difficulty to invert for one holding the key. FPE that's significantly harder to compute in the reverse direction is not standard, but conceivable, see this question.

0 + 2

user3336696

12/31/22, 9:43 AM

What are the best ways you suggest for a reformatted MAC. Any suggests on how best you use FPE. Also, I had this weird idea , of taking the text, doing a HASH, convert to decimal and pick random digits after verifying for collision, what is your opinion on this. Appreciate your suggestions

Ilmari Karonen

12/31/22, 4:06 PM

@user3336696: If you can "verify for collision", that means you must have a database of all tokenized card numbers somewhere. If the database also includes the original card numbers (or collision-resistant hashes of them), then you can simply assign a unique random token to each card number and the problem becomes trivial (except for keeping the database secure!). If it doesn't, how will you verify whether an apparent collision is an actual collision between two different card numbers or just the same card number tokenized twice?

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Format Preserving Hash Function

TH: รูปแบบการรักษาฟังก์ชันแฮช

RO: Funcția Hash de conservare a formatului

RU: Формат, сохраняющий хеш-функцию

VI: Định dạng bảo toàn hàm băm

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.