Score:0

Deterministic data masking

jp flag

We are building a data masking framework, mainly to mask PII. Our scale is large and masking will be done at ingest time, so it needs to be very performant. We also need the masking to be deterministic and reversible. I have looked at AES encryption for PII, in particular AES-SIV; on my MacBook it takes around 2 milliseconds to encrypt a small text, which may not be ideal for our scale.

Would be great to hear from the community if there are any alternatives to AES SIV which are faster (and deterministic), or if there are any other alternatives to AES encryption.

Here is my encryption method. I am using AES from Cryptodome.Cipher.


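    from base64 import b64encode

    from Cryptodome.Cipher import AES
    from Cryptodome.Random import get_random_bytes


    def encrypt(key: bytes, text: str) -> str:
        encoded_text = text.encode('utf-8')

        # A fresh random nonce is drawn on every call, so the output
        # differs between calls even for the same key and text.
        nonce = get_random_bytes(AES.block_size)
        cipher = AES.new(key, AES.MODE_SIV, nonce=nonce)
        cipher_text, tag = cipher.encrypt_and_digest(encoded_text)

        # Output layout: tag || nonce || ciphertext, base64-encoded.
        return b64encode(tag + nonce + cipher_text).decode()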

This encryption takes ~2 milliseconds for a small text.

poncho avatar
my flag
2 msec to AES encrypt a moderately small text? Surely, the CPUs on macbooks aren't that bad - I suspect you're using a bad (low performance) AES encryptor...
Paul Uszak avatar
cn flag
The phrase _"deterministic"_ appears three times here. What is your specific concern?
jp flag
@poncho - Yes, small text. I am using Cryptodome.Cipher AES. Are there faster alternatives? Possibly miscreant?
jp flag
@PaulUszak - I want to know if AES SIV encryption performance is indeed sub-optimal, and if there are any alternative ways to mask data such that the same plaintext gets masked to the exact same cipher-text and can be converted back to the plaintext.
Paul Uszak avatar
cn flag
_"or if there are any other alternatives to AES encryption."_ : You will find that (other than http://tls) there are no alternative encryptions from a marketing perspective. AES or bust.
Paul Uszak avatar
cn flag
Are you sure you want this: _"the same plaintext gets masked to the exact same cipher-text "_ ? This tells all that nothing has changed and is considered poor form.
Score:2
us flag

The issue here is most likely Python and not AES.

With hardware support (AES-NI) AES can generally be computed at less than ~1 CPU cycle / byte if you have sufficiently many independent AES tasks (e.g. with counter mode) and at about 2-5 CPU cycles / byte if not (e.g. with CBC mode).

AES-SIV now effectively works by chaining a CBC-like operation with its output being used as the initial counter to counter-mode. Thus, the expected performance is about 3-6 cycles per byte for optimized implementations.

Two milliseconds on a 1 GHz CPU (yours probably clocks higher) are about 2 million CPU cycles. Even a quite bad implementation on an older CPU with AES-NI will hit around 20-30 cycles / byte, so encrypting a small text should cost nowhere close to 2 million cycles.
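To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 32-byte message size is an assumption chosen for illustration (the question only says "small text"), and the 1 GHz clock and 30 cycles / byte figures are the pessimistic values used above. Fixed per-call costs such as key setup are ignored, so treat the result as an order-of-magnitude estimate only.

```python
# Pessimistic figures from the answer; MESSAGE_BYTES is a hypothetical size.
CLOCK_HZ = 1_000_000_000      # 1 GHz
MEASURED_MS = 2               # observed time per encryption
MESSAGE_BYTES = 32            # assumed "small text"
SLOW_CYCLES_PER_BYTE = 30     # upper end of a "bad" AES-NI implementation

# Cycles that elapse during the measured 2 ms.
cycles_elapsed = MEASURED_MS * CLOCK_HZ // 1000

# Cycles a slow native implementation would need for the message body.
cycles_expected = MESSAGE_BYTES * SLOW_CYCLES_PER_BYTE

# How many times slower the measurement is than the estimate.
overhead_factor = cycles_elapsed / cycles_expected

print(f"elapsed:  {cycles_elapsed:,} cycles")
print(f"expected: {cycles_expected:,} cycles")
print(f"factor:   {overhead_factor:,.0f}x")
```

Even under these unfavourable assumptions the measurement is three orders of magnitude above the estimate, which points away from AES itself as the bottleneck.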

So the issue is most likely that the interpreter has to do far more work than a compiled / optimized implementation would.
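One way to check this is to time many calls and average, rather than timing a single call that may include import and warm-up costs. The sketch below uses only the standard library; `seconds_per_call` and `stand_in` are hypothetical names, and `stand_in` is a placeholder to be swapped for a zero-argument wrapper around the `encrypt` function from the question.

```python
import timeit
from typing import Callable


def seconds_per_call(fn: Callable[[], object],
                     calls: int = 10_000,
                     repeats: int = 5) -> float:
    """Return the best-of-`repeats` average wall time of one call to `fn`."""
    timings = timeit.repeat(fn, number=calls, repeat=repeats)
    # The minimum is the run least disturbed by other system activity.
    return min(timings) / calls


def stand_in() -> bytes:
    # Placeholder workload; replace with e.g. lambda: encrypt(key, text).
    return "alice@example.com".encode("utf-8")


per_call = seconds_per_call(stand_in)
print(f"{per_call * 1e6:.3f} microseconds per call")
```

Timing the individual steps the same way (nonce generation, cipher construction, encryption, base64 encoding) would show which of them dominates the 2 milliseconds.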
