Score:0

One way text→text function

jp flag

I need a way to map some printable text to other printable text. E.g.:

Ian BoydKcp Zbas

Notice some of the important requirements:

  • uppercase is uppercase in output
  • lowercase in input is lowercase in input
  • spaces (and anything else outside of A-Z0-9) are left alone

The additional requirement is that it be deterministic, that is the same input always gives the same output:

  • Ian BoydKcp Xbas
  • Ian BoydKcp Xbas
  • Ian BoydKcp Xbas

The other requirement expands on determinism, and i don't know what to call it except to say that words with common prefixes needs to have the same output for the same common prefix:

  • IK
  • IaKc
  • IanKcp
  • Ian Kcp
  • Ian BKcp X
  • Ian BoKcp Xb
  • Ian BoyKcp Xba
  • Ian BoydKcp Xbap

My Solution

I created a solution to these technical requirements 15 years ago; but i'm trying to revisit it some something "better".

My solution was a "a streaming Caesar cipher with chaining".

Create a simple caesar substitution:

1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
F  H  U  D  I  R  Y  E  K  C  Z  N  S  A  B  G  J  P  V  O  W  T  L  M  Q  X

But rather than a simple substitution:

  • Ian BoydFks Hbqd

instead i used chaining:

Previous State  Current Character  Sum (mod)   Next character  output
--------------  -----------------  ----------  --------------  --------------------
0               I (9)              9           K               K
9               a (1)              10          c               Kc
10              n (14)             24          p               Kcp

24              B (2)              26          X               Kcp X
26              o (15)             15          b               Kcp Xb
15              y (25)             14          a               Kcp Xba
14              d (4)              18          m               Kcp Xbap

Better solution with hashing?

These values are not something i need (or want) to "decrypt", and the use of a caesar cipher implies the ability to decrypt (which we all know isn't that hard).

So conceptually i really want a "one-way function": something that:

  • converts input
  • to some unpredictable output
  • deterministically

I thought: what if i used a hash algorithm, adding letters on by one, and getting the current "state", and convert that partial digest to a character (uppercase/lowercase/digit as appropriate to match the input):

   String FrobTheGrobber(String input)
   {
      //proof of concept pseudocode that only handles uppercase
      HashAlgorithm hash = new SHA256();
      hash.AddBytes(SECRET_KEY);

      String res = "";

      for Char ch in input
      {
         hash.Add(ch);
         int charCode = (hash.Hash[0] % 26) + 1; //use first byte, mod 26 to get a value from 0..25
         res += Char(Ord('A') + charCode;
      }            
   }

I know; you hate the requirements.

Can anyone thing of anything better?

DannyNiu avatar
vu flag
Keying the Caesar cipher and make it work like a Davies-Meyer compression function?
DannyNiu avatar
vu flag
The prefix-idempotence property make it hard *not* to reverse the mapping.
fgrieu avatar
ng flag
Does the prefix property imply that if `I` → `K`, then `i` → `k`, or could it be that `i` → `z`? That if `Ian → Kcp ` then `Ian Ian` → `Kcp Kcp`, or could it be that `Ian Ian` → `Kcp Zar`?
jp flag
@fgrieu For simplicity it was *"case insensitive"*, that is `i` and `I` both map to `k` and `K`. The full solution also includes `0`-`9` in the Caesar cipher. This way things like phone numbers map to something that also looks like a phone number ( `905-867-5309` → `619-112-8408`). And i could also map lowercase separately from uppercase. Either way: doesn't matter. If someone can come up with a solution for `A`-`Z`, i can expand it to any other characters.
SAI Peregrinus avatar
si flag
How would you handle i and ι which both map to I? Just ignore Turkey? Case-insensitivity is almost always a TERRIBLE idea, it's always lossy.
jp flag
@SAIPeregrinus The question does not need those details. For the purposes of this question you only have to solve handling `A-Z`. Answer that, and i will handle the rest. The rest is trivial to handle, and completely irrelevant to my question. If it helps you: pretend the source system uses a 5-bit code page, which only defines the characters `A`-`Z`. Having said all that, you know **exactly** how i would handle the turkey test - but it's not part of my question because it's not important to the question, nor is it the question i'm asking.
jjj avatar
cn flag
jjj
With the prefix requirement it can't be one-way, because one can easily reconstruct it character for character.
Ievgeni avatar
cn flag
I do not understand which are the security requirements?
Score:1
ng flag

Problem with the question's original solution is that the substitution table is the same at each index, and the better part of the key. It can thus be determined from examples. Problem with the "better" solution is that knowing an initial I encrypts to K, we know an initial A encrypts to C, an initial B encrypts to D, etc..

I propose the following, with an at-least-256-bit hash $H$ such as SHA-256, and key $K$:

  • set $y=\mathtt{\text{‘0’}}$
  • for each character $x$ to encipher or decipher
    • let $b=0$
    • if $x$ is a digit, let $b=10$ and let $c=\mathtt{\text{‘0’}}$
    • if $x$ is an uppercase letter, let $b=26$ and let $c=\mathtt{\text{‘A’}}$
    • if $x$ is a lowercase letter, let $b=26$ and let $c=\mathtt{\text{‘a’}}$
    • if $b\ne0$
      • set $K=H(K\mathbin\|\operatorname{uppercase}(y)\mathbin\|b)$
      • prepare a permutation $p$ of $b$ elements keyed by the current value of $K$; and towards this:
        • set integer $r=K$ (per say big-endian convention)
        • for $i=0$ to $b-1$
          • set $j=r\bmod(i+1)$, then $r=\lfloor r/(i+1)\rfloor$
          • set $p[i]=i$
          • set $p[i]=p[j]$
          • set $p[j]=i$
      • if enciphering
        • set $y=x$
        • set $x=p[x-c]+c$
      • otherwise
        • set $x=p^{-1}[x-c]+c$
        • set $y=x$
    • output $x$

And this gives the results:

Input text Scrambled
I B
Ia Bf
Ian Bfd
Ian Bfd
Ian B Bfd K
Ian Bo Bfd Kf
Ian Boy Bfd Kft
Ian Boyd Bfd Kftv
Ian Boyd Bfd Kftv
Ian Boyd 2 Bfd Kftv 2
Ian Boyd 20 Bfd Kftv 25
Ian Boyd 201 Bfd Kftv 257
Ian Boyd 2017 Bfd Kftv 2571

Try It Online! in Python.

The $K=H(K\mathbin\|\operatorname{uppercase}(y)\mathbin\|b)$ step prepares a new key that depends on the previous one, on the previous character (normalized to uppercase, because it's unclear if capitalization can have any influence for next characters), and on if the current character is a digit or not (because the rules allow, and we want to depend on as much as they allow). At each step, $256-\log_2(26!)>167.6$ bit of the state remain unknown to attacker even in a chosen plaintext attack. We built a permutation of the appropriate size according to the current character, in the direction suitable for encryption or decryption.

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.