Score:0

Crypto

Very small domains in FF1 or similar

Jeff Burdges

7/18/24, 8:04 PM

I want a family of efficiently computable random-ish bijections $f_{k,n} : [0..n) \to [0..n)$ for $n \le 2048$. We cannot make the $f_{k,n}$ be secure for encryption at this domain size of course, but I really only need the set $f_{k,n}\big( [0..n/3) \big)$ to be distributed somewhat randomly. In particular, those set overlaps should've expected size $n / 3^j$ for $j$ distinct random $k$.

I could simply shuffle $[0..n)$ of course, but format preserving encryption vaguely like FF1 sounds like a better choice for cache locality and convenience. It appears FF1 itself cannot handle a strings of length one though.

I'll presumably need some other generalized Feistel algorithm adapted to small domains, but which divides the domain algebraically, no?

2 + 0

format-preserving

Score:2

Crypto

aiootp

7/28/24, 10:24 PM

You can use a modified affine cipher chain. It's the type of design I developed initially when trying to find a way to scramble database IDs, or name-based / time-based GUIDs without introducing birthday collisions.

Given that you want to express all non-negative integers smaller than $n = 2048$, $n$ is the exclusive maximum, and $m = n-1$ is the inclusive maximum. All those numbers can be expressed in $L = |m|$ number of bits, which in this case $L = 11$ bits. Then, find the first prime number $P_n \ge 2^{L}$, here that would be $P_n = 2053$.

Choose a uniform, random key $k \in \{0, 1\}^{7 \cdot L}$ and slice it into seven equal $L$-size, non-overlapping chunks $k_0$, $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, $k_6$, and convert each of those keys into non-negative integers. Now, $k_3$ and $k_5$ (the multiplication keys) do also need to be non-zero, so you can resample randomly from $\{0, 1\}^L$ until those two don't end up converting into zeros. Then the random-ish, bijective permutation $f_{k,n}(x)$ will be defined like this:

$$\forall k' \space (k' \gt 0)$$ $$g_{k',k''}(x) = y = (k' \cdot x + k'') \mod P_n$$ $$G_{k',k''}(x) = \begin{cases} G_{k',k''}(y) & \text{if } y \ge 2^L \\ y & \text{otherwise} \end{cases}$$ $$f_{k,n}(x) = G_{k_5,k_6}(G_{k_3,k_4}(x \oplus k_0) \oplus k_1) \oplus k_2$$

Because the prime number $P_n$ is larger than the desired range, we have to shrink the range using a recursive definition for $G_{k',k''}(x)$. It uniquely swaps an element $x$ such that $g_{k',k''}(x) \leftrightarrow y$ where $x \lt 2^L$ and $P_n \gt y \ge 2^L$, with another element $x' = y$ such that $g_{k',k''}(x') \leftrightarrow y'$ where $P_n \gt x' \ge 2^L$ and $y' \lt 2^L$.

For small domains, it's probably worthwhile to increase $C$, the chain of calls to $G_{k',k''}(x)$ within $f_{k,n}(x)$. That's if a keyspace larger than $77$ bits is desired, or a better chance at good random-ish scrambling. The formula $\space |k| = (3 \cdot C + 1) \cdot L \space$ can determine how long a string of bits $k$ will need to be for a chain of calls of length $C$. In my testing on larger domains (where $n \ge 2^{128}$), a chain of length $C \ge 2$ was enough to produce random looking outputs, and break up the obvious bit patterns that are present from only a single call to $G$.

This solution is:

Efficiently computable: Processing each input $x$ would cost about three xors, and an average of two multiplications, two additions and two modulo reductions.
Bijective: It might not be obvious, but each input $x$ will map to a unique output $y$. It's easier to see by considering if each operation is bijective. If each operation is bijective, then the chain of operations is bijective.
Small in area: It would take about 77 bits to store the key $k$, and 12 bits to store the prime $P_n$. In total that would be 89 bits.
Simple to implement: The capacity to perform xor, multiplication and addition modulo a number is ubiquitous. The most challenging operation would be changing the domain of your problem and then determining the next prime number $P_n \ge 2^L$.
Length preserving: Each permutation result $f_{k,n}(x)$ will fit in the same amount of bits defined by the inclusive maximum number $m$ in your problem domain.

As long as you don't need actual encryption, this could help.

+ 2

J_H

7/30/24, 10:27 PM

You defined G as either G... or y. I think you meant it should be either g... or y?

aiootp

7/30/24, 11:46 PM

@J_H $g_{k',k''}(x) = y$; whereas, $G_{k',k''}(x)$ is shorthand for the conditional function that outputs $y$ if $y \lt 2^L$ (is inside of the domain) or calls itself recursively (cycle walks) on its outputs until $y \not \ge 2^L$ (doesn't fall outside of the domain).

Score:1

Crypto

poncho

7/18/24, 8:38 PM

Actually, I'd advise you to reconsider the "shuffle $[0,n)$ solution; such a permutation table (packed) would fit in 2816 bytes (for $n \le 2048$); that's well within the L1 dcache on most non-lowend CPUs (and well within the L2 cache). And, I would personally suspect that a full cache miss (of both the L1 and the L2 cache) would actually be cheaper than performing an FF1 encryption (which invokes AES 10 times) even with AES-NI would end up taking more cycles.

Or, is the times taken to set up the permutation a concern? Obviously, generating a 2048 element random shuffle is rather more expensive than, say, selecting a random AES key.

In addition, the current NIST guidance is to not use FF1 for domains $< 1000000$; I don't know if they have any specific weakness in mind, or whether it would impact your relatively modest goals, however that is something to keep in mind.

On the other hand:

It appears FF1 itself cannot handle a strings of length one though.

That is true; however those are strings of digits; if you select a base of 2 (and so a digit consists of a single bit), FF1 can function perfectly well (with a string of 11 digits), albeit not following the NIST guidance mentioned above.

+ 2

Jeff Burdges

7/19/24, 12:15 PM

AES 10 times per letter? I guess per chunk of letters, but still that's lots. AES merely doing key sequence set up could be cached of course, assuming the devs use that option. We'd only need a full AES or ChaChas for every 16 shuffles, if optimized better than rand::shuffle right now. I'd need to make our devs cache the table though. lol

poncho

7/19/24, 12:29 PM

@JeffBurdges: 10 times per FF1 evaluation (for permutations smaller than 256 bits, which you obviously are). I have no idea what you're referring to with "a full AES/ChaCha [once] every 16 shuffles"

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: Very small domains in FF1 or similar

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.