Found the answer.
First, I'll change the notation a bit so the equations become more symmetric.
Using this notation, the plaintext word $PH$ in Matsui's paper becomes $x_0$, and $PL$ becomes $x_1$. The ciphertext word $CH$ becomes $x_{n+1}$, and $CL$ becomes $x_n$.
We can find an approximation to $n$ rounds via a graph search. A node in this graph will look like this:
$$
\bigoplus_{i=0}^{n+1} x_i[\delta_i] = \bigoplus_{i=1}^n k_i[\epsilon_i] \tag{1}
$$
where $\delta_i$ and $\epsilon_i$ are bitmasks.
To see what the edges are, say we have a 1-round approximation like this:
$$
x[\alpha] \oplus F(x, k)[\beta] = k[\gamma] \tag{2}
$$
We can instantiate it at round, say, $i$, to get:
$$
x_i[\alpha] \oplus F(x_i, k_i)[\beta] = k_i[\gamma] \tag{3}
$$
and now we can use the Feistel structure, knowing that $x_{i+1} = x_{i-1} \oplus F(x_i, k_i)$, to rewrite $F(x_i, k_i)$ as $x_{i+1} \oplus x_{i-1}$, and so:
$$
x_i[\alpha] \oplus \left(x_{i+1} \oplus x_{i-1}\right)[\beta] = k_i[\gamma] \tag{4}
$$
This will be an edge in our graph. That is, for every one-round approximation and every instantiation round $i$, we can create such an edge. If the source of the edge is equation $(1)$, the target of the edge is the following equation, obtained by $\operatorname{XOR}$ing $(1)$ and $(4)$:
$$
\bigoplus_{j=0}^{i-2} x_j[\delta_j] \oplus x_{i-1}[\delta_{i-1} \oplus \beta] \oplus x_i[\delta_i \oplus \alpha] \oplus x_{i+1}[\delta_{i+1} \oplus \beta] \oplus \bigoplus_{j=i+2}^{n+1} x_j[\delta_j] = k_i[\epsilon_i \oplus \gamma] \oplus \bigoplus_{\substack{j=1 \\ j \neq i}}^{n} k_j[\epsilon_j] \tag{5}
$$
An $n$-round approximation is a node like $(1)$ in which the masks $\delta_2, \dots, \delta_{n-1}$ are all $0$; that is, the equation only involves plaintexts, ciphertexts, and keys.
We start our graph search at the trivial approximation, where $\delta_i = 0$ for all $i$ and $\epsilon_i = 0$ for all $i$. To see which edges we will actually take, a little bit of nomenclature:
- A state $x_i$ is called "hidden" when $1 < i < n$; that is, it is neither a plaintext word nor a ciphertext word. A mask $\delta_i$ is called "hidden" under the same condition.
We will only take an edge $e$ from a node $v: \bigoplus_{i=0}^{n+1} x_i[\delta_i] = \bigoplus_{i=1}^n k_i[\epsilon_i]$ to a node $w: \bigoplus_{i=0}^{n+1} x_i[\delta'_i] = \bigoplus_{i=1}^n k_i[\epsilon'_i]$ when at most 2 of the hidden masks in $w$ are nonzero.
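For example, the very first edge out of the start node, instantiating a one-round approximation $(\alpha, \beta, \gamma)$ at $i = 1$, lands on the node
$$
x_0[\beta] \oplus x_1[\alpha] \oplus x_2[\beta] = k_1[\gamma]
$$
whose only nonzero hidden mask is $\delta_2 = \beta$, so this edge is allowed.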
We reach the end state when all hidden masks are zero, and we're not at the start node.
We use the piling-up lemma to compute edge weights as we go, which we can do in log space to improve precision.
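Concretely, XORing a one-round approximation with bias $\epsilon_o$ onto a node whose equation holds with bias $\epsilon_a$ gives, by the piling-up lemma, a node with bias $2\,\epsilon_a\epsilon_o$, i.e.
$$
\log_2 \epsilon_{\text{new}} = 1 + \log_2 \epsilon_a + \log_2 \epsilon_o,
$$
which is exactly the edge-weight update in the code below. A path that applies $m$ one-round approximations with biases $\epsilon_1, \dots, \epsilon_m$ therefore ends with $\log_2 \epsilon = (m - 1) + \sum_{j=1}^{m} \log_2 \epsilon_j$.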
Below is C++ source code to replicate Matsui's table of results in the appendix. I've used Dijkstra's algorithm for the graph search, but it's really overkill; a dynamic-programming solution would also do, because the only paths we care about apply their one-round approximations at increasing round positions (i.e., they start with the empty approximation and apply it at rounds, say, 1, 2, 3, 5, 6, 8, 9, 10 before reaching the final state). Dijkstra's already runs instantly, however, so there's no need to overthink it.
The only DES-specific thing here is the table of one-round approximations, `one_round_approximations`. Modifying that makes this find linear chains for any Feistel network, given approximations to the round function.
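For instance, to target some other Feistel cipher you would swap out that table. The masks and probabilities below are made-up placeholders, purely to show the shape of the data, not real approximations of any cipher:

```
// Hypothetical drop-in replacement for the `one_round_approximations` table in
// the listing below. The masks and probabilities are placeholders only.
std::array<OneRoundApproximation, 2> one_round_approximations = {{
    // name, alpha (mask on the F input), beta (mask on the F output),
    // gamma (mask on the round key), log2 of the bias |p - 0.5|.
    {"A", 0x00000010, 0x00800200, 0x000000000040ULL,
     std::log2(std::abs(20.0 / 64.0 - 0.5))},
    {"B", 0x00800200, 0x00000010, 0x000300000000ULL,
     std::log2(std::abs(44.0 / 64.0 - 0.5))},
}};
```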
For `NUM_ROUNDS = 10`, this code outputs:
```
Best approximation:
round_number: 10
state masks = [1: [7, 18, 24, 29], 10: [15], 11: [7, 18, 24, 29]]
key masks = [1: [22], 2: [44], 3: [22], 5: [22], 6: [44], 7: [22], 9: [22]]
applied approximations: [-ACD-DCA-A]
Probability: 0.500047
Log2(Bias): -14.3904
```
which exactly matches Matsui's paper.
```
// Finds chains of linear approximations in a Feistel cipher.
//
// A round of a Feistel cipher can be described as:
// x0 x1
// | k |
// | | |
// v v |
// + <- F ---
// | |
// v v
// x2
//
// Where + is bitwise xor, and F is a keyed permutation function. Algebraically,
// x2 = x0 + F(x1, k) (1)
// The wire that carries x1 remains unmodified, and will be swapped with x2
// before the next round in the Feistel network. The whole network then looks
// like this, where ===><== means we swap the two wires:
// x0 x1
// | k1 |
// | | |
// v v |
// + <- F ---
// | |
// v v
// x2 x1
// | |
// x1===><==x2
// | |
// | k2 |
// | | |
// v v |
// + <- F ---
// | |
// v v
// x3 x2
// | |
// ...
//
//
// A one-round approximation to F is three bitmasks, alpha, beta, gamma, such
// that
// x[alpha] + F(x, k)[beta] = k[gamma]
// holds with some probability p. Here the notation a[m] means that we xor the
// bits in the bit string a, indicated by the mask m. So a[0b101] means
// ((a & 0b100) >> 2) ^ (a & 1).
//
// We note that equation (1) tells us that F(x1, k) = x2 ^ x0. In general, if we
// look at the i-th round, equation (1) tells us that
// F(x_{i + 1}, k_{i + 1}) = x_{i + 2} + x_i (2)
//
// Thus, if we have an approximation like:
//
// x[alpha] + F(x, k)[beta] = k[gamma]
//
// We can instantiate it at any given round, for example:
//
// x1[alpha] + F(x1, k1)[beta] = k1[gamma]
//
// And then use equation (2) to rewrite this as:
//
// x1[alpha] + (x2 + x0)[beta] = k1[gamma]
//
// In this way we obtain equations for the values of the wires in the Feistel
// network. In general, they are always going to be of the form:
//
// x_{i + 1}[alpha] + (x_{i + 2} + x_i)[beta] = k_{i + 1}[gamma] (3)
//
// Our goal is to start with a very simple equation:
//
// x0[0] + x1[0] = 0
//
// which holds with probability 1, and applying these equations we got, reach an
// equation that involves only:
// * x0, the high word in the plaintext.
// * x1, the low word in the plaintext.
// * x_{n+1}, the high word in the ciphertext.
// * x_n, the low word in the ciphertext.
// * Some round keys k_i.
// and we want to know the probability with which they hold. We call this sort
// of equation a linear approximation of the full cipher.
//
// This program considers a graph where each node is of the form:
// (m_0, m_1, ..., m_{N+1}, km_0, km_1, ..., km_{N-1}, p)
// where N is the number of rounds, each m_i is a 32-bit bitmask, each km_i is a
// 64-bit bitmask, and p is a probability. The meaning of this node is:
//
// with probability p,
// (\sum_{i=0}^{N + 1} x_i[m_i]) + (\sum_{i=0}^{N-1} k_i[km_i]) = 0
//
// where x_i are the values of the wires in the Feistel network, k_i are the
// round keys, m_i are bitmasks for the x_i, and km_i are masks for the k_i.
//
// Starting at the node where m_i = 0 forall i, and km_j = 0 forall j, and p =
// 1, we want to reach a state that represents a linear approximation of the
// full cipher.
//
// The edges of this graph are going to be using some 1-round approximation,
// instantiated at some round in the network. For instance, if we have the node
//
// x_0[0b101] + x_1[0b11] = k_1[0b110] (4)
//
// and we know the equation of the form (3):
// x_{i+1}[0b1011] + (x_{i + 2} + x_i)[0b11] = k_{i + 1}[0b101]
//
// we could instantiate this at i = 1, to get
//
// x_2[0b1011] + (x_3 + x_1)[0b11] = k_2[0b101] (5)
// If we then xor this equation (5) with (4), we get
//
// x_0[0b101] + x_2[0b1011] + x_3[0b11] = k_1[0b110] + k_2[0b101]
//
// which is another node in our graph. In this way we explore the graph until we
// reach a linear approximation of the full cipher.
#include <algorithm>   // std::max
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>  // std::hash
#include <iostream>
#include <optional>
#include <set>
#include <unordered_map>
#include <vector>
constexpr size_t NUM_ROUNDS = 10;
// Prints a 64-bit bitmask, showing only the bit indices that are "on".
std::ostream& show_mask(std::ostream& o, uint64_t m) {
int i = 0;
bool first = true;
o << "[";
while (m) {
if (m & 1) {
if (!first) {
o << ", ";
} else {
first = false;
}
o << i;
}
m >>= 1;
++i;
}
o << "]";
return o;
}
// The meaning of this approximation is that with probability p = probability(),
// x[alpha] + F(x, k)[beta] = k[gamma]
// for any 32-bit string x and 48-bit string k.
//
// The bias is defined as |probability() - 0.5|.
struct OneRoundApproximation {
const char* name;
uint32_t alpha;
uint32_t beta;
uint64_t gamma; // Only need 48 bits.
double log2_bias; // log_2(bias)
double probability() const {
double x = std::pow(2.0, log2_bias) + 0.5;
return std::max(x, 1.0 - x);
}
friend auto operator<=>(const OneRoundApproximation&,
const OneRoundApproximation&) = default;
friend std::ostream& operator<<(std::ostream& o,
const OneRoundApproximation& ra) {
o << ra.name;
return o;
}
};
// This is an approximation that relates some plaintext bits, some ciphertext
// bits, some key bits, and possibly some hidden state bits, in a linear way,
// using fixed bit masks.
//
// The first 2 states are the 2 plaintext words, the last 2 states are the 2
// ciphertext words, and every other state is a hidden state - it is the value
// of a wire in the Feistel network.
struct Approximation {
std::array<uint32_t, NUM_ROUNDS + 2> state_mask;
std::array<uint64_t, NUM_ROUNDS> round_key_mask;
std::array<std::optional<OneRoundApproximation>, NUM_ROUNDS>
applied_approximations;
int round_number;
friend auto operator<=>(const Approximation&, const Approximation&) = default;
friend std::ostream& operator<<(std::ostream& o, const Approximation& a) {
o << "round_number: " << a.round_number << std::endl;
o << "state masks = [";
int cnt = 0;
for (size_t i = 0; i < NUM_ROUNDS + 2; ++i) {
if (!a.state_mask[i]) continue;
if (cnt++ > 0) o << ", ";
o << i << ": ";
show_mask(o, a.state_mask[i]);
}
o << "]" << std::endl;
cnt = 0;
o << "key masks = [";
for (size_t i = 0; i < NUM_ROUNDS; ++i) {
if (!a.round_key_mask[i]) continue;
if (cnt++ > 0) o << ", ";
o << i << ": ";
show_mask(o, a.round_key_mask[i]);
}
o << "]" << std::endl;
o << "applied approximations: [";
for (size_t i = 0; i < NUM_ROUNDS; ++i) {
auto ma = a.applied_approximations[i];
if (ma == std::nullopt) {
o << "-";
} else {
o << *ma;
}
}
o << "]" << std::endl;
return o;
}
};
// This is an approximation that holds with a given probability.
struct WeightedApproximation {
Approximation a;
double log2_bias;
double probability() const {
double x = std::pow(2.0, log2_bias) + 0.5;
return std::max(x, 1.0 - x);
}
};
std::array<OneRoundApproximation, 5> one_round_approximations = {{
{"A", 0x8000, 0x21040080, 0x400000, std::log2(std::abs(12.0/64.0 - 0.5))},
{"B", 0xd8000000, 0x8000, 0x6c0000000000ULL, std::log2(std::abs(22.0/64.0 - 0.5))},
{"C", 0x20000000, 0x8000, 0x100000000000ULL, std::log2(std::abs(30.0/64.0 - 0.5))},
{"D", 0x8000, 0x1040080, 0x400000, std::log2(std::abs(42.0/64.0 - 0.5))},
{"E", 0x11000, 0x1040080, 0x880000, std::log2(std::abs(16.0/64.0 - 0.5))}
}};
// Given an existing approximation, what happens if we xor a one-round
// approximation onto it? Specifically, that one-round approximation will be
// instantiated at round `position`.
Approximation apply_one_round_approximation(const Approximation& a,
const OneRoundApproximation& o,
size_t position) {
Approximation b = a;
b.round_number = position + 1;
b.state_mask[position] ^= o.beta;
b.state_mask[position + 1] ^= o.alpha;
b.state_mask[position + 2] ^= o.beta;
b.round_key_mask[position] ^= o.gamma;
b.applied_approximations[position] = o;
return b;
}
// Returns how many hidden states have a nonzero mask in the given
// approximation.
size_t HiddenSize(const Approximation& b) {
size_t result = 0;
for (size_t i = 2; i < NUM_ROUNDS; ++i) {
result += b.state_mask[i] != 0;
}
return result;
}
std::vector<WeightedApproximation> Transitions(
const WeightedApproximation& a, const OneRoundApproximation& o) {
// For a given approximation of the first j rounds of the cipher, we can
// instantiate the round approximation at any round at or after the current
// one. That is, a one-round approximation of the form:
// alpha * x_{i + 1} + beta * (x_{i+2} + x_i) = gamma * k_{i + 1}
// Can be xorred onto the current state, which is of the form:
//
// \sum_{i=0}^{N+1} state_mask_i * x_i + \sum_{i=0}^{N-1} key_mask_i * k_i = 0
//
// Where the sum is the xor of every bit in the argument.
//
// This will make some state_masks zero, and some nonzero. We only want to
// take the transition when the number of nonzero masks for hidden states is
// at most 2, because this is all that's needed to make progress in the
// Feistel networks we've seen (DES). A state is hidden when it's neither x_0,
// x_1, x_N, nor x_{N+1}. That is, when it's neither one of the two plaintext
// words, nor one of the two ciphertext words.
// This can be made faster and more space efficient, but I don't particularly
// care for now.
// We search for a round at which to instantiate the one-round approximation.
std::vector<WeightedApproximation> result;
for (size_t i = a.a.round_number; i < NUM_ROUNDS; ++i) {
Approximation b = apply_one_round_approximation(a.a, o, i);
if (HiddenSize(b) > 2) continue;
result.push_back(
{.a = std::move(b),
.log2_bias = 1 + a.log2_bias + o.log2_bias});
}
return result;
}
std::ostream& operator<<(std::ostream& o, const WeightedApproximation& wa) {
o << wa.a << std::endl;
o << "Probability: " << wa.probability() << std::endl;
o << "Log2(Bias): " << wa.log2_bias << std::endl;
return o;
}
// Just a standard hash function to be able to put Approximations in a hash
// table.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v) {
std::hash<T> hasher;
seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
size_t HashApproximation(const Approximation& a) {
size_t h = 0;
for (size_t i = 0; i < NUM_ROUNDS + 2; ++i) {
hash_combine(h, a.state_mask[i]);
}
hash_combine(h, a.round_number);
return h;
}
int main() {
// We'll use Dijkstra's algorithm to explore this graph. The weight of an edge
// is the log2 of the bias of the one-round approximation it represents.
auto compare_by_probability = [](const WeightedApproximation& a,
                                 const WeightedApproximation& b) {
  // Higher bias (log2_bias closer to 0) comes first. Ties are broken by
  // comparing the approximations themselves, so that distinct nodes never
  // compare equivalent in the set.
  if (a.log2_bias != b.log2_bias) return a.log2_bias > b.log2_bias;
  return a.a < b.a;
};
// This is the queue of nodes we still have to visit.
std::set<WeightedApproximation, decltype(compare_by_probability)> queue;
// This tells us which nodes have already been visited. Note the queue stores
// nodes together with a weight, whereas this map is keyed by the approximation
// itself, without a probability, and maps it to its current position in the
// queue (or queue.end() once it has been popped). This lets us update the
// estimated weight of an approximation as we traverse the graph, as per
// Dijkstra's algorithm.
std::unordered_map<Approximation, decltype(queue.begin()),
decltype(&HashApproximation)>
seen(1, &HashApproximation);
// Our initial approximation has no masks, and holds with probability 1, and
// so its bias is 1 - 0.5 = 0.5.
WeightedApproximation wa = {.a = Approximation{},
.log2_bias = std::log2(0.5)};
auto it = queue.insert(wa).first;
seen.emplace(wa.a, it);
// We keep track of the best approximation we've seen so far. The only
// approximations we'll care about are the ones that relate plaintexts,
// ciphertexts, and key bits. For this, they must have a round number of
// NUM_ROUNDS. This means `wa` is actually not a valid best approximation to
// the full cipher, and will be overwritten the first time we find any valid
// approximation.
WeightedApproximation best_approximation = wa;
while (!queue.empty()) {
WeightedApproximation v = queue.extract(queue.begin()).value();
// We signal it's been popped off the queue.
seen[v.a] = queue.end();
// If this is an approximation to the full cipher (i.e. all NUM_ROUNDS of
// it), and does not involve any hidden states (i.e. only plaintexts,
// ciphertexts, and key bits), then it's a good candidate for the best
// linear approximation for the whole cipher. We'll keep the most likely
// one, that is the one with the highest bias.
if (v.a.round_number == NUM_ROUNDS && HiddenSize(v.a) == 0) {
if (best_approximation.a.round_number == 0 ||
best_approximation.log2_bias < v.log2_bias) {
best_approximation = v;
}
continue;
}
// We now traverse all edges outgoing from approximation `v`, by listing all
// possible one-round approximations we could apply at this approximation.
for (const auto& o : one_round_approximations) {
for (WeightedApproximation w : Transitions(v, o)) {
auto it = seen.find(w.a);
if (it == seen.end()) {
// If we've never seen this approximation before, add it to the queue,
// and mark it as seen. This must be a new element in the queue, and
// we assert so.
auto [jt, inserted] = queue.insert(std::move(w));
assert(inserted);
seen.emplace(jt->a, std::move(jt));
} else {
// We've seen it before. If we've already popped it out of the queue,
// no need to do anything, we already found the shortest weight path
// to it, by the invariant of Dijkstra's algorithm.
if (it->second == queue.end()) continue;
// Relax the edge if possible.
if (it->second->log2_bias < w.log2_bias) {
queue.erase(it->second);
auto jt = queue.insert(w).first;
it->second = jt;
}
}
}
}
}
std::cout << "Best approximation: " << std::endl
<< best_approximation << std::endl;
}
```
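To reproduce the output above, compile as C++20 (the code uses defaulted `operator<=>` and designated initializers); something like `g++ -std=c++20 -O2 linear_chains.cpp -o linear_chains && ./linear_chains` should do, where `linear_chains.cpp` is just whatever name you save the listing under.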