Score:1

What tools are there to reverse-engineer an LFSR besides the BMA?

cn flag

I have a certain timecode which I can’t seem to figure out. We gave successfully decoded other codes for the same purpose with the Berlekamp-Massey algorithm, but this code seems to have a linear complexity of 110, which is not practical in any way. It can also not be reconstructed with the 110-bit irreducible polynomial the BMA finds. Just the first few bits are as expected and then the bits seem to flip, that is they are the reciprocal of the expected bits.

I am a beginner in cryptography and have no idea how to proceed at this point. Are there other algorithms to analyze an LFSR (or similar pseudo-random sequences)? Or does anyone have clue under what conditions the BMA detects such an utterly long polynomial?

kodlu avatar
sa flag
if it does have a linear complexity of 110, only 220 bits (if not in error) should suffice via the BMA. I am assuming it is a binary sequence, you did not state this.
neolith avatar
cn flag
Yes, it is a binary sequence or GF(2) as mathematicians say. Well, actually we don't know, but I have analysed the sequence under that assumption, because the other sequences we have decoded successfuly were also binary.
Score:2
us flag

It is not clear what exactly the "certain time code" is, but I will take it to mean a finite-length sequence $s_0, s_1, \ldots, s_{N-1}$ of $N$ bits. The sequence itself may or may not have been generated by an LFSR -- we don't know -- but we would like to find a (Fibonacci) LFSR that generates the sequence.

Let's set the stage a bit. A Fibonacci LFSR of length $n$ is an array of $n$ bits with initial value $$(s_0, \quad s_1,\quad \ldots, \quad s_{n-2}, \quad s_{n-1})$$ and successive values $$(s_0, \quad s_1, \quad \ldots, \quad s_{n-2}, \quad s_{n-1})\\ \downarrow\\ (s_1, \quad s_2, \quad \ldots, \quad s_{n-1}, \quad s_n)\\ \downarrow\\ (s_2, \quad s_3, \quad \ldots, \quad s_n, \quad \quad s_{n+1})\\ \downarrow\\ \cdots \quad \cdots$$ where the array contents shift leftwards at each step (clock cycle) so that the bits falling off the left end (the LFSR output) are the given sequence, while the bits shown as entering the LFSR on the right are being computed as a linear combination (a.k.a. Exclusive-OR sum) of some of the LFSR contents at the previous step. In particular, for $i \geq 0$, the state transition can be expressed as $$\biggr(s_i, \quad s_{i+1}, \ldots, \quad s_{i+n-2}, \quad s_{i+n-1}\biggr)\\ \downarrow\\ \biggr(s_{i+1}, \quad s_{i+2}, \ldots, \quad s_{i+n-1}, \quad {\Large{\oplus}}_{j=0}^{n-1}c_{n-j} s_{i+j}\biggr),$$ where the $c_i$'s have value $0$ or $1$, so that the linear combination $c_ns_i\oplus c_{n-1}s_{i+1} \oplus \cdots \oplus c_1s_{i+n-1}$ of previous LFSR contents is being fed back into the right end of the LFSR as $s_{n+i}$ is actually the Exclusive-OR sum of a selected few bits of the previous contents (those for which the corresponding $c_i$ have value $1$). Hence the name linear feedback shift register which is initialized into LFSR.

With the above description in mind, one answer to the question of an LFSR is trivial in one sense: an $N$ stage LFSR with feedback coefficients $(c_N,c_{N-1}, \cdots, c_1) = (1,0,0,\cdots, 0)$ and initial loading $\big(s_0, s_1, \ldots, s_{N-1}\big)$ will produce endless repetitions $$s_0, s_1, \ldots, s_{N-1},s_0, s_1, \ldots, s_{N-1}, s_0, s_1, \ldots, s_{N-1}, \cdots$$ of the given time code $s_0, s_1, \ldots, s_{N-1}$ for all eternity. Alternatively, any other choice of $(c_N,c_{N-1}, \cdots, c_1)$ will fill the shift register with something else, but the first $N$ bits out will still be $s_0, s_1, \ldots, s_{N-1}$ and what follows thereafter will be different. This is clearly not what we want, and so we seek a better criterion: find the shortest LFSR (and its initial loading) that will generate $s_0, s_1, \ldots, s_{N-1}$, and that is exactly what the Berlekamp-Massey algorithm finds.

The Berlekamp-Massey algorithm is an iterative process that consists of finding the shortest LFSR that generates the first $t$ bits $s_0, s_1, \ldots s_{t-1}$ and checking if the LFSR finds $s_t$ correctly. If so, $s_{t+1}$ is checked, while if not, the $c_i$'s are updated so that the revised LFSR generates $s_t$. Often, but not always, the length of the LFSR increases. The iterative process continues until the last available bit $s_{N-1}$ has been processed, so that when the Berlekamp-Massey algorithm terminates, the LFSR synthesized produces the entire sequence $s_0, s_1, \ldots, s_{N-1}$ at its output. If the synthesized LFSR has $n\leq N$ stages, then its initial loading is $\big(s_0, s_1, \ldots, s_{n-1}\big)$ (which are the first $n$ output bits) and the LFSR correctly calculates $s_n, s_{n+1}, \cdots, s_{N-1}$ (that is, the rest of the given sequence). The length of the LFSR synthesized by the Berlekamp-Massey algorithm is guaranteed to be minimal: no LFSR that generates $s_0, s_1, \ldots, s_{N-1}$ can have shorter length. However, there is no claim that $n$ is guaranteed to be smaller than $N$; it might well the case that the trivial solutions described above are minimal solutions. Indeed, I argue that this is indeed the case for most sequences; the LFSR is of length $N$. Kolmogorov complexity theory says that the shortest program for printing out most strings of length $N$ is of length $N+c$, or colloquially, for most strings, one can do little better than just store the string and print it out; the "$c$" is just the length of the "Print the following string" command. In the context of shift registers, for most sequences of length $N$, the shortest shift register whose output is a given sequence of length $N$ is just the shift register of length $N$ initialized to the given sequence. What the feedback is (if there is any feedback at all) is irrelevant.

  • If the sequence was indeed created by a $n$-stage LFSR as described above, where $n \leq \frac N2$, then the Berlekamp-Massey algorithm finds all $n$ coefficients $c_i$ after examining $s_0, s_1, \ldots, s_{2n-1}$. The initial loading of the synthesized LFSR is, of course, just $\big(s_0, s_1, \ldots, s_{n-1}\big)$. From this point onwards, the Berlekamp-Massey algorithm can continue checking the rest of the sequence (if so desired) but does not need to update the $c_i$'s because each check of the next symbol simply reports back that everything is OK; the current LFSR generates the the next bit too. However, in general, the cryptanalyst has no idea as to whether the time code was generated by an LFSR at all; it is merely a sequence of $N$ bits of unknown origin, and so for the cryptanalyst's purposes, all $N$ bits must be examined to find the shortest LFSR that can generate $s_0, s_1, \ldots, s_{N-1}$. If additional bits $s_N, s_{N+1}, \ldots$ are availaible, then there is no guarantee that the LFSR found after examining $N$ bits will generate these additional bits too. In fact, the whole idea behind the Berlekamp-Massey algorithm is that if the test "Does the current LFSR also generate $s_n$?" fails, the algorithm synthesizes a modified LFSR from the current LFSR so that the modified LFSR generates $s_0, s_1, \ldots, s_{n-1},s_n$. In this process, the length of the LFSR might (or might not) increase.
  • If the sequence was indeed created by a $n$-stage LFSR as described above, where $\frac N2 < n \leq N$, then the Berlekamp-Massey algorithm finds the shortest LFSR that generates $s_0, s_1, \ldots, s_{N-1}$. It might, or it might not, generate the same values for $s_N, s_{N+1}, \cdots, s_{2n-1}$ as the actual LFSR does.

With the above lengthy diatribe in mind, let us address the OP's questions. In the title, the OP asks

What tools are available to reverse-engineer an LFSR besides the BMA?

Well, the extended Euclidean algorithm for polynomial greatest common divisors can be used, but in a sense, it is equivalent to the Berlekamp-Massey algorithm; it can be viewed as processing the sequence in reverse order $s_{N-1}, s_{N-2}, \cdots, s_1, s_0$ in comparison to the order $s_0, s_1, \cdots, s_{N-2}, s_{N-1}$ used by the Berlekamp-Massey algorithm.

In the body of the question, the OP complains

this code seems to have a linear complexity of 110, which is not practical in any way. It can also not be reconstructed with the 110-bit irreducible polynomial the BMA finds. Just the first few bits are as expected and then the bits seem to flip.

It is likely that this issue is a matter of mistakes in the implementation of the BMA. The "not practical in any way" suggests that perhaps insufficient memory was allocated for storing the various polynomials used internally in the BMA or that buffers overflowed etc. As @kodlu points out, if the time code is of length $220$ bits, then examining all $220$ bits could result in an LFSR of length $110$. Else, as discussed above, if the time code is of length $110$ or more (maybe $128$ bits $= 16$ bytes?), the BMA might well synthesize an LFSR of length $110$. Finally, I have no explanation other than programmer error for the OP's observation that in the synthesized LFSR output, "first few bits" are correct but the rest are flipped.

does anyone have clue under what conditions the BMA detects such an utterly long polynomial?

See the material above the dashed line in this answer.

neolith avatar
cn flag
First of all, thank you for this extensive answer. I have some questions. "Often, but not always, the length of the LFSR increases.". I was looking at a detailed example of the BMA in a paper. I am remember that sometimes added bits to the register are being discarded again. Can the 110 bits also decrease, given that there are enough error-free bits of the sequence fed into the BMA? If not, for the case that there is not enough memory, this would mean that the linear complexity could only increase furhter, which still unpractical.
neolith avatar
cn flag
A prof of my Uni had the idea that his could be a shrinking generator and suggested an attack in a paper of Golic. I am not sure whether to go down this road yet. I think we will first check the algorithm for errors. It is from a 32-bit era and it could indeed be the case. The next step would be to implement the algorithm in NumPy, where all memory errors are impossible. If all of this fails, could it be that the BMA falsely detects an irreducable polynomial or is that impossible, given that the implementation is correct?
Score:0
ng flag

All LFSRs can be reverse-engineered with a correct, long-enough example output and a correct implementation of Berlekamp-Massey.

But not all pseudo-random sequences similar in purpose to pseudo-random sequences that are generated by an LFSR are also generated by an LFSR! An example is discussed in this question.

If something uses good cryptography, even if the algorithm is public, there is no way to find the key and predict the sequence from examples.

neolith avatar
cn flag
Yes, this is the conclusion we were coming too. If the code is encrypted - which is not practical from a performance standpoint, but the developer would have had some good reasons to - there is no way to find the generating polynomial or LFSR without an immense amount of work and experience. In this case I would give up, but I am not ready to do that yet.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.