If we're talking about Katz and Lindell, it might be worth investing in a newer edition. My (third) edition section 7.2.1 on page 225 reads:
A better attack is possible by noting that individual bits of the output depend on only part of the sub-keys. Fix some given input/output pair $(x,y)$ as before. Now, the adversary will enumerate over all possible values for the first byte of $k_1$. It can XOR each such value with the first byte of $x$ to obtain a candidate value for the 1-byte input to the first $S$-box. Evaluating this $S$-box, the attacker learns a candidate value for the output of that $S$-box. Since the output to that $S$-box is the XORed with 8 bits of $k_2$ to yield 8 bits of $y$ (where the positions of those bits depend on the mixing permutation but are known to the attacker), this yields a candidate value for 8 bits of $k_2$.
(A quick Google for Katz and Lindell errata also yields this paragraph for p.210 of the second edition).
One can assume/enumerate 8 bits of $k_2$ to deduce 8 bits of $k_1$, but rather than assume the first 8 bits of $k_2$ one should assume 8 bits of $k_2$ all of which correspond to the outputs of a single $S$-box; this then allows one to deduce the 8 bits of $k_1$ corresponding to the input bits.