The key point is that they are using a universal circuit there:
When we construct a SNARK for circuit SAT, we use a QSP for a universal circuit, which has size $O(|C| \log |C|)$ where $|C|$ is the maximum size of the circuits in the satisfiability problem.
A universal circuit is one which can compute any other circuit. This generalisation is where the extra logarithmic factor comes in. This is explained at the start of section 3:
To compare directly with Groth, we can handle $u$ being an adaptively-chosen circuit by constructing $R$ [(a relation)] from a
universal circuit. In this case, the size of the circuit computing $R$ may be larger
than $|u|$ by a logarithmic factor, which correspondingly increases the CRS size and prover computation to $\tilde{O}(|u|)$. If $u$ ... can be chosen non-adaptively, our scheme becomes more efficient.
The $\tilde{O}$ notation hides logarithmic factors. At the end they mention that knowing the specific circuit $u$ would be more efficient - this is what you have in mind. The reason for the adaptive soundness is because this more correctly models how the protocol would likely be used in real situations - often you would expect the prover to see the CRS before beginning their proof, so they could choose the statement they are (perhaps falsely) trying to prove based on the CRS (adaptively).
This paper explains an efficient universal circuit construction, and gives an explanation of how such circuits work in the preliminaries (for example, see pages 4-7). Intuitively, the logarithmic factor is introduced because these circuits tend to be built recursively, with each sub-problem being roughly half the size, so we have the usual log factor we get when dealing with binary trees. For better explanation I think the paper itself is the best thing to read.