I believe the discrepancy is due to some counter-intuitive properties of the sponge.
The collision resistance of a sponge comes from how the permutation $\pi(S)$ randomizes arbitrary inputs, which may be controlled by an adversary, onto a small image space. The adversary has two options. Either they freely adjust inputs until a collision within the capacity section is found (a birthday search costing roughly $2^{c/2}$ evaluations for a capacity of $c$ bits), after which the subsequent call to $\pi$ is fully controllable, giving $d = \pi(S')_{[0 \ldots |d|-1]} = \pi(S)_{[0 \ldots |d|-1]}$ with $S' \neq S$, $S' = R'||C'$, $C' = C$; or they find the more intuitive collision directly on the output bits for some $S' \neq S$ (roughly $2^{|d|/2}$ evaluations). The easier of the two tasks determines collision resistance.
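To make the capacity-collision route concrete, here is a minimal toy sketch in Python. Everything in it is an assumption chosen for illustration: an 8-bit rate, an 8-bit capacity, a seeded shuffle standing in for $\pi$, no padding, and a single squeezed block. It is not Keccak; it only shows that once two messages agree on $C$, the next block can cancel the difference in $R$, so the full states (and every later output) coincide.

```python
import random

RATE_BITS = 8   # toy rate r (illustrative assumption)
CAP_BITS = 8    # toy capacity c (illustrative assumption)
STATE_SIZE = 1 << (RATE_BITS + CAP_BITS)

# A fixed pseudo-random permutation of the 16-bit state stands in for pi.
PI = list(range(STATE_SIZE))
random.Random(2024).shuffle(PI)

def split(state):
    """Return (rate part R, capacity part C) of a state."""
    return state >> CAP_BITS, state & ((1 << CAP_BITS) - 1)

def absorb(blocks):
    """XOR each block into the rate part, then apply pi (no padding)."""
    state = 0
    for m in blocks:
        state = PI[state ^ (m << CAP_BITS)]
    return state

def toy_hash(blocks):
    """Digest = rate part of the state after absorbing (one squeeze)."""
    return split(absorb(blocks))[0]

def find_inner_collision():
    """Search one-block messages for two that agree on the capacity part."""
    seen = {}
    for m in range(1 << RATE_BITS):
        r, c = split(absorb([m]))
        if c in seen:
            return seen[c], (m, r)
        seen[c] = (m, r)
    raise RuntimeError("no inner collision among one-block messages")

(m1, r1), (m2, r2) = find_inner_collision()

# The second block is chosen so the rate parts become equal before pi:
# r1 ^ 0 == r2 ^ (r1 ^ r2), while the capacity parts already match.
msg_a = [m1, 0]
msg_b = [m2, r1 ^ r2]

print(msg_a, msg_b, toy_hash(msg_a), toy_hash(msg_b))
```

Because the two full states are identical after the second block, appending any common suffix to both messages keeps them colliding, which is why the capacity alone bounds this attack regardless of the digest length.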
Preimage resistance, on the other hand, essentially boils down to the one-wayness of the hash, i.e. how hard it is to find any $x'$ with $h(x') = d$ given $d = h(x)$. If no collision on $C$ is found, the adversary does not have complete control over the evaluation of the permutation $\pi(R'||k)$ with $k = C$, and the resistance is measured by the difficulty of reproducing a $|d|$-bit output. However, if a collision on $C$ is found, both $\pi$ and $\pi^{-1}$ are completely under the adversary's control, so the intermediate states reached while processing a chosen input $x'$ can be made to match other discovered intermediate states of inputs $x^*$, producing $h(x) = h(x') = h(x^*)$. The resistance against inverting $h(x)$ into a viable matching $x^*$ is then directly tied to the collision resistance of the capacity. The easier of the two tasks here determines preimage resistance.
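The two "easier of the two tasks" rules can be written as a pair of minimums. The helper below is a sketch of that arithmetic only; the function name is hypothetical, and it assumes the generic costs described above (birthday bound $c/2$ on the capacity, $|d|/2$ for an output collision, $|d|$ for an output preimage). The capacities used in the example are the ones FIPS 202 specifies for SHA3-256 ($c = 512$) and SHAKE128 ($c = 256$).

```python
def sponge_generic_security(capacity_bits, digest_bits):
    """Return (collision_bits, preimage_bits) under the generic sponge argument.

    Hypothetical helper: each level is the cheaper of the output-based
    attack and the capacity-collision attack.
    """
    capacity_collision = capacity_bits // 2
    collision = min(digest_bits // 2, capacity_collision)
    preimage = min(digest_bits, capacity_collision)
    return collision, preimage

# SHA3-256: capacity 512 bits, 256-bit digest
sha3_256 = sponge_generic_security(512, 256)

# SHAKE128 squeezed to 256 bits: capacity 256 bits
shake128_256 = sponge_generic_security(256, 256)

print("SHA3-256:", sha3_256)
print("SHAKE128 (256-bit output):", shake128_256)
```

With the same 256-bit output, the smaller capacity caps the preimage level at $c/2$, while the larger capacity leaves the digest length as the limiting factor.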