> What am I missing in my reasoning?
Two things:
There are strategies other than sorting for finding collisions; one obvious one is to build a huge hash table. In practice, sorting would probably be faster (a hash table that huge would be impractical), but in theory a hash table gives the $O(1)$ accesses that this complexity analysis implicitly assumes.
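To make that concrete, here is a minimal sketch of the hash-table approach. Everything in it is made up for illustration: a hypothetical 16-bit toy cipher with names like `toy_encrypt` and `meet_in_the_middle` stands in for DES, so the full key space can actually be enumerated in a fraction of a second.

```python
# Minimal sketch: meet-in-the-middle via a hash table (Python dict).
TOY_MOD = 1 << 16
TOY_MULT = 0x9E37                       # odd, hence invertible mod 2^16
TOY_INV = pow(TOY_MULT, -1, TOY_MOD)

def toy_encrypt(key, block):
    return ((block ^ key) * TOY_MULT) % TOY_MOD

def toy_decrypt(key, block):
    return ((block * TOY_INV) % TOY_MOD) ^ key

def meet_in_the_middle(plaintext, ciphertext):
    # Forward pass: map each intermediate value E_k1(P) to the keys producing it.
    table = {}
    for k1 in range(TOY_MOD):
        table.setdefault(toy_encrypt(k1, plaintext), []).append(k1)

    # Backward pass: for each k2, look up D_k2(C); each access is expected O(1).
    matches = []
    for k2 in range(TOY_MOD):
        for k1 in table.get(toy_decrypt(k2, ciphertext), []):
            matches.append((k1, k2))
    return matches

if __name__ == "__main__":
    k1, k2, p = 0x1234, 0xBEEF, 0x00FF
    c = toy_encrypt(k2, toy_encrypt(k1, p))
    print((k1, k2) in meet_in_the_middle(p, c))   # True, among ~2^16 candidates
```

Each `table.get` lookup is expected $O(1)$, so the second pass replaces the sort entirely, at the cost of keeping the whole table in memory; with real data you would still verify each candidate pair against additional plaintext/ciphertext pairs.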
$O(n \log n)$ is not actually the optimal sorting time. It is if the only operations you can perform on the data are comparisons and data movement; however, we're not constrained that way. One obvious approach is a radix sort, which runs in time linear in the number of entries for fixed-width keys, which is asymptotically better.
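For reference, here is a minimal LSD radix sort over fixed-width unsigned integer keys (the `radix_sort` name and parameters are just illustrative). It makes `key_bits / radix_bits` linear passes over the data and never compares two keys:

```python
def radix_sort(values, key_bits=64, radix_bits=8):
    """LSD radix sort of fixed-width unsigned integers.

    key_bits / radix_bits bucketing passes, each O(n): linear in the number
    of entries for a fixed key width, with no comparisons at all.
    """
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        # Concatenating buckets in order keeps the sort stable between passes.
        values = [v for bucket in buckets for v in bucket]
    return values

if __name__ == "__main__":
    print(radix_sort([0xDEADBEEF, 3, 42, 0xCAFE]) == [3, 42, 0xCAFE, 0xDEADBEEF])
```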
Now, to be honest, we typically ignore the time taken by these noncryptographic operations (such as searching and sorting), unless they account for a really large fraction of the total time. Quite frankly, there are a lot of details once you go down to that level (such as the practical cost of handling on the order of $2^{56}$ entries of data), so instead we just count DES evaluations.
We're not actually trying to break 2DES here; we're showing that breaking 2DES would be achievable in a practical amount of time. If we were actually trying to break it, we might use a lambda search instead: it costs a constant factor more and isn't guaranteed to succeed on any given run, but it would be far easier to implement, since it needs much less memory while still being easy to parallelize.
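Purely as a sketch of the memoryless idea (a simplified rho/lambda-style walk on a made-up 10-bit toy cipher, not the parallel collision-search variant you would use in practice; `lambda_search`, `toy_encrypt`, and the rest are hypothetical names): each walk state is a key guess, a flavour-dependent bit decides which direction it is stepped through, and a cross-side collision gives a candidate key pair that is then checked against extra plaintext/ciphertext pairs.

```python
import random

MOD = 1 << 10         # tiny 10-bit toy key/block space so the demo finishes fast
MULT = 0x26B          # odd, hence invertible mod 2^10
INV = pow(MULT, -1, MOD)

def toy_encrypt(key, block):
    return ((block ^ key) * MULT) % MOD

def toy_decrypt(key, block):
    return ((block * INV) % MOD) ^ key

def lambda_search(plaintext, ciphertext, verify):
    """Memoryless (rho/lambda-style) search for a double-toy-encryption key pair.

    Any cross-side collision E_k1(P) == D_k2(C) is a candidate pair; we keep
    re-flavouring and restarting until a candidate passes `verify`, which is
    why the running time is only probabilistic rather than guaranteed.
    """
    while True:
        flavour = random.randrange(MOD)

        def side(x):
            # pseudo-random, flavour-dependent split between the two directions
            return (((x ^ flavour) * 2654435761) >> 16) & 1

        def step(x):
            mid = toy_encrypt(x, plaintext) if side(x) else toy_decrypt(x, ciphertext)
            return mid ^ flavour          # flavour tweak re-randomises the walk

        # Floyd phase 1: tortoise/hare race until the walk's cycle is detected.
        start = random.randrange(MOD)
        tortoise, hare = step(start), step(step(start))
        while tortoise != hare:
            tortoise, hare = step(tortoise), step(step(hare))

        # Floyd phase 2: walk both pointers to the cycle entry; the two values
        # held just before they meet are distinct inputs with the same image.
        tortoise, prev_t, prev_h = start, None, None
        while tortoise != hare:
            prev_t, prev_h = tortoise, hare
            tortoise, hare = step(tortoise), step(hare)

        if prev_t is None or side(prev_t) == side(prev_h):
            continue                      # trivial or same-side collision: retry
        k1, k2 = (prev_t, prev_h) if side(prev_t) else (prev_h, prev_t)
        if verify(k1, k2):
            return k1, k2

if __name__ == "__main__":
    secret_k1, secret_k2 = 0x1A3, 0x047
    pairs = [(p, toy_encrypt(secret_k2, toy_encrypt(secret_k1, p)))
             for p in (0x001, 0x15A, 0x3C3)]
    check = lambda a, b: all(toy_encrypt(b, toy_encrypt(a, p)) == c for p, c in pairs[1:])
    print(lambda_search(*pairs[0], check) == (secret_k1, secret_k2))
```

Note how little state the search keeps (two walk pointers and a handful of candidates), and how independent walks with different flavours could run on separate machines; that is the memory and parallelism trade-off mentioned above, paid for by the fact that any single run may fail and must be retried.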