The statement "we expect to produce $2^{64.3}$ numbers before we start to repeat" is only true if we believe that 128-bit Middle Square behaves like a random mapping on $\{0,1\}^{128}$. However, we can show that it has properties that are highly unlikely for a random mapping.
Recall that 128-bit Middle Square maintains a 128-bit state $S_t$. Updates are effectively made by squaring $S_t$ and taking bits 64-191 of the result as the new $S_{t+1}$, i.e.
$$S_{t+1}=(S_t^2 \gg 64) \bmod 2^{128}.$$
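As a minimal sketch of this update (the names `MASK128` and `middle_square_step` are illustrative, not taken from any reference implementation), Python's arbitrary-precision integers let us write the formula directly:

```python
MASK128 = (1 << 128) - 1  # keeps the low 128 bits, i.e. reduces mod 2**128


def middle_square_step(state: int) -> int:
    """One update: square the state and keep bits 64..191 of the 256-bit square."""
    return ((state * state) >> 64) & MASK128


# Worked example: (2**64 + 3)**2 = 2**128 + 6*2**64 + 9,
# so bits 64..191 of the square are 2**64 + 6.
example = middle_square_step((1 << 64) + 3)
assert example == (1 << 64) + 6
```

The shift drops the low 64 bits of the square and the mask drops the top 64 bits, which is exactly the "middle" of the 256-bit product.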
The state $S_t=0$ is a fixed point. Although a random mapping has a fixed point with probability roughly $1-1/e$, this one is unusual because it has a large number of preimages. Any number $S<2^{32}$ maps to 0, as does any number $S$ divisible by $2^{96}$. These preimages alone (there may be others) total about $2^{33}$, whereas for a large random map we expect the number of preimages of a point to be distributed Poisson(1). Moreover, if we consider predecessors, any nonzero number $S<2^{64}$ maps to a smaller number, and any number less than $2^{63}$ reaches 0 in at most 6 steps (the upper bound shrinks as $2^{63}\to2^{62}\to2^{60}\to2^{56}\to2^{48}\to2^{32}\to0$). Likewise for numbers divisible by $2^{65}$. This gives at least $2^{64}$ predecessor states for 0, when a random map would expect $2^{64}\sqrt{\pi/8}$ (with coalescence time about the same). The number of predecessor states increases still more when we consider the possible predecessors of our guaranteed $2^{64}$ predecessor states: if these each have $2^{64}\sqrt{\pi/8}$ predecessors, we might see a positive proportion of our space degenerating to the 0 state.
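The claims about state 0 can be spot-checked numerically. The following is a small sketch, not an exhaustive proof; the helper `steps_to_zero` and its `limit` parameter are hypothetical names introduced here for illustration.

```python
MASK128 = (1 << 128) - 1


def middle_square_step(state: int) -> int:
    """One 128-bit Middle Square update: bits 64..191 of state**2."""
    return ((state * state) >> 64) & MASK128


def steps_to_zero(state: int, limit: int = 100):
    """Count updates until state 0 is reached; None if not reached within limit."""
    for n in range(limit + 1):
        if state == 0:
            return n
        state = middle_square_step(state)
    return None


# Direct preimages of 0: the largest S < 2**32, and a multiple of 2**96.
assert middle_square_step((1 << 32) - 1) == 0
assert middle_square_step(5 << 96) == 0

# Extreme cases of the two predecessor families discussed above.
worst_small = steps_to_zero((1 << 63) - 1)  # largest S < 2**63; evaluates to 6
worst_even = steps_to_zero(1 << 65)         # smallest nonzero multiple of 2**65; evaluates to 6
```

For the multiples of $2^{65}$, the exponent of 2 dividing the state goes $65\to66\to68\to72\to80\to96$ before the state becomes 0, mirroring the shrinking bound for small states.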
There is also a preserved subspace of numbers exactly divisible by $2^{64}$, i.e. divisible by $2^{64}$ but not by $2^{65}$ (this space is of size $2^{63}$), which we might expect to exhibit random mapping statistics for the smaller space (e.g. a cycle length of $2^{31.5}\sqrt{\pi/8}$). We can then consider predecessors for this subspace and again produce expectations significantly different from those of a full random mapping.
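That this subspace is preserved can be checked on random samples. This is a sketch under the assumption that sampling is enough for illustration; the helper `two_adic_valuation` and the fixed seed are choices made here, not part of the original argument. The underlying reason is that $S=k\cdot2^{64}$ with $k$ odd maps to $(k^2 \bmod 2^{64})\cdot2^{64}$, and $k^2$ is again odd.

```python
import random

MASK128 = (1 << 128) - 1


def middle_square_step(state: int) -> int:
    """One 128-bit Middle Square update: bits 64..191 of state**2."""
    return ((state * state) >> 64) & MASK128


def two_adic_valuation(x: int) -> int:
    """Exponent of the largest power of 2 dividing x (x must be nonzero)."""
    return (x & -x).bit_length() - 1


rng = random.Random(0)  # fixed seed so the check is reproducible
# States k * 2**64 with k odd: divisible by 2**64 but not by 2**65.
samples = [(rng.getrandbits(64) | 1) << 64 for _ in range(1000)]

# Every sampled state should map to another state with valuation exactly 64.
closed = all(two_adic_valuation(middle_square_step(s)) == 64 for s in samples)
assert closed
```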
All in all, these structures are very atypical of a random mapping, and we should conclude that the random mapping is not a good model in this case.