No, breaking the collision property of SHA-256 does not require any close to $2^{128}$ space. We know how to exhibit collision in any $n$-bit hash $H$ with $\mathcal O(2^{n/2})$ hash evaluations and $\mathcal O(n)$ space.
The simplest suitable method is Floyd's cycle finding, which will exhibit with non-vanishing probability two distinct $n$-bit bitstrings $r$ and $s$ with $H(r)=H(s)$, in the orbit a given initial point $t$ when iterating $H$
- $m\gets\lceil\,2^{n/2+1}\,\rceil$ (increasing $+1$ reduces failures).
- $u\gets H(t)$ .
- $r\gets u$ ; $s\gets H(u)$ .
- while $r\ne s$ : (find cycle)
- if $m=0$ then stop in failure (long orbit, rare).
- $m\gets m-1$ .
- $r\gets H(r)$ ; $s\gets H(H(s))$ .
- if $t=s$ then stop in failure ($t$ in cycle, vanishingly rare).
- $s\gets H(s)$ .
- while $r\ne s$ : (offset $u$ by one cycle)
- while $t\ne u$: (locate collision)
- $r\gets t$ ; $s\gets u$ .
- $t\gets H(t)$ ; $u\gets H(u)$ .
- output $(r,s)$ and stop in success.
Try it online! for collision of a 24-bit hash (the first $k=3$ bytes of SHA-256). Please be kind enough to run this Python code on your machine if increasing $k$.
The method uses that the orbit of $t$, defined as the $u$ reached by iterating $u\gets H(u)$ starting from $u=t$, tends to cycle within $\mathcal \Theta(2^{n/2})$ steps. The algorithm detects a cycle is reached, find $u$ after as many steps from $t$ as the cycle's length, then where the cycle is entered, which yields a collision. It can be shown that for random function $H:\{0,1\}^*\mapsto\{0,1\}^n$ and except for very small $n$, the success probability of this algorithm from any starting point $t$ is at least $3/4$ (failures are for overly long orbit of $t$, or when $t$ belongs to a cycle).
In case of failure, it's often enough to restart from another random point. That typically works fine for common cryptographic hashes $H$, but even for these it can happen that most starting points lead to a cycle too large to be found. In a general case, we want to switch to using the algorithm with $H'$ defined by $H'(x)=H(F(x))$ for a suitably random efficiently computable and invertible injection $F$ chosen at start of the algorithm. That's so exhibiting a collision for $H'$ using the algorithm exhibits a collision for $H$, but $H'$ can have a different cycle structure. For most $n$-bit hash $H$ suitable for cryptographic use, $F$ can be XOR with a fixed $n$-bit bitstring, or adding a fixed prefix or/and suffix. This is not illustrated in the above pseudocode and linked Python code.
It's possible to distribute the work on many machines running in parallel, each with little memory, moderate communication, and moderate extra work. See Paul C. van Oorschot and Michael J. Wiener, Parallel Collision Search with Cryptanalytic Applications, in Journal of Cryptology, 1999.