To generalize this, pick a given $x$. Model the action of SHA-256 as picking a random output $y_o\in\{0,1\}^{256}$ uniformly and independently for each $y_i\in\{0,1\}^{256}$. The probability that $y_o=x$ is $p:=\frac{1}{2^{256}}$. This means we can compute the probability that $x$ has $k$ pre-images as a binomial distribution:
$$\text{Pr}[\vert H^{-1}(x)\vert=k] = \binom{2^{256}}{k}p^k(1-p)^{256}$$
With a size of $2^{256}$, we're better off approximating a binomial distribution by a Poisson distribution, with $\lambda = p\cdot 2^{256}=1$, giving
$$\text{Pr}[\vert H^{-1}(x)\vert=k] = \frac{e^{-1}}{k!}$$
(you can also generalize this to the case where we have inputs of more or less than 256 bits, or to the case when we know that $x$ has at least one pre-image).
The analysis gets tricky if we take two values $x_0$ and $x_1$ and ask for the probability that both have a certain size of pre-image, since any element in the pre-image of $x_0$ cannot be in the pre-image of $x_1$, and vice versa. But actually, if we're thinking about small sizes of pre-images, then the probabilities change so little that we can treat the pre-image sizes as independent.
And as @poncho says, SHA-256 could have some undiscovered structure that invalidates this model of it.