Edit: Let me try and explain further. It is because the algorithm looks for restricted solutions that it finds on average one of the $$\frac{(2^{d/3})^4}{2^d}=2^{d/3}$$ solutions present in the lists $L_i$ as chosen below. This is the price paid for having complexity $2^{d/3}$ instead of the $2^{d/2}$ complexity of Shamir Schroeppel.
Taking the case $k=4,$ which is when you are looking for a solution
$$x_0+x_1+x_2+x_3=0,\quad x_i \in L_i$$ Wagner randomly generates 4 lists $L_i~(1\leq i\leq 4)$ of size $2^{d/3}$ where $d$ is the bitlength.
By statistical arguments you'd have a single solution with constant probability bounded away from zero when the lists are size $2^{d/4}$ (consider the fact that there are $(2^{d/4})^4=2^d$ 4-sums that can be drawn from these lists and with constant probability the value $0$ will be hit). But the point is there is no efficient way of finding this single solution except by Shamir-Schroeppel method which has efficient memory but time complexity $2^{d/2}.$
What Wagner does is recursively generate the solutions, but the solutions have special structure. The first third of the bits of the candidates from $L_0,L_1$ are matched, similarly for $L_2,L_3$ etc.
Because the solutions are structured you need to generate more solutions than the minimum number required so that your algorithm finds a single solution with good probability.