For a random input. Each bit has probability $\frac{1}{2}$ to be zero.
Then because we suppose independence (ideal hash function implies this assumption). For each input the probability to have $4$ zeros in the leading bits is $\frac{1}{2^4}=\frac{1}{16}$.
Then after $k$ computations, because we consider each output is independent (still because it's an ideal hash function). The probability to wait exactly $i$ steps is a good output is $\frac{1}{16}\left(15/16\right)^{i-1}$. (because you have a wrong output for $(i-1)$ hashes, and then a goof one).
And the expectation of the time of computation is $\sum^{\infty}_{i=1} i\frac{1}{16}\left(15/16\right)^{i-1}=
\frac{1}{16}\sum^{\infty}_{j=1} \sum^{\infty}_{i=j}\left(15/16\right)^{i-1}= \frac{1}{16}\sum^{\infty}_{j=1} \sum^{\infty}_{i=0}\left(15/16\right)^{j+i-1}$
$$=\frac{1}{16}\sum^{\infty}_{j=1} \left(15/16\right)^{j-1}\sum^{\infty}_{i=0}\left(15/16\right)^{i} $$
$$= \frac{1}{16}\sum^{\infty}_{j=1} \left(15/16\right)^{j-1}\frac{1}{1/16}$$
$$=\sum^{\infty}_{j=0} \left(15/16\right)^{j}
=16$$.
Then in average you have to compute $16$ hashes to find a good one.
You can easily generalize this proof by replacing $16$ by $2^m$, and you will deduce that in average you need to compute $2^m$ hashes.
Notice that because we choose by advance the coordinates where we want to have the zeros, the total output of the hash function doesn't matter if we look the number of hashes we should compute (but it could do if we look the time of computation of $H$).