Can someone shed some light on this and confirm that in the worst case computing the Merkle root ab initio requires 2N operations?
Well, for the stateful hash-based signature algorithms we use in practice (XMSS and LMS), the time to generate a leaf public key (that is, the public key for a one-time signature) far outweighs the time taken to compute a Merkle tree node from its two child nodes. Hence, in that sense, the time taken to generate a root based on $N$ leaf nodes is the time taken to generate $N$ ots leaves, plus a little extra (where this "little extra" will likely be less than 1% of the time taken to generate the leaves [1]).
Now, if you want to look into the details of this 'little extra', that's not hard to analyze; we start with $N$ nodes, and each Merkle tree node reduces the total number of nodes by 1 (because it takes two child nodes values as input and generates one node value as output), hence there are a total of $N-1$ such mergers needed.
The one exception is if $N$ didn't happen to be a power of 2 - in that case, if all the leaf nodes start at the same level (which XMSS and LMS do), then there will be some levels where we have a node which has one valid child node input and one artificial node input (which stands for a branch of the logical Merkle tree where we didn't place any actual leaves); such a node doesn't reduce the total number of nodes at all. On the other hand, we can arrange things so that any level of the Merkle tree has at most one such 'nonreducing node'. Since there are $H$ levels, there are at most $H$ such nonreducing nodes, and so the maximum number of Merkle tree internal nodes that need to be evaluated is $N + H - 1$.
So, the total amount of work is $N$ one-time-signature root generation and $N + H - 1$ Merkle tree mergers. However, I wouldn't personally add those two to come up with $2N + H - 1$, because the hidden constant of proportionality for the first one is so much larger than the hidden constant for the second one. It would be like we have 1024 watermellons and 1023 grapes; we would be misleading to say we have 2047 fruits (even though that is technically correct).
[1]: Lets spell this out explicitly; if we have $W=16$ and $N=256$, then we have 67 Winternitz chains, each of length 15, and so that would take at least $67 \times 15 = 1005$ hash operations, not counting the time taken to generate the random values at the bottom of the chain, or the time taken to combine the top of the chains together. In contrast, a Merkle node can be evaluated (given the child values) with a single hash operation. Even if this single hash operation is a bit more expensive (because of the longer input), it is still overwhelmed by the Winternitz evaluations.