From the theoretical standpoint of making ECDSA signatures unforgeable (including Existentially Un-Forgeable under Chosen Message Attack), it's not necessary to check that the public key is on the curve†: the EUF-CMA experiment assumes that the public key has been generated per the genuine key generation algorithm, which only outputs points on the curve.
From a practical standpoint, the value of such a check is limited: an adversary able to alter or set the public key as they wish can substitute another public key whose matching private key they know. The check won't catch that, and it lets them defeat the purpose of signing.
Depending on the format of the public key, the check would catch a random alteration with near certainty (for a key in uncompressed format, where the verification checks that the curve's equation is satisfied by the $x$ and $y$ coordinates), or with moderate probability (for a key in compressed format, where it's checked that a $y$ exists matching the $x$ in the public key, which happens with probability near 50% for many common curves).
The reasons I see to make such a check (which is common) are:
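To make the two variants of the check concrete, here is a minimal Python sketch. It assumes NIST P-256 as the curve (my choice for illustration; the question does not name one), and the function names are hypothetical. The compressed-format test uses Euler's criterion to decide whether a matching $y$ exists, and the last helper estimates empirically how often a random $x$ passes.

```python
import random

# NIST P-256 domain parameters (an assumption: any short Weierstrass
# curve y^2 = x^3 + a*x + b over a prime field would do).
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
B = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
GX = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
GY = 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5


def on_curve(x, y):
    """Uncompressed format: both coordinates in range and on the curve."""
    if not (0 <= x < P and 0 <= y < P):
        return False
    return (y * y - (x * x * x + A * x + B)) % P == 0


def x_has_matching_y(x):
    """Compressed format: does some y satisfy the equation for this x?"""
    if not 0 <= x < P:
        return False
    rhs = (x * x * x + A * x + B) % P
    # Euler's criterion: a nonzero rhs is a square mod P
    # exactly when rhs^((P-1)/2) == 1 (mod P).
    return rhs == 0 or pow(rhs, (P - 1) // 2, P) == 1


def fraction_of_valid_x(samples, seed=0):
    """Share of uniformly random x values for which a matching y exists."""
    rng = random.Random(seed)
    hits = sum(x_has_matching_y(rng.randrange(P)) for _ in range(samples))
    return hits / samples


print(on_curve(GX, GY))        # True: the base point is on the curve
print(on_curve(GX, GY ^ 1))    # False: one flipped bit in y is caught
print(fraction_of_valid_x(2000))
```

The last line should print a value close to one half, which is why a random alteration of a compressed key goes undetected about half of the time.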
- It is an assumption on the inputs of signature verification that the public key is a valid curve point, and the check can catch a mistake or malfunction affecting that input before it causes undefined behavior in the signature verification code; and that check is relatively cheap (for uncompressed keys) or not too expensive (for compressed keys) compared to the whole signature verification process.
- Making the check makes it easy for the verifier‡ to rebut an argument along the lines of "but you did not perform the point-on-curve check suggested/mandated in [some reference]", or "but you verified that signature against a public key that's invalid in the first place!" (however, that's not effective against the more general argument "but you verified my/their signature against a public key that's not mine/theirs!").
- To better resist fault attacks against implementations. It's conceivable that an altered public key, as in a verification device subjected to a deliberate fault attack, would cause the computation of $R$ at step 5 in ECDSA verification to output a value independent of the altered public key (or one of a few such values), thus making $R$ essentially independent of the component $r$ of the signature or otherwise predictable, which would make it easy to craft a signature that passes verification. Details would depend heavily on how this computation is made, and there are many ways.
- UPDATE: As rightly pointed out in a comment, substituting a legitimate public key with a fake one that makes all message/signature pairs verify could be of use to some attackers. Mechanisms by which an invalid key could lead to such a universal "pass" at verification may depend on the curve, on the method used for point arithmetic, and on other details in the verification code. Although I don't see how that could reasonably happen for ECDSA, a check that the point is on the curve is sure to prevent that for standard curves in a prime field (which have cofactor $h=1$), and the full check does for all curves.
That check is required for the verifying entity by the de-facto ECDSA standard, but can be indirect, e.g. because it's assumed to have been made by a certification authority, and needs only to be performed once. The rationale given for the check is:
> either to prevent malicious insertion of an invalid public key to enable attacks like small subgroup attacks, or to detect inadvertent coding or transmission errors.
Small subgroup attacks are primarily a concern for ECDH. In ECDSA, they are no worse than insertion of a valid public key of the attacker's choice, which again seems about as easy, and less detectable.
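For illustration, the full validation (public key not the neutral element, coordinates in range, on the curve, and in the group generated by the base point) can be sketched as follows. This is a rough sketch under assumptions of mine: the curve is NIST P-256, points are affine pairs with `None` standing for the neutral element, the names are hypothetical, and the arithmetic is neither constant-time nor hardened against faults. It needs Python 3.8 or later for `pow(x, -1, P)`. On P-256 the final multiplication by $n$ is redundant since $h=1$; it is shown because it's the step that matters on curves with $h>1$.

```python
# NIST P-256 domain parameters (an assumption made for this sketch).
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
B = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)


def add(p1, p2):
    """Affine point addition; None is the neutral element."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p2 == -p1 (covers y == 0 doubling)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, pt):
    """Scalar multiplication by double-and-add (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def validate_public_key(q):
    """Full check: non-neutral, in range, on curve, and of order dividing N."""
    if q is None:
        return False
    x, y = q
    if not (0 <= x < P and 0 <= y < P):
        return False
    if (y * y - (x * x * x + A * x + B)) % P != 0:
        return False
    # Subgroup membership; implied by the on-curve check when h = 1.
    return mul(N, q) is None


print(validate_public_key(mul(12345, G)))    # True: a genuine public key
print(validate_public_key((G[0], G[1] ^ 1))) # False: point is off the curve
```

Note that the second key is rejected by the curve equation before any point arithmetic is run on it, which is the ordering one wants if the arithmetic misbehaves on invalid points.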
† ECDSA actually assumes that the public key is a (non-neutral) member of the group generated by the base point. For the standard curves in a prime field most used by ECDSA, "generated by the base point" is equivalent to "on the curve", since cofactor $h=1$. I discuss mostly the "on the curve" criterion, since that's what the question asks. The whole discussion largely applies to both criteria, except for the cost of testing "generated by the base point" on curves with cofactor $h>1$.
‡ Including a certification authority verifying a Certificate Signing Request.