ECDH can be viewed as a Key Encapsulation Mechanism (KEM), which, as you mention, is similar to public-key encryption (PKE) but not the same.
There are many reasons to prefer KEMs; I will quickly mention one.
First, note that a KEM is (formally) a tuple of three algorithms $(\mathsf{KGen}, \mathsf{Encaps}, \mathsf{Decaps})$, where
- $\mathsf{KGen}$ takes as input some security parameter $1^\lambda$, and outputs a keypair $(sk, pk)$,
- $\mathsf{Encaps}$ takes as input a public key $pk$ (and perhaps some randomness, which is often only implicit), and returns a pair $(k, c)$ of a derived key $k$ and a "ciphertext" $c$, and
- $\mathsf{Decaps}$ takes as input a secret key $sk$ and a "ciphertext" $c$, and outputs another derived key $k'$.
The KEM is correct if $k = k'$ in the end, i.e. the two derived keys agree.
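To make the syntax concrete, here is a minimal sketch of the three algorithms, instantiated with plain Diffie-Hellman in a toy group. The modulus `P`, the base `G`, the helper `_derive`, and the use of SHA-256 to derive the key are all my own illustrative assumptions; the parameters are far too weak for real use.

```python
import hashlib
import secrets

# Toy public parameters (assumption, NOT secure): integers modulo the
# Mersenne prime 2^127 - 1, with fixed base G = 3.
P = 2**127 - 1
G = 3

def _derive(shared: int) -> bytes:
    """Hash a group element down to a 32-byte derived key."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def kgen():
    """KGen: output a keypair (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(P - 2) + 1  # uniform in [1, P-2]
    return sk, pow(G, sk, P)

def encaps(pk):
    """Encaps: output (k, c) with c = G^r and k derived from pk^r."""
    r = secrets.randbelow(P - 2) + 1   # the (implicit) randomness
    c = pow(G, r, P)
    k = _derive(pow(pk, r, P))
    return k, c

def decaps(sk, c):
    """Decaps: output k' derived from c^sk."""
    return _derive(pow(c, sk, P))

sk, pk = kgen()
k, c = encaps(pk)
k_prime = decaps(sk, c)
assert k == k_prime  # correctness: both sides derive the same key
```

Correctness holds because $(g^{r})^{sk} = (g^{sk})^{r}$, so both sides hash the same group element.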
The security notion of KEMs is similar to that of PKE, meaning there is a natural way to extend the traditional notions of IND-CPA/IND-CCA security.
Note that one can build a KEM using any PKE, by having $\mathsf{Encaps}_{pk}(r) = \mathsf{Enc}_{pk}(r)$, where $r$ is the randomness used by the KEM (this is the "encrypt uniformly random keys" idea you mentioned).
Maybe we should explicitly write $\mathsf{Enc}_{pk}(f(r))$, where $f(r)$ is some function of the randomness, but I will not bother being this explicit.
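As a rough illustration of the "encrypt a uniformly random key" construction, the sketch below wraps a toy ElGamal PKE into a KEM. The choice of ElGamal, the toy parameters `P` and `G`, and hashing the encrypted value with SHA-256 to obtain the derived key are all assumptions made for the example, not part of any standard.

```python
import hashlib
import secrets

# Toy public parameters (assumption, NOT secure).
P = 2**127 - 1  # a Mersenne prime
G = 3

def pke_kgen():
    sk = secrets.randbelow(P - 2) + 1      # uniform in [1, P-2]
    return sk, pow(G, sk, P)

def pke_enc(pk, m):
    """Toy ElGamal encryption of a nonzero residue m."""
    y = secrets.randbelow(P - 2) + 1
    return pow(G, y, P), (m * pow(pk, y, P)) % P

def pke_dec(sk, ct):
    c1, c2 = ct
    # c1^(P-1-sk) equals c1^(-sk) mod P by Fermat's little theorem.
    return (c2 * pow(c1, P - 1 - sk, P)) % P

def encaps(pk):
    """Pick a random value r, encrypt it under pk, derive the key from r."""
    r = secrets.randbelow(P - 1) + 1       # uniform in [1, P-1]
    c = pke_enc(pk, r)
    k = hashlib.sha256(r.to_bytes(16, "big")).digest()
    return k, c

def decaps(sk, c):
    """Decrypt to recover r, then derive the same key."""
    r = pke_dec(sk, c)
    return hashlib.sha256(r.to_bytes(16, "big")).digest()

sk, pk = pke_kgen()
k, c = encaps(pk)
assert decaps(sk, c) == k
```

Note that `encaps` cannot produce its ciphertext without first knowing `pk`, since the second ciphertext component is masked by `pk^y`.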
So why care about KEMs?
While there are other points one could make, a major one is that "natural" KEMs (such as ECDH) have certain properties that a KEM constructed via the "encrypt random keys" approach does not have.
This is to say that ECDH is not "just" a KEM, and can be used in applications where encrypting random keys does not work.
Perhaps the most obvious property to point to is "non-interactivity".
In particular, ECDH can be written as
- both parties exchanging a Diffie-Hellman public key $(g, g^{s_i})$, and then
- each party computing a simple function of its own secret and the other party's public key, namely $(g^{s_{1-i}})^{s_i} = g^{s_0 s_1}$.
If we try to write this with the syntax of a KEM, we might say that $\mathsf{KGen}(1^\lambda)$ produces one key-pair $(g, s_0, g^{s_0})$, and that $\mathsf{Encaps}_{g^{s_0}}(r) = (g, s_1, g^{s_1})$ produces another key-pair, where we model the "ciphertext" as $g^{s_1}$.
This has a very curious property though --- $g^{s_1}$, the "ciphertext" of the scheme, does not depend on the public key at all (besides via the group generator $g$, which can be standardized as a public parameter).
This is a quite surprising property, and one that the "encrypt a random key" scheme does not have.
A scheme with this property is known as a Non-interactive Key Exchange (NIKE) scheme.
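The non-interactive property can be sketched as follows: each party computes its public value from the public parameters alone, without seeing anything from the other party, and both still end up with the same key. As before, the toy group (`P`, `G`), the helper names, and the SHA-256 key derivation are assumptions for illustration only; real ECDH uses an elliptic-curve group.

```python
import hashlib
import secrets

# Toy public parameters (assumption, NOT secure), standardized in advance.
P = 2**127 - 1  # a Mersenne prime
G = 3

def keypair():
    """Return (secret, public) using only the public parameters."""
    s = secrets.randbelow(P - 2) + 1
    return s, pow(G, s, P)

def shared_key(my_secret, their_public):
    """Combine one's own secret with the other party's public value."""
    shared = pow(their_public, my_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# Neither call below takes the other party's public value as input,
# so the two values can be published in any order (or simultaneously).
s0, pub0 = keypair()  # plays the role of the KEM public key
s1, pub1 = keypair()  # plays the role of the KEM "ciphertext"

k0 = shared_key(s0, pub1)
k1 = shared_key(s1, pub0)
assert k0 == k1
```

Contrast this with a KEM built by encrypting a random key, where the encapsulating party has to wait for the public key before it can compute its message.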
The property is both
- useful in practice: Signal's "double ratchet" uses this property in a key way, which makes it hard to "drop in" another KEM for use in Signal, and
- theoretically non-trivial: generically constructing NIKE requires some fancy primitive such as FHE or functional encryption. There are known results showing that it is probably not possible to build NIKE using lattices (and plausibly codes) with "small parameters".
In fact (excluding lattice-based schemes with large parameters), I am only aware of one post-quantum NIKE, namely CSIDH. That is, a straightforward modification of Signal to make it post-quantum either
- uses CSIDH,
- uses a less-efficient variant of a NIST PQC scheme (say a lattice-based scheme with large parameters), or
- modifies the Signal protocol itself in some way, usually with some efficiency hit.
While there are more nuanced things you can say to compare PKE and KEMs, theoretically there is a very large benefit to ECDH: it is an efficient NIKE, and efficient NIKEs are not very common at all.