Indeed, modulo a prime, already $r^2$ are non-uniform - only half of the values are possible. While the computed polynomials indeed do not follow uniform distribution, the non-uniformity is bounded: each output value of the polynomial may have at most $L$ preimages (roots), where $L$ is its degree, equal to the number of blocks. Meaning that the probability may increase from $1/R$ to $L/R$, where $R$ is the number of all possible $r$ (which is $2^{106}$ in Poly1305). To get a non-negligible success probability, one has to attempt a forgery with a huge number of blocks.
Note that the output is masked by adding AES(nonce). This makes blind prediction of the MAC useless. The most powerful attack here is a differential forgery attempt: given a (message, nonce, tag) triple, generate another triple (message', nonce, tag'). The same nonce removes AES(nonce) from the consideration in the difference $(tag' - tag)$. We "only" have to predict poly(message') - poly(message) for any message and message' of our choice. Which is hard since a difference of non-equal polynomials is still a nonzero polynomial of the same or smaller degree, and the probability to guess the right output is small.
This reasoning works for any finite field.
edit: thanks to @poncho for noticing a dangerous confusion of xor and addition over GF(p)
edit: in https://cr.yp.to/antiforgery/pema-20071022.pdf , Bernstein first introduces a dot-product-based MAC, i.e.. $MAC(m) = m_1r_1 + m_2 r_2 + ... + s$. The $n=8$ shares are chosen only for an illustration, since this allows only to sign messages of 8 blocks. Again, this is only done for educational reasons and to show "pure" historical constructions. Later in the paper, he replaces $r_i$ with $r^i$: this allows to avoid fully-fledged pseudorandom generation and storage of many $r$'s. Similarly, fully random $s$ can be replaced by e.g. AES(nonce).