The central insight helpful for understanding the proofs is that all the different ways to describe the degrees of freedom of the polynomials are in a one-to-one correspondence. A polynomial of degree $k-1$ has $k$ degrees of freedom which can be described by $k$ coefficients of the polynomial, or (for $k$ given $x$-coordinates) by $k$ heights it has to pass through. The mappings from the coefficients to the polynomials of degree $k-1$ and to the $y$-coordinates are all bijective. (They are even isomorphisms on $k$-dimensional vector spaces.)
Let's formalize that: For the moment we leave out the secret as a special parameter and treat it just like any other coefficient, i.e. $a_0:=S$. We fix $k$ (pairwise distinct) $x$-coordinates. Without loss of generality they shall be denoted by $x_1,\dots,x_k$. We define the function $f$ that maps from the coefficients $a_0,\dots,a_{k-1}$ of the polynomial $q$ to the $k$ heights ($y$-values) of the polynomial at the $x$-coordinates. The coefficients are all elements of $\mathbb{F}_p$ so the domain of $f$ is $(\mathbb{F}_p)^{k}$. The same is true for all the $k$ height ($y$-components) of the fixed $x$-coordinates, so the codomain is also $(\mathbb{F}_p)^{k}$. Calculating a single $y_i$ is as simple as plugging it into the polynomial:
$y_i:=q(x_i)=a_0 + a_1 x_i^1 + \dots + a_{k-1}x_i^{k-1} \qquad \text{for } 1\leq i\leq k$
and $f$ is defined by
$f: (\mathbb{F}_p)^{k} \rightarrow (\mathbb{F}_p)^{k}$
$(a_0,\dots,a_{k-1})\mapsto(y_1,\dots,y_{k})$.
To show the bijectivity of $f$ it is sufficient to show its injectivity, as the domain and codomain have the same finite size. The high level view is that a $k-1$ degree polynomial is uniquely described by its $k$-coefficients and that it can also be uniquely described by $k$ points it passes through. So if two lists of coefficients describe two polynomials that pass through the same points, the two polynomials must be the same and thus the list of coefficients must be the same (and hence $f$ is injective). As a more elementary view on the question take two lists of coefficients $(a_0,\dots,a_{n-1})$ and $(b_0,\dots,b_{n-1})$ that map to the same heights $(y_1,\dots,y_{k})$. Subtracting the two polynomials from each other leads to the polynomial $r(x):=(a_0-b_0) + (a_1-b_1) x^1 + \dots + (a_{k-1}-b_{k-1})x^{k-1}$ which has zeros on all of the $k$ given $x$-coordinates: $r(x_i)=0 \text{ for } 1\leq i\leq k$. Thus it must be the zero polynomial and $a_j-b_j=0$ for all $0\leq j < k$, which means the two lists of coefficients are the same. It follows that $f$ is injective and thus bijective.
(Furthermore, $f$ is linear, and thus an isomorphism.)
Looking through the lens of secret sharing: You can now fix degrees of freedom on one side and reduce the respective degree of freedom on the other side. Specifically, if you fix $k-1$ degrees of freedom, you still have 1 degree left. Furthermore, the way you fixed the first $k-1$ degrees of freedom can't interact with the remaining degree of freedom. (For me this is more intuitive, if I understand the degrees of freedom as dimensions in the vector spaces. If I fix $k-1$ dimensions in the vector space of heights ($y$-coordinates to the fixed $x$-coordinates) I still have 1 dimension left and the choices of these $k-1$ heights have no bearing on my choice of the last height.) As Shamir argues, the last degree of freedom (the last dimension) is still able to completely determine the secret and vice versa. Now, if the last degree of freedom, the secret $S$, would be drawn randomly like all others, it would be easier to argue, that the uniformity of the distribution of $S$ carries over into a uniformity of the height at the last $x$-coordinate (the value of the $y$-coordinate, i.e. the last share) and vice versa. As $S$ is not (uniformly) random in practical applications, I prefer another view on the matter to show the security of the scheme:
My preferred security proof. As a first step, let's fix $S$, as that is happening in practical applications as well. We take a look at a fixed but arbitrary set of $k-1$ secret bearers, that means we fix $k-1$ $x$-coordinates. Without loss of generality we denote them with $x_1, \dots, x_{k-1}$. What I want to show is that the heights ($y$-coordinates) are independent of the value of $S$. That means that the uniform distribution of the $k-1$ randomly chosen coefficients carries over into the resulting $y$-coordinates. That has to be the case, if there is a one-to-one relationship between coefficients and heights. To show the one-to-one relationship let's adapt the above $f$ into the following function $g$, where the constant coefficient $a_0$ is fixed as $S$: Choose $y_i$ for $1\leq i\leq k-1$ as follows (note that we only have $k-1$ heights now):
$y_i:=q(x_i)=S + a_1 x_i^1 + \dots + a_{k-1}x_i^{k-1} \qquad \text{for } 1\leq i\leq k-1$
and $g$ is defined by
$g: (\mathbb{F}_p)^{k-1} \rightarrow (\mathbb{F}_p)^{k-1}$
$(a_1,\dots,a_{k-1})\mapsto(y_1,\dots,y_{k-1})$
With an analogous argument as we used for $f$ we can show that $g$ is injective, and thus bijective: Assume we have two lists of coefficients that map to the same $k-1$ heights ($y$-coordinates for the fixed $x$-coordinates). The corresponding polynomials for the two lists of coefficients thus go through the same $k-1$ points already. Additionally both go through $(0,S)$ by construction. Thus, as they are polynomials of degree $k-1$ and share $k$ points they both go through, the two polynomials must be the same. Thus $g$ is injective and thus bijective as domain and codomain have the same finite size.
As a result we know that the $k-1$ secret shares cannot reveal any information about $S$ as they are completely determined by the (uniform) random choice of the coefficients $a_1, \dots, a_{k-1}$ and can thus themselves be interpreted as being chosen uniformly and randomly in the set of possible heights.