This is not an answer.
In the hope of assisting anybody thinking about this and in light of @TMM's comment, here's a little bit more flesh around the statement "Intuitively, one feels that if $\beta_2$ is small, then the contributions of the different vectors to the score will not be independent".
Consider the case $\beta_2=1$. In this case all of our $(\mathbf x^T,\mathbf y^T)$ vectors will be multiples of a single generating vector, say $\alpha(\mathbf x_0^T,\mathbf y_0^T)$ with $\alpha$ perhaps some discrete Gaussian depending on the number of vectors. Now there are $q^{n-1}$ vectors $\mathbf v$ such that $\mathbf x_0^T\cdot\mathbf v=0$. Consider any vector of the form $\mathbf v+\mathbf e$ where $\mathbf e$ is drawn from the same distribution as the LWE error. We expect perhaps $O(\sigma^nq^{n-1})$ such vectors (vectors with two such representations should be rare) and for large $n$ we might expect these to cover most of the space. The score of such vectors is given by $\alpha\mathbf x_0^T\mathbf e$ and the score of causal solutions is given by $\alpha(\mathbf x_0^T\mathbf e+\mathbf y_0^T\mathbf s)$. Both quantities are small and of comparable size, so the causal vectors would be indistinguishable from this much larger non-causal set.
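For anybody who wants to see the $\beta_2=1$ case concretely, here is a minimal Python sketch with toy parameters. Everything in it is an assumption chosen for illustration and not part of the attack itself: a small prime $q=5$, $n=3$, a fixed $\mathbf x_0$, a fixed $\alpha$, and ternary errors standing in for the discrete Gaussian. It brute-forces the vectors $\mathbf v$ with $\mathbf x_0^T\mathbf v\equiv 0 \pmod q$ and checks that the inner product feeding the score of $\mathbf v+\mathbf e$ depends only on $\mathbf e$.

```python
import itertools
import random


def dot_mod(a, b, q):
    """Inner product of two integer vectors, reduced mod q."""
    return sum(x * y for x, y in zip(a, b)) % q


def orthogonal_vectors(x0, q):
    """All v in Z_q^n with <x0, v> = 0 mod q (brute force, toy sizes only)."""
    n = len(x0)
    return [v for v in itertools.product(range(q), repeat=n)
            if dot_mod(x0, v, q) == 0]


# Toy parameters (assumptions for illustration only).
q, n = 5, 3
x0 = (1, 2, 3)   # hypothetical generating vector, nonzero mod q
alpha = 2        # hypothetical multiplier

kernel = orthogonal_vectors(x0, q)
count = len(kernel)  # q**(n-1) when q is prime and x0 is nonzero mod q

rng = random.Random(0)
scores_match = True
for v in kernel:
    # Ternary error as a stand-in for the LWE error distribution.
    e = tuple(rng.choice((-1, 0, 1)) for _ in range(n))
    target = tuple((vi + ei) % q for vi, ei in zip(v, e))
    lhs = (alpha * dot_mod(x0, target, q)) % q
    rhs = (alpha * dot_mod(x0, e, q)) % q
    if lhs != rhs:
        scores_match = False

print(count, scores_match)
```

The check passes for every $\mathbf v$ in the kernel, which is just the statement that $\mathbf x_0^T(\mathbf v+\mathbf e)\equiv\mathbf x_0^T\mathbf e \pmod q$. Nothing about $\mathbf s$ enters.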
More generally, for $\beta_2=k$ fixed, there would be a basis of $k$ vectors $(\mathbf x_i^T,\mathbf y_i^T)$, with our test set formed of linear combinations of the basis vectors where the coefficients are Gaussian(?). Again there will be a set of $q^{n-k}$ vectors $\mathbf v$ perpendicular to all of the $\mathbf x_i^T$ and perhaps a neighbourhood of $O(\sigma^nq^{n-k})$ low-scoring non-causal vectors.
This seems to suggest that $\beta_2$ should be at least $n\log \sigma/(\log q)$, but there could be other structured but non-causal sets, as well as non-structural low scores. Likewise, the arguments for lack of overlap within the neighbourhood and between the causal sets are heuristic at best.
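To spell out where that threshold comes from (a heuristic count only, ignoring constants and assuming the neighbourhoods do not overlap): with $\beta_2=k$, the low-scoring non-causal vectors cover essentially the whole space $\mathbb Z_q^n$ once

$$\sigma^n q^{n-k}\gtrsim q^n \iff \sigma^n\gtrsim q^k \iff k\lesssim \frac{n\log\sigma}{\log q},$$

so avoiding this particular obstruction needs $\beta_2=k$ above $n\log\sigma/\log q$.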