Score:1

Derive probability estimation for 'learning from parity with error'

mq flag

In Regev's paper "On Lattices, Learning with Errors, Random Linear Codes, and Cryptography", the introduction considers the "learning from parity with error" problem. We have an unknown $s \in \mathbb{Z}_2^n$, and our goal is to find this $s$ given a list of equations with errors, e.g. $\langle s,a_i \rangle \approx b_i \pmod 2$. The $a_i$'s are chosen independently from the uniform distribution on $\mathbb{Z}_2^n$, $\langle s, a_i\rangle = \sum_j s_j (a_i)_j$ is the inner product modulo 2 of $s$ and $a_i$, and each equation is correct with probability $1-\epsilon$. With no error we could solve the system using Gaussian elimination; that part is clear to me.
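To make the noise-free case concrete, here is a small sketch (my own illustration, not from the paper; the function name `solve_gf2` and all parameters are my choices) that generates exact equations $\langle s, a_i\rangle = b_i \pmod 2$ and recovers $s$ by Gaussian elimination over $\mathbb{Z}_2$:

```python
import random

def solve_gf2(A, b):
    """Solve A x = b (mod 2) by Gaussian elimination; return None if A is singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Find a row with a 1 in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        # XOR the pivot row into every other row with a 1 in this column.
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

random.seed(1)
n = 8
s = [random.randrange(2) for _ in range(n)]
while True:
    # Draw n uniform a_i's and exact (error-free) answers b_i = <s, a_i> mod 2.
    A = [[random.randrange(2) for _ in range(n)] for _ in range(n)]
    b = [sum(x * y for x, y in zip(row, s)) % 2 for row in A]
    x = solve_gf2(A, b)
    if x is not None:  # retry until the drawn A is invertible
        break
assert x == s  # with no errors, Gaussian elimination recovers s exactly
```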

Regev goes a bit further and considers the Gaussian elimination process under the assumption that we are only interested in recovering the first bit of $s$. He says that using Gaussian elimination we can find a set $S$ of $O(n)$ equations such that $\sum_{i \in S} a_i = (1,0,\ldots,0)$. Summing the corresponding values $b_i$ gives us a guess for the first bit of $s$.

The part that I don't understand, and this is my question: he says that a standard calculation shows this guess is correct with probability $\frac{1}{2} + 2^{-\Theta(n)}$. How exactly does one arrive at this estimate, and what is this standard calculation?

I hope my question is understandable. Thank you for any helpful comments/answers.

Score:2
ru flag

For a given linear equation $\langle s,a_i\rangle$ there is an associated claimed answer $b_i$. We call this value correct if $\langle s,a_i\rangle=b_i$ and incorrect if $\langle s,a_i\rangle=b_i\oplus 1$. Note that we can form a new equation from two or more equations by mod 2 addition, e.g. $\langle s,a_i\oplus a_j\rangle$ with associated claimed answer $b_i\oplus b_j$. This claim will be correct if either both of $b_i,b_j$ are correct or both are incorrect. In general, when combining equations, the new equation's claimed answer is correct precisely when an even number of the combined $b_i$ are incorrect.
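As a small sanity check of this parity rule (my own illustration; the vectors are arbitrary example values), the following sketch combines two equations whose claimed answers are both incorrect and verifies that the combined claim is correct:

```python
# s and the two a vectors are arbitrary example values.
s = [1, 0, 1]
a_i, a_j = [1, 1, 0], [0, 1, 1]

def ip2(u, v):
    # inner product modulo 2
    return sum(x * y for x, y in zip(u, v)) % 2

b_i = ip2(s, a_i) ^ 1  # claimed answer for equation i, deliberately incorrect
b_j = ip2(s, a_j) ^ 1  # claimed answer for equation j, deliberately incorrect
a_sum = [x ^ y for x, y in zip(a_i, a_j)]  # componentwise a_i XOR a_j

# Two incorrect answers (an even number) combine into a correct one.
assert b_i ^ b_j == ip2(s, a_sum)
```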

Suppose that $S$ consists of $k$ equations. We sum the $k$ values $b_i$ modulo 2, and if an even number of them represent the incorrect value, then the sum of the $b_i$ equals the sum of the correct values. Since each equation is independently incorrect with probability $\epsilon$, the chance that exactly $c$ of the $b_i$ values are incorrect is $\binom{k}{c}(1-\epsilon)^{k-c}\epsilon^{c}$. Thus the total probability of a correct guess is $$\sum_{d=0}^{\lfloor k/2\rfloor}\binom{k}{2d}(1-\epsilon)^{k-2d}\epsilon^{2d}.$$ Comparing terms in the binomial expansions of $((1-\epsilon)+\epsilon)^k$ and $((1-\epsilon)-\epsilon)^k$, we see that the above expression is $$\frac12\left(\left((1-\epsilon)+\epsilon\right)^k+\left((1-\epsilon)-\epsilon\right)^k\right)=\frac12\left(1+(1-2\epsilon)^k\right).$$ Now for large $k$ and fixed $\epsilon$ we have $(1-2\epsilon)^k\approx \exp(-2k\epsilon)$, which for $k=\Theta(n)$ is $2^{-\Theta(n)}$, as required.
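As a quick numerical check (my own, not part of the standard calculation itself), one can compare the even-error sum against the closed form $\frac12\left(1+(1-2\epsilon)^k\right)$:

```python
from math import comb

def p_correct(k, eps):
    # Probability that an even number of the k answers are incorrect,
    # summed directly as in the displayed formula.
    return sum(comb(k, 2 * d) * (1 - eps) ** (k - 2 * d) * eps ** (2 * d)
               for d in range(k // 2 + 1))

def p_closed(k, eps):
    # Closed form obtained by comparing the two binomial expansions.
    return 0.5 * (1 + (1 - 2 * eps) ** k)

# The two expressions agree for a range of k and epsilon.
for k in (1, 5, 20, 101):
    for eps in (0.1, 0.25, 0.4):
        assert abs(p_correct(k, eps) - p_closed(k, eps)) < 1e-12
```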

P_Gate avatar
mq flag
Hi Daniel, thanks for your reply. I understood everything from the binomial expansion onwards, but could you elaborate a bit more on the part before that? What exactly do you mean by the "correct value"? Thanks again!
Daniel S avatar
ru flag
I've added some words at the start and tidied things up a bit.
P_Gate avatar
mq flag
Hi Daniel, thanks for this great addition! It helps a lot! When calculating the probability, could it be that you have mixed up correct and incorrect somewhere? The sum represents the probability of being incorrect, right?
Daniel S avatar
ru flag
I'm not immune to parity errors, but I think the sum is over situations where an even number ($2d$) of incorrect answers are combined, so that in each situation the combined expression is correct. It's certainly worth checking, though.