What is the principle behind threshold implementation of block ciphers
The idea is that we take every secret state of the cipher (be it the key itself or an internal state value) and convert that into a "threshold" representation, where the logical value of the secret state $s$ is actually implemented by $t$ different physical values $s_0, s_1, \dots, s_{t-1}$, in such as way that knowledge of $t-1$ of the physical values gives no information about the logical value $s$.
The most common way to implement this is with xor-sharing; if $t=2$, then each logical state $s$ is represented by $s = s_0 \oplus s_1$; if $s_0, s_1$ can both 1 with probability $0.5$, then learning just one of them gives no information about $s$.
The obvious question is "how do you do operations with these threshold representations?". Linear/affine operations are easy: to negate, you just flip one of the physical components. To xor, you xor the two values component-wise: $(s_0, s_1) \oplus (t_0, t_1) = (s_0 \oplus t_0, s_1 \oplus t_1$)
Nonlinear operations are a tad trickier; to perform an AND operation, what we can do is pick a fresh random value $r$, and compute $(s_0, s_1) \land (t_0, t_1) = (s_0 \land t_0 \oplus (s_1 \land t_1 \oplus r), s_0 \land t_1 \oplus (s_1 \land t_0 \oplus r))$. This works because we have DeMorgan's theorem: $(s_0 \oplus s_1) \land (t_0 \oplus t_1) = (s_0 \land t_0) \oplus (s_0 \land t_1) \land (s_1 \land t_0) \oplus (s_1 \land t_1)$ (which generalizes to arbitrary threshold values. And, we stir in fresh randomness $r$ because otherwise the physical values would no longer be equidistributed.
With NOT, XOR and AND operations, any block cipher can be constructed.
how is this protecting against side channel attacks?
The idea is that even if the attacker can get some information about the state of $t-1$ secret values simultaneously from a side-channel attack, that doesn't tell him anything about what's going on with the cipher itself.
I was asked to put in a reference; the earliest general reference I know is Toward Sound Approaches to Counteract Power-Analysis Attacks by Chari et al; in particular, see section 3.3 "Encoding"