The design rationale for the S-box is given in section 4.3 of the paper. I'll try and expand on the reasoning.
The worst sort of S-box would be a linear function i.e. one where every bit of output is just the XOR of certain bits of input. As the rest of the PRESENT round function just involves moving bits around and the XORing of round key a linear S-box would lead to a cipher where each bit of output is an XOR of certain inputs bits and round key bits. Collecting a handful of matched inputs and outputs would then allow one to break the cipher very simply using linear algebra.
Block cipher cryptanalysis often focuses on properties of a cipher that are close to behaving like a linear function a detectable proportion of the time.
Linear cryptanalysis
It would still be bad if the XOR a certain collection of input bits and key bits were equal to the XOR of a certain collection of output bits significantly more than half of the time. The exact details of how to turn this into an attack are lengthy, but it would involve collecting a large number of matches input/output pairs, computing the relevant XORs and seeing whether most of the time they matched or did not match. This would give information about the round key bits. To avoid this situation cryptographers look for S-boxes where for any subset of input bits and any subset of output bits the XORs are equal only about half of the time (or as close as is possible). Criterion 3 of the paper says that for the PRESENT S-box the XORs will be equal at least 4 times out of 16 and at most 12 times out of 16. Things could be worse if the subset we picked in each case were a single bit in which case we could correlate a single bit of input with a single bit of output by tracking through only one S-box each round. This motivates criterion 4 which specifies that when the subsets consist of a single bit then they match either exactly 6 times out of 16 or exactly 10 times out of 16.
Differential cryptanalysis
Another way to express the linear property of a cipher is to say that for all inputs $a$ and $b$ we have $\mathrm{Enc}(a\oplus b)=\mathrm{Enc}(a)\oplus\mathrm{Enc}(b)$. Again, if similar properties show up in our cipher a significant proportion of the time then this can be exploited. Cryptographers therefore also try to design ciphers such that for any fixed difference of inputs $\Delta x$ and any fixed difference of outputs $\Delta y$ the equation $\mathrm{Enc}(x\oplus\Delta x)=\mathrm{Enc}(x)\oplus\Delta y$ only roughly one solution $x$ (roughly two solutions half of the time and no solutions half of the time). Again, as the S-box is the only non-linear part of the cipher this becomes a requirement that input and output differences in the S-box should correlate as little as possible. Criterion 1 then says that out of all possible input and output differences the maximum correlation is to find an input difference that produces the same output difference for 4 out of the 16 possible inputs (i.e. two pairs). Again the single bit property is more dangerous as the difference would propagate through the rounds, passing through a single S-box each round. Accordingly then criterion 2 is added which says that for any inputs that differ by only a single bit their outputs do not differ by a only single bit.
As there are only 16! possible 4-bit S-boxes and a great deal of symmetry between certain classes of S-box, it is possible to exhaustively test all possible S-boxes for the properties described above. This analysis has been done, allowing cryptographers to choose S-boxes of this size knowing that they are essentially optimal in terms of linear and differential properties. There are still several S-boxes that meet these requirements and the PRESENT team claim to have chosen one particularly well suited to hardware implementation. I take this to mean that the function represented by the S-box can be evaluated with a small number of gates or with a small circuit depth (I do not have details on this particular point).