With the double-and-add method of curve multiplication, we set single bits in the exponent. For example, if we're at 1101000G
, we can double (to get 11010000G
) and then conditionally add $G$ (to set the lsbit, getting 11010001G
).
NAF extends this by noting that, with a bit of precomputation, we can insert several bits at once. For example, if we're at 11010000G
, and we have 1101G
precomputed, we can add those together, getting 11011101G
. In this example, we set 3 bits in the exponent with a single add - doing the same with double-and-add with require 3 point additions.
So, with 4-bit NAT, we start with the values $0G, 1G, 2G, ..., 15G$ precomputed. If we are using the same $G$ repeatedly (e.g. $G$ is the curve 'generator point'), we can do those once and put them in a table. Alternatively, we can compute those with 7 additions and 7 doublings.
Then, to compute 100101101011010000111G
, we start with the top 4 bits 1001
. We have 1001G
in our precomputed table, so we start there. Then, we keep on doubling until we reach 100100000G
; at this point, we have 4 bits (the 1101
pattern) in the target that is not expressed in our current value - we get the 1101G
value in our precomputed table and add it in, given 100101101G
. We keep on going (adding in values from our precomputed table when we get 4 bits in the target that is not expressed in our current value) until we hit the end - in that case, there's a remaining 111G
, so we add that in, and we're done.
If you note the patterns of doublings and additions (with a doubling without an addition being signified by a 0), that pattern is the 9 0 0 0 0 13 0 0 0 0 13 0 0 0 0 0 0 7
that's found in quote.
With double-and-add, we end up using an expected $n/2$ additions to multiply by an $n$-bit scalar; with 4-bit NAF, we use an expected $n/5$ additions (and the number of doublings are about the same - 4 bit NAT uses 3 fewer) - that is not that bad of a savings.
On the other hand, if the attacker is able to listen into when we are performing the doublings versus the additions, he can gain some information on the scalar we're multiplying by - that can be bad (hence this algorithm may not be appropriate without additional masking in some circumstances).
There are even more aggressive ways of implementing point multiplications with elliptic curves; however that should help with understanding this one.