[zcash/zcash] Understand how to optimize a circuit implementation of Groth16 verification (#3425)

The remaining part of a pairing computation is the Miller loop. The basic operations in the body of the loop are «point doubling with line evaluation» and «point addition with line evaluation».

### Doubling with line evaluation

A doubling in E’/Fp2 requires 10C as computed [above](https://github.com/zcash/zcash/issues/3425#issuecomment-408818642). Note that this already computes x12. So the line evaluation (based on formula (2) of [Aranha et al](https://www.iacr.org/archive/eurocrypt2011/66320047/66320047.pdf)):

    *l* = (−2.y1.yP).v.w + (3.x12.xP).v2 + ξ.(3b’ — y12)

can be computed in an additional 2m~ + s~ = 8C, for a total of 18C. There are alternative ways of combining doubling and line evaluation, but I wasn’t able to find any further improvement in the constraint cost.

TODO: check whether (xP, yP) is sometimes constant (this depends on how the pairing is used).

### Addition with line evaluation

Aranha et al don’t give formulae for the line evaluation used in the addition step, but their work is based on section 5 of [Costello, Lange, and Naehrig](https://eprint.iacr.org/2009/615.pdf) (in turn based on section 4 of [Costello, Hısıl, Boyd, Nieto, and Wong](https://eprint.iacr.org/2009/243.pdf)) which has very similar formulae for the doubling case. Switching to affine coordinates:

    *l* = (y1 — yP).(xP — xQ) + (x1 — xP).(yQ — yP)

or something like that. (This formula is probably slightly wrong because it is the version that eliminates the twisting of the point P, which is an optimization from Costello, Lange, and Naehrig that is handled differently in Aranha et al; but I don’t think that affects the operation count.) So that is an additional 2m~, for a total of 14C.

### Miller loop

Aranha et al section 6 says:

> For the selected parameters and with the presented improvements, the Miller Loop in Algorithm 6 executes 64 point doublings with line evaluations, 6 point additions with line evaluations (4 inside Miller Loop and 2 more at the final steps), 1 negation in Fp2 to precompute yP, 1 p-power Frobenius, 1 p2-power Frobenius and 2 negations in E'(Fp2); and 1 conjugation, 1 multiplication, 66 sparse multiplications, 2 sparser multiplications and 63 squarings in Fp12.

For now we won’t bother to optimize the sparse[r] multiplications –partly because the paper doesn’t describe that– so we will count those as full multiplications requiring 45C each. (This adds up to 3060C so is an obvious area for improvement.)

The total cost of the rest of the pairing, i.e. everything but the final exponentiation, is 64*18C + 6*14C + (1+66+2)*45C + 63*30C = 6231C.