Crandall primes by mratsim · Pull Request #445 · mratsim/constantine

mratsim · 2024-07-27T15:36:50Z

This closes #11 for primes of the form 2ᵐ-c (Crandall primes / pseudo-Mersenne primes), such as those used by Curve25519 and secp256k1 (Ethereum / Bitcoin).
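For readers unfamiliar with the trick: the speedup comes from the identity 2ᵐ ≡ c (mod p), which lets the high half of a double-width product be folded back into the low half with a multiplication by the small constant c. The following is a minimal sketch on a toy single-word prime, p = 2⁶⁴ - 59, and is not Constantine's multi-limb implementation. The names `mul_mod_crandall`, `C`, `P` and `LOW_MASK` are made up for illustration, and the final branch is not constant-time.

```rust
/// Toy Crandall prime p = 2^64 - C with C = 59 (illustrative only; the PR
/// targets multi-limb primes such as 2^255 - 19 and 2^256 - 2^32 - 977).
const C: u128 = 59;
const P: u128 = (1u128 << 64) - C;
const LOW_MASK: u128 = (1u128 << 64) - 1;

/// Hypothetical helper: computes a * b mod P using 2^64 ≡ C (mod P).
fn mul_mod_crandall(a: u64, b: u64) -> u64 {
    // Double-width product: wide = hi * 2^64 + lo.
    let wide = (a as u128) * (b as u128);

    // First fold: hi * 2^64 + lo ≡ hi * C + lo. The result is below 60 * 2^64.
    let t = (wide >> 64) * C + (wide & LOW_MASK);

    // Second fold: the new high part is at most 59, so r < 2^64 + 59 * 59.
    let mut r = (t >> 64) * C + (t & LOW_MASK);

    // r < 2 * P, so a single conditional subtraction yields the canonical
    // residue. (Variable-time branch, kept for readability.)
    if r >= P {
        r -= P;
    }
    r as u64
}
```

Calling `mul_mod_crandall(1 << 63, 2)` multiplies out to exactly 2⁶⁴ and therefore returns the constant 59, which makes the folding identity easy to see.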

Bench Fp vs Constantine master

Current

Analysis

Fp[Edwards25519] mul 1.27x improvement
Fp[Edwards25519] square 1.43x improvement
Fp[Secp256k1] mul 1.94x improvement
Fp[Secp256k1] square 1.28x improvement

Bench EC vs Constantine master

Current

Analysis

EC add projective constant-time improved by 1.36x
EC add jacobian constant-time improved by 1.34x
EC add projective vartime improved by 1.24x
EC add jacobian vartime improved by 1.37x
EC dbl projective constant-time improved by 1.31x
EC dbl jacobian constant-time improved by 1.06x

Bench vs bitcoin/secp256k1

field_sqr 12.4ns vs 8ns -> 1.55x
field_mul 15.8ns vs 10ns -> 1.58x
field_inv_ct 1410ns vs 1203ns -> 1.17x
field_inv_vt 820ns vs 848ns -> 0.97x
EC add jacobian var 247ns vs 97ns -> 2.55x
EC dbl jacobian var 97.8ns vs 145ns -> 0.67x
EC mixed add ct 189ns vs 225ns -> 0.84x
EC mixed add var 173ns vs 98ns -> 1.77x

EC scalar-mul ct 28100ns vs 40196ns -> 0.70x

Analysis

The fact that field operations are 1.5x faster BUT the elliptic curve operations are sometimes slower is suspicious. We probably need to check the EC formulae.

TODO

fix windows
bound checks for lazy reduce and lazily reduced field exponentiation for 256-bit primes, as eprint/iacr 2018/985 indicates in Theorem 4 that their partial reduction may grow by 1 bit in the 256-bit case.
optimize EC impl to avoid if/else check for ADX and limit input/output movement
optimized mixed add
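
The lazy-reduction item above is about how large a partially reduced value can get. As a loose single-word analogy (not the multi-limb code in this PR, and not the construction from the cited eprint), the sketch below folds a 128-bit value modulo the toy prime p = 2⁶⁴ - 59. After two folds the value can still need one extra bit, so the carry has to be folded once more before the result fits in a word, and the canonical residue needs a separate final reduction. All names here (`partial_reduce`, `final_reduce`, `C`, `P`) are hypothetical.

```rust
/// Toy Crandall prime p = 2^64 - 59 (illustrative assumption, not a curve modulus).
const C: u64 = 59;
const P: u64 = u64::MAX - C + 1;

/// Partial ("lazy") reduction: returns a value congruent to `wide` mod P
/// that fits in 64 bits but may still be >= P.
fn partial_reduce(wide: u128) -> u64 {
    let c = C as u128;
    // Fold 1: hi * 2^64 + lo ≡ hi * C + lo (mod P).
    let t = (wide >> 64) * c + (wide as u64 as u128);
    // Fold 2: the result can be as large as 2^64 + 59 * 59 - 1, i.e. 65 bits.
    let r = (t >> 64) * c + (t as u64 as u128);
    // Fold 3: fold the single carry bit. When it is set, the low word is at
    // most 59 * 59 - 1, so this addition cannot overflow.
    let carry = (r >> 64) as u64;
    (r as u64) + carry * C
}

/// Final reduction: maps a partially reduced word to the canonical range [0, P).
fn final_reduce(x: u64) -> u64 {
    if x >= P { x - P } else { x }
}
```

For instance, `partial_reduce(u128::MAX)` takes the carry path in the third fold and returns 3480, while `partial_reduce(P as u128)` returns `P` itself, which only becomes 0 after `final_reduce`.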

mratsim · 2024-07-27T16:20:03Z

Bench vs RustCrypto/elliptic-curves

https://github.com/RustCrypto/elliptic-curves/ is the current record holder of https://programming-language-benchmarks.vercel.app/problem/secp256k1

We modify it to benchmark some of the internals.

Field implementation

cargo bench --features expose-field -- field

with an extra benchmark:

fn bench_field_element_10adds<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    let x = test_field_element_x();
    let y = test_field_element_y();
    group.bench_function("10 adds", |b| b.iter(
        || {
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y);
            &black_box(x) + &black_box(y)
        }
    ));
}

10 adds: 25ns vs 12ns - 2.08x
mul (partially normalized in k256): 17.825ns vs 10ns - 1.78x
sqr (partially normalized in k256): 13.846ns vs 8ns - 1.73x

EC implementation (projective with Renes2015 formulae)

use criterion::{
    black_box, criterion_group, criterion_main, measurement::Measurement, BenchmarkGroup, Criterion,
};
use k256::ProjectivePoint;
use elliptic_curve::{
    rand_core::SeedableRng,
    group::Group,
};
use rand_xorshift::XorShiftRng;

fn bench_ec_add<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    let mut rng = XorShiftRng::seed_from_u64(1234u64);
    let p = ProjectivePoint::random(&mut rng);
    let q = ProjectivePoint::random(&mut rng);
    group.bench_function("EC Add", |b| {
        b.iter(|| &black_box(p) + &black_box(q))
    });
}

fn bench_ec_dbl<'a, M: Measurement>(group: &mut BenchmarkGroup<'a, M>) {
    let mut rng = XorShiftRng::seed_from_u64(1234u64);
    let p = ProjectivePoint::random(&mut rng);
    group.bench_function("EC Dbl", |b| {
        b.iter(|| black_box(p).double())
    });
}

fn bench_ec(c: &mut Criterion) {
    let mut group = c.benchmark_group("EC operations");
    bench_ec_add(&mut group);
    bench_ec_dbl(&mut group);
    group.finish();
}

criterion_group!(benches, bench_ec);
criterion_main!(benches);

EC add proj ct: 195.83ns vs 232ns - 0.84x
EC dbl proj ct: 130.83ns vs 153ns - 0.86x

Analysis

The fact that field operations are 1.7x to 2x faster BUT the elliptic curve operations run at only 0.85x the speed is extremely suspicious, especially since we implement the same formulae from the Renes2015 paper.

There might be useless copies or parameter passing overhead, similar to #21 and #146.
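
One way to quantify the discrepancy is a back-of-envelope cost model built from the field timings above. The sketch below rests on several assumptions: that the Renes2015 complete projective addition for a = 0 costs 12 field multiplications and 19 field additions (operation counts quoted from memory, with the two multiplications by the constant 3b ignored; check them against the paper), that each "A vs B" pair above lists RustCrypto first and Constantine second, and that instruction-level parallelism, inlining and normalization can be ignored. `FieldCosts`, `estimate_ec_add_ns` and `overhead_factor` are hypothetical names.

```rust
/// Measured single-operation latencies of a field implementation, in nanoseconds.
struct FieldCosts {
    mul_ns: f64,
    add_ns: f64,
}

// Assumed operation counts for one complete projective addition (a = 0).
const MULS_PER_EC_ADD: f64 = 12.0;
const ADDS_PER_EC_ADD: f64 = 19.0;

/// Naive estimate: sum of field-op latencies, no overlap, no call overhead.
fn estimate_ec_add_ns(c: &FieldCosts) -> f64 {
    MULS_PER_EC_ADD * c.mul_ns + ADDS_PER_EC_ADD * c.add_ns
}

/// Ratio of the measured EC addition time to the naive estimate.
fn overhead_factor(measured_ns: f64, c: &FieldCosts) -> f64 {
    measured_ns / estimate_ec_add_ns(c)
}
```

Plugging in the numbers above (17.825ns per mul and 2.5ns per add for k256; 10ns and 1.2ns for Constantine), the model predicts roughly 261ns and 143ns, against measured 195.83ns and 232ns. Under these assumptions Constantine's EC addition costs about 1.6x its field-op estimate while k256 comes in below its own estimate, which fits the suspicion that the overhead sits outside the field arithmetic.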


mratsim added the performance 🏁 label Jul 27, 2024

This was referenced Jul 27, 2024

Low-level: discrepancy between field arithmetic performance and elliptic curve performance #446

Open

Windows: Secp256k1 tests assembly test frozen #448

Closed

mratsim mentioned this pull request Nov 27, 2024

Torus-acceleration for multiexponentiation on GT #485

Merged

mratsim added 16 commits December 3, 2024 11:04

feat(special primes accel): Support Crandall primes / Pseudo-Mersenne…

34bc1c0

… Prime fast reduction - closes #11

feat(special primes accel): refactoring: p-1 support ompiles on 64-bi…

d004d8d

…t, renaming of lazy reduction both in Montgomery and Crandall to lazyReduction

feat(special primes accel): support 32-bit

3b1eb30

chore: lazyReduction->lazyReduce

bcb35f5

fix: fp mulsquare test

d1dacc4

feat: Crandall exponentiation

0c30b28

feat: initial commit assembly for Crandall reduction, passing secp256…

195ace0

…k1, failing edwards25519

feat(asm-crandall): actually use the assembly

b2e1fe3

feat(asm-crandall): fix sqrt test and short immediate

20f3c0f

feat(crandall reduction): x86-adx reduction

cb93c2d

feat(crandall reduction): add final reduce, deactivate adx partial re…

d896514

…duce temporarily

feat(crandall reduction): fix adx partial reduce

d5ceb2d

feat(crandall reduction): prevent asm for mul on 32-bit

758d9d6

feat(bench): check overhead of field calls

b48a69e

feat(crandall reduction): prevent asm for mul on 32-bit reloaded

cba2139

crandall-primes: reactivate tests for secp256k1

4488f94

mratsim force-pushed the crandall-primes branch from 57911e8 to 4488f94 December 3, 2024 10:18

mratsim linked an issue Dec 3, 2024 that may be closed by this pull request

Windows: Secp256k1 tests assembly test frozen #448

Closed

mratsim merged commit 585f803 into master Dec 3, 2024

mratsim deleted the crandall-primes branch December 3, 2024 12:22

mratsim mentioned this pull request Dec 12, 2024

Add ECDSA over secp256k1 signatures and verification #490

Merged

13 tasks

This was referenced Aug 7, 2025

[Regression+bug] Fix broken Nvidia test cases on LLVM #557

Closed

Nvidia: deactivate Edwards25519 on Nvidia backend for now #559

Merged

[Nvidia] Implement pseudo-mersenne prime (Crandall Prime) acceleration #561

Open

mratsim mentioned this pull request Sep 1, 2025

Expand field tests with BabyBear, KoalaBear and Goldilocks #573

Open
