@tarcieri
Member

Replaces branching with an arithmetic-based approach. This unfortunately seems to double the time multiplication takes (and with it, modpow).

Benchmarks

Montgomery arithmetic/multiplication, BoxedUint*BoxedUint
                        time:   [10.027 µs 10.048 µs 10.068 µs]
                        change: [+101.70% +102.44% +103.19%] (p = 0.00 < 0.05)
                        Performance has regressed.
Montgomery arithmetic/modpow, BoxedUint^BoxedUint
                        time:   [48.200 ms 48.273 ms 48.352 ms]
                        change: [+97.268% +97.613% +97.972%] (p = 0.00 < 0.05)
                        Performance has regressed.

Comment on lines +269 to +273
/// Compare two pairs of limbs in constant time, returning `Limb::ONE` if the
/// left side of either pair is less than the right.
#[inline(always)]
fn limb_ct_lt(a1: Limb, b1: Limb, a2: Limb, b2: Limb) -> Limb {
    (a1.sbb(b1, Limb::ZERO).1 | a2.sbb(b2, Limb::ZERO).1) & Limb::ONE
}
Member Author

This was faster than the ct_lt approach previously used on L214, but still slower than using branch instructions.
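
For readers without the crate at hand, here is a minimal sketch of the borrow-based comparison in the snippet above, written against plain `u64` words instead of `Limb`. The `sbb` helper is a hypothetical stand-in, assumed to return an all-ones mask as its second value when the subtraction underflows. `ct_lt_pair` is an illustrative name too, not something from the PR.

```rust
/// Hypothetical stand-in for a limb-level subtract-with-borrow.
/// Computes `a - b - borrow_in` and returns `(difference, borrow_mask)`.
/// Assumption: the borrow is a mask, all ones on underflow and zero otherwise.
fn sbb(a: u64, b: u64, borrow_in: u64) -> (u64, u64) {
    let wide = (a as u128)
        .wrapping_sub(b as u128)
        .wrapping_sub((borrow_in >> 63) as u128);
    // The low 64 bits are the difference; the high 64 bits are the mask.
    (wide as u64, (wide >> 64) as u64)
}

/// Returns 1 if `a1 < b1` or `a2 < b2`, and 0 otherwise.
/// Only arithmetic and bitwise operations are applied to the inputs.
fn ct_lt_pair(a1: u64, b1: u64, a2: u64, b2: u64) -> u64 {
    (sbb(a1, b1, 0).1 | sbb(a2, b2, 0).1) & 1
}
```

The source has no data-dependent branch, but whether the generated machine code stays branch-free depends on the compiler and target, so it is worth checking the assembly.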

@fjarri
Contributor

fjarri commented Dec 14, 2023

This could benefit from #418

@fjarri
Contributor

fjarri commented Dec 16, 2023

Doesn't seem like it benefits.

But judging by Godbolt, using overflowing_add() there doesn't lead to branches. Directly casting the booleans to Word produces the same code as using overflowing_add().
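
To make the two forms concrete, here is a hedged sketch of an add-with-carry written both ways. The function names and the `Word = u64` alias are assumptions for illustration and are not taken from #418 or this PR. No claim is made about the code either form compiles to, which would need to be checked per target and compiler version.

```rust
// Assumption: a 64-bit word; the real alias depends on the target.
type Word = u64;

/// Add with carry using `overflowing_add`, converting each overflow flag to a `Word`.
/// `carry_in` is expected to be 0 or 1.
fn adc_overflowing(a: Word, b: Word, carry_in: Word) -> (Word, Word) {
    let (sum, c1) = a.overflowing_add(b);
    let (sum, c2) = sum.overflowing_add(carry_in);
    // At most one of the two additions can overflow when carry_in <= 1.
    (sum, (c1 as Word) | (c2 as Word))
}

/// The same operation with the carries derived from comparisons cast to `Word`.
fn adc_cast(a: Word, b: Word, carry_in: Word) -> (Word, Word) {
    let sum = a.wrapping_add(b);
    let c1 = (sum < a) as Word; // wrapped around if the sum is smaller than an operand
    let out = sum.wrapping_add(carry_in);
    let c2 = (out < sum) as Word;
    (out, c1 | c2)
}
```

Both functions return the same `(sum, carry_out)` pair for any inputs with `carry_in` in `{0, 1}`, so they can be pasted side by side into a compiler explorer to compare the emitted instructions.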

@tarcieri
Member Author

Closing this for now

@tarcieri tarcieri closed this Dec 17, 2023
@tarcieri tarcieri deleted the boxed-residue/constant-time-mul branch December 17, 2023 23:10