Make scalar/field choice depend on C-detected __int128 availability #793

real-or-random merged 2 commits into bitcoin-core:master
Conversation
fe3e7eb to 89acf9e (force-pushed)
Concept ACK. I like the naming here, which makes clear that this is only about multiplications. That seems useful as a first step towards the idea that has been mentioned in #711 (comment) and in the following comment.
Indeed, this could be extended with a --with-wide-multiplication=mul128 for MSVC, for example (well, MSVC platforms likely wouldn't use the configure script at all, but the equivalent macro config).
Maybe we could get rid of autoconf here too by checking __SIZEOF_INT128__.
@real-or-random That would certainly simplify things further. One reason not to do this may be to keep the ability to explicitly select uint64_t-based multiplication in a test. For example, I think we expect that all x86_64 Linux/gcc configurations would end up using the __int128-based multiplication, and without configurability, every such instance on Travis would do so. However, perhaps there are reasonable x86_64 Linux/gcc (or very similar) configurations out there that we don't know about, or can't test for, without __int128. If so, it may be better to keep it configurable, and have at least one explicit test with uint64_t-based multiply.
We could still disable it in the tests using macros, e.g., by simply adding a define.
I don't think it should be exposed to the user. It should be possible to override with ifdefs for testing/benchmarking by developers (see for example the safegcd PR, where testing 32-bit on x86_64 is both common and super useful), or for building for weirdo platforms where the type exists but doesn't work. This kind of thing can be documented in a development-focused readme. Exposing it to joe-user just begs him to set it in a dumb way and hurt his performance. :)

Another option, if there is some reservation around define-macro sniffing, is to have it in autotools but hide the setting or explicitly note that it's for testing. Also, a ./configure setting can trigger overrides of macro-sniffed stuff, so the question of detection via autotools vs macros is largely orthogonal.

I think that in general, if there is a simple, portable, and reliable macro test, it should be preferred: autotools checking is slow, more likely to run into host vs target issues, and depends on using autotools, which not all targets will do. Especially so in this project, because the whole thing is one compilation unit and there are very few things that need to be detected (endianness, type sizes, clz instructions, not much else...), so you don't end up with the issue of a huge detection header that is redundantly run for every object.
I think I prefer a hidden configure flag, say "--with-test-override-widemul=X" or so. It feels a bit ugly and strangely roundabout to go undefine system defines to guide the autodetection you wrote yourself into the right thing for testing. I'd imagine it works by having USE_FORCE_WIDEMUL_X macros that are usually all undefined, but can be set by the configure script. There is no autodetection or verification in the script; it's just passed through.
Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense (and could be applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so).
Concept ACK. FWIW there are quite a lot of users who just use 32-bit because they don't use autotools and for some reason decided it was easier to always use the 32-bit code, which is a shame, and this could help them (if they'll update their libsecp...). Examples:
The undef thing was just a lazy example. |
It would be really good if configure still showed the configuration, and maybe there was some string that the test/bench utilities could print, so that some weird detection failure doesn't result in a user silently getting half the speed with no evidence of the cause. I've had cases before where I was accidentally linking the wrong library and was confused by the performance. e.g. "libsecp256k1-deadbeef gcc 10.2.1 le x86_64 s=64asm f=64asm clz ctz nobignum endo ..."
So far this has not been needed, as it's only used by the static precomputation which always builds with 32-bit fields. This prepares for the ability to have __int128 detected on the C side, breaking that restriction.
89acf9e to 2cd6ea3 (force-pushed)
Changed to make the C side detect the availability of __int128, with the ability to override through a define (and a hidden configure option). An unexpected side effect here is that the gen_context static precomputation tool now uses the __int128 based implementations when available.
2cd6ea3 to da16068 (force-pushed)
Instead of supporting configuration of the field and scalar size independently,
both are now controlled by the availability of a 64x64->128 bit multiplication
(currently only through __int128). This is autodetected from the C code through
__SIZEOF_INT128__, but can be overridden using configure's
--with-test-override-wide-multiply, or by defining
USE_FORCE_WIDEMUL_{INT64,INT128} manually.
da16068 to 79f1f7a (force-pushed)
ACK 79f1f7a; diff looks good and tests pass.
tACK 79f1f7a. They all managed to parse it (some very old ones required replacing). An interesting finding: clang added support (x86-64) for 128-bit in 3.3, and gcc in 4.6.4 (from the versions available on godbolt; I did not check outside).

clang 3.3 was released in 2013, while gcc 4.6 was released in 2011.
Make scalar/field choice depend on C-detected __int128 availability
Summary:
```
This PR does two things:
It removes the ability to select the 5x52 field with an 8x32 scalar,
or the 10x26 field with a 4x64 scalar. It's either both 128-bit wide
versions, or neither.
The choice is made automatically by the C code, unless overridden by
a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through
configure with a hidden option
--with-test-override-wide-multiplication={auto,int64,int128}).
This reduces the reliance on autoconf for this performance-critical
configuration option, and also reduces the number of different
combinations to test.
This removes one theoretically useful combination: if you had x86_64 asm
but no __int128 support in your compiler, it was possible to use the
64-bit field before but the 32-bit scalar. I think this doesn't matter
as all compilers/systems that support (our) x86_64 asm also support
__int128. Furthermore, #767 will break this.
As an unexpected side effect, this also means the gen_context static
precomputation tool will now use __int128 based implementations when
available (which required an addition to the 5x52 field; see first
commit).
```
Backport of secp256k1 [[bitcoin-core/secp256k1#793 | PR793]].
Depends on D7610.
Test Plan:
cmake -GNinja ..
ninja check-secp256k1
cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int64
ninja check-secp256k1
cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int128
ninja check-secp256k1
../configure
make -j4 check
../configure --with-test-override-wide-multiply=int64
make -j4 check
../configure --with-test-override-wide-multiply=int128
make -j4 check
Run the Travis build.
https://travis-ci.org/github/Fabcien/secp256k1/builds/731196575
Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D7629