remove HAS_FAST_INTEGER_VECTORS checks from panama support.#14901
remove HAS_FAST_INTEGER_VECTORS checks from panama support.#14901rmuir merged 3 commits intoapache:mainfrom
Conversation
For any integer vectorization code, the developer must remember to make this check, or suffer a 10x+ slowdown if AVX2 is unavailable. This can happen in virtual environments (default QEMU, virtualbox, etc). It isn't worth the benefit of supporting floating point vectors on such machines, just remove this trap completely.
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
@uschindler this one may impact jenkins randomization (simplify it, I think). |
jpountz
left a comment
There was a problem hiding this comment.
I like the simplification.
|
so now we throw an exception instead to guard the unwary from falling into the trap? Makes sense to me. |
|
@msokolov previously, on x86 machines without AVX2 (e.g. AVX1 or SSSE), the vector API was still "enabled". But it is a trap, as many operations are not supported, and run 10x slower (or more). With this change, we just disable vectors completely on such setups (floating point too). It means developers don't need to guard all the integer functions with Goal is to remove the trap: these days you are most likely to encounter such a setup via virtualization, that isn't setup in a fully optimized way (e.g. casual desktop user with virtualbox or qemu or something). |
...ne/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java
Show resolved
Hide resolved
|
The bug Uwe found is truly crazy. Need to step back here and fix SOMETHING to fail on it... No java linter finds the issue at compile time, which is really surprising to me. With clang the problem is caught immediately by At runtime, the test passes, UNLESS you run it with In the short-term, I think this is the easiest to address. I know we don't want tests to print but maybe we can hack the test so it fails without |
|
Theres a subfolder of related checks in error-prone to investigate: https://github.com/google/error-prone/tree/d17e312d982badf2070404dde11b88c2538a5222/core/src/main/java/com/google/errorprone/bugpatterns/formatstring Need to reaudit all the available checks anyway (as clearly these were missed by me before), but this might be a good compile-time way to find it. |
|
I think, this chekc can now also be removed: |
Actually it doe snot throw the exception visible to the enduser. Just the provider interface's ctor throws it so the SPI implementation falls back to default implementation. This disables Panama Vectors completely. Previously the downstream code in the Panama SPI had to do the constant check at many places. |
I still don't understand why it does not fail the tests. The info loggig is always executed. Maybe it fails before due to randomization sometimes? We want to test Panama Vector code also on non-supported hardware, so it creates a "Test instance" of the provider which is slow, but not on production. I will check close later, just update code first. It is a good thing to remove complexity for "seldom" CPU flag combinations. Actually in KVM w/ QEMU (when using libvirt) it never disables if you have selected CPU type "Host Passthrough" (like Policeman Jenkins). |
Should not affect Jenkins, we don't randomize that flag, it is/was done by Gradle itsself. |
What's the repro line for this? I don't think this should matter (gradle-wise); LuceneTestCase uses this flag - to set the verbose constant, maybe something depends on this. |
| } | ||
|
|
||
| if (PanamaVectorConstants.HAS_FAST_INTEGER_VECTORS == false) { | ||
| throw new UnsupportedOperationException("No integer vector support"); |
There was a problem hiding this comment.
I'd change this message a bit more verbose as this is logged to enduser in a warning (see VectorizationProvider around line 169).
Maybe "CPU flags do not guarantee fast integer vectors".
I think the test should fail due to the IllegalArgumentException anyways. There is some stupid bullshit going on. Verbose or not should both create the same result. As said before, I think this did not fail on the github runner, because the virtualization does not enable panama at all. And a "test only" enforcement of bitsize is done randomly, so it won't fail all the time. The problem with the above |
|
Hi for me the test failed without verbose: |
It's not about tests.verbose - it's about the seed. This code is either hit or not, depending on the main tests.seed - some other randomized, derived, parameters are affecting it. |
|
This: the vector size is randomized from this pool: if it's 128, that code path is never reached. |
Exactly! When running it multiple times it fails 1/3 of it.... If the randomly assigned vector bitsize does not match expectations it fails. The problem with this PR is: We don't test the integer vectors anymore because Robert removed the "forceIntegerVectors" flag. I think it should stay alive (so please revert removal of the sysprop). |
|
So this will never fail: but this will always fail: |
|
Ok, so the vectorSize is missing in the test reproducer line. Anyways, thats an old issue. My problem with Robert's patch is that he removed the second system property To keep the sysprop away, my proposal would be: Whenever somebody sets the "bitsize" we should NOT bail out and execute the tests in TestVectorUtil. |
|
We should only apply this PR on main branch, because I don't want people to get sudden slowdowns in 10.x branch. |
Who would get a slowdown? If they are impacted by this PR, they've probably got a 10x slowdown happening somewhere already :) I think the ongoing risk would be, that developer adds a new function to PanamaVectorUtilSupport.java, backports it, and then it causes a new 10x-20x slowdown. That's the motivation behind this PR anyway. It shouldn't slow anyone down, just requires AVX2 on amd64 machines before we try to vectorize. IMO best to disable vectorization completely in such cases, it will actually prevent a 10x slowdown. "supporting" some SSSE-based 2x speedup for a couple floating point methods isn't worth it? |
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java
Show resolved
Hide resolved
| Constants.OS_ARCH.equals("amd64") && PREFERRED_VECTOR_BITSIZE < 256; | ||
| HAS_FAST_INTEGER_VECTORS = | ||
| VectorizationProvider.TESTS_FORCE_INTEGER_VECTORS || (isAMD64withoutAVX2 == false); | ||
| HAS_FAST_INTEGER_VECTORS = isAMD64withoutAVX2 == false; |
There was a problem hiding this comment.
See above and let's do it like that:
HAS_FAST_INTEGER_VECTORS = isAMD64withoutAVX2 == false || TESTS_VECTOR_SIZE.isPresent()There was a problem hiding this comment.
In the special test mode I'd like to test our code, although CPU is wrong.
There was a problem hiding this comment.
if we can better contain this constant we can also give it a better name. The idea is: if you are on x86, you need AVX2 as a minimum baseline for us to support vectors.
There was a problem hiding this comment.
That's OK. I just want the condition to stay alive here.
That's the reason why I want the test mode keep testing integer vectors also on Intel CPUs without AVX2. |
|
There's nothing to test. This change makes it so developers only support avx2 and avx512. That's the idea. I will fix the name of the confusing constant and consider inverting the logic to call it something like "ancientIntel". It's not related to anything about integers anymore. If you want vectors on x86 you need AVX256 or AVX512. |
Please read again what I said: I want the test mode (where we randomize also preferred bitsize) to also always execute the correctness of the integer code. Testing can be slow, but the integer code has to be tested. Therefor please remove the UOE in test mode. That's plain simple. If you don't agree with that let's remove all test-mode code. Now it is incomplete. |
|
I understand but the terminology is killing us. There's no more integer-specific stuff anymore, just 128bit case. And we should test it... Slowly, and maybe throw in 64 too! But the purpose of this change is to remove the confusion around integers and floats. Instead it's just 128, 256, 512. If you are on Intel and only have 128, you get no special sauce. |
|
Thats all fine, then remove the constant. It is no longer risky because the bail out is at one single place: In production Maybe add a constant called "PANAMA_TEST_MODE" that is enabled once you set bitsize. Let's remove the second sysprop, just keep the "bitsize". Once any bitsize is enabled, lets stay in test mode and therefor the tests for integers should run. That is a one-line change, so it does not add confusion, just rename the constant. I am just waiting for you to finish and then add my changes, I did not want to interfere with you. Sorry: I wrote the original code with the systemprops. It's a bit complicated, so I only try to help. And because of the problem with the formatter we noticed the problem at all: Github runner never executes Panama Tests anymore, because it has no AVX2. |
|
Its helpful, it points out the confusion. The constant needs renaming, to really improve this. Separately testing needs "backdoor" (which may not be working correctly with this PR in its current state), otherwise nothing will reproduce for anyone, because everyone has different hardware. |
|
I can apply my modifications. I'd suggest to not add a constant and remove the property as you did. I will just change the code to test integer vectors when a enforced bitsize is also given by |
|
I would also remove the constant so people don't even think about checking |
|
if you have the time, please feel free. I won't get to this until later. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
OK changes pushed. |
|
When verifying the PanamaVectorConstants file I found out that |
+1 to nuke it. this is the last place we want to have dead code... |
done. |
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
I merged this (hopefully not prematurely) because I'm looking into options to enforce correct checks are happening and this makes the code confusing. |


For any integer vectorization code, the developer must remember to make this check, or suffer a 10x+ slowdown if AVX2 is unavailable. This can happen in virtual environments (default QEMU, virtualbox, etc).
It isn't worth the benefit of supporting floating point vectors on such machines, just remove this trap completely.