-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vector_algorithms.cpp
: Remove the distinction between SSE2 and SSE4.2
#4536
Comments
In this case some algorithms that avoid SSE4.2 may start using it. This allows simplifying/optimizing bitsett vectorization, also maybe some of reversing algos coud be direct What is the fraction of SSE4.2 but not AVX? |
Ah, even better! 😻
In theory I could get accurate numbers, but I'd have to ask around through several people. The 2024-02 Steam Hardware Survey suggests a lower bound - it says that 0.46% of CPUs don't support SSE4.2 (which indicates that it's a vaguely reasonable lower bound for the numbers across all CPUs that we target, not just performance-minded gamers, given the similarity to the number I heard), while 6.75% don't support AVX2. I'd guess the actual number for us is in the range of 10-20%, so that AVX2 optimizations benefit the vast majority of users, but that we won't be able to assume its existence for a decade. |
Oh, I see there are still a lot of machines with AVX and not AVX2... |
Exactly. if (Has_AVX2()) {
...
} else if (Has_AVX()) {
...
} else if (Has_SSE()) { //All SSEs (2, 3, 4.*).
...
} else { //Scalar.
...
} AVX512 is out of this equation for the next decade I believe. |
Adding codepaths to distinguish AVX1 from AVX2 raises the same sort of hazards that I'm concerned about with SSE2 versus SSE4.2. Although the AVX1/AVX2 delta is maybe 4% of processors, I think the risk isn't worth it. |
Then, am I right that your suggestion is: //vector_algorithms.cpp
if (Has_AVX()) { //Exactly AVX1 and AVX2.
...
} else if (Has_SSE()) { //All SSEs (2, 3, 4.*).
...
} else { //Scalar.
...
} |
Yes, and we already have these functions (they are properly named STL/stl/src/vector_algorithms.cpp Lines 25 to 39 in be81252
|
During code review, I've prevented two bugs where usage of post-SSE2 instructions was being incorrectly guarded with
_Use_sse2()
- see #4384 (comment) and #4495 (comment). This is extremely hazardous, and the correctness of the STL shouldn't depend on whether I've had 270 mg of caffeine every single time I've reviewed a vectorization PR.At this time, we still need to support the tiny fraction (~0.7%, I've heard) of processors that have SSE2 but not SSE4.2. However, we don't need to extend novel optimizations to them - they were perfectly happy running classic STL algorithms up to 2019.
We should prevent this class of mistakes by removing the distinction between SSE2 and SSE4.2 in
vector_algorithms.cpp
. That is, we should test for the presence of SSE4.2 only, before attempting to use anything up to and including SSE4.2. (This will supersede the error-prone_Traits::_Sse_available()
.)We'll still need a distinction between "SSE4.2 is available" and "AVX2 is available", but I consider this to be much less dangerous, because AVX/AVX2 intrinsics and types are very distinctive.
The text was updated successfully, but these errors were encountered: