Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the use of simd with platform intrinsics for comparisons and boolean reductions in vectors. #85

Closed
wants to merge 1 commit into from

Conversation

xMAC94x
Copy link
Contributor

@xMAC94x xMAC94x commented Apr 4, 2022

  • Unfortuatly there is a bug in rust/llvm that doesn't allow us to compile vek im some cases (see example)

  • Added a pass-by-value fn that can be used together with u8 to avoid the respective error
    Downside is, the interface is a bit uglier.

    Try to run the attached example and compile the failing function cargo build is necessary, cargo checkdoesnt work.
    This PR provides a workaround that can be used to make use of SIMD in this case ;)

    Tell me if you need any adjustments.

    Edit: just saw the contributing section, do you still need a issue opened ?

…ean reductions in vectors.

- Unfortuatly there is a bug in rust/llvm that doesn't allow us to compile vek im some cases (see example)
- Added a pass-by-value fn that can be used together with u8 to avoid the respective error
  Downside is, the interface is a bit uglier.
@xMAC94x xMAC94x changed the title Fix the use of simd with platform intrinsics for comparisons and bool… Fix the use of simd with platform intrinsics for comparisons and boolean reductions in vectors. Apr 4, 2022
@yoanlcq
Copy link
Owner

yoanlcq commented Apr 5, 2022

Hi, thanks for the PR!

do you still need a issue opened ?

No it's fine, the point is just to avoid situations where the author spent a huge amount of time on something that will actually not end up being merged; you know, just in case.

Anyway, I swear SIMD vectors or bools are a pain... I'll try to fiddle with that, I'm not sure I like having extra _pbv() functions (regardless of the suffix we end up using), I wish the existing interface would rather "do the right thing", but it also seems hard to accomplish.

It's great that you added an example with a detailed description of the error.
I'll do some experiments on my end as soon as I can and get back to you; in any case, if we can't find a better solution, then I guess we can merge this.

Thanks.

@xMAC94x
Copy link
Contributor Author

xMAC94x commented Apr 5, 2022

Thanks for understanding. We use this fix for over a year in veloren code, it was initially made by @pythonesque .

Yeah, having a second fn is unclean, but it doesnt seem like SIMD vectors will be less of a pain soon. One could also just export those e.g. when using the platform_intrinsic or some other feature if you wish.

Let me know if you find a more elegant solution to the problem ;)

@yoanlcq
Copy link
Owner

yoanlcq commented Apr 5, 2022

Hi, so I've tested some stuff and came to a few conclusions:

  • simd_reduce_all/simd_reduce_any only accept integer vectors (not bool, not floats);
  • simd_le (and other comparisons) only accept returning integer vectors (again, not bool or float).
    Furthermore, they only accept vectors passed by value (not reference).

And some observations:

  • On x86, we can see that _mm_cmple_ps takes (conceptually) a Vec4<f32> and returns Vec4<f32>. This means the constraints above, enforced by the compiler, are not an accurate statement about the real CPU instruction.
  • On ARM, the equivalent is (in C) uint32x2_t vcle_f32(float32x2_t a, float32x2_t b);, i.e takes Vec4<f32> and returns Vec4<u32>. The point is that the signature is't the same as on x86.

If we're serious about SIMD, then we have to somehow take that into account in our implementation.
At least, I think that Vec4<T>::cmpxx -> Vec4<u8> should rather be Vec4<T>::cmpxx -> Vec4<I> where I is the unsigned integer type with the same size as T. That I type could be obtained from a trait we would require T to implement.

The changes I would plan are:

  • vek's current cmpxx API simply cannot compile with simd_llvm, but it must still exist. So the cmpxx implementation should fall back to the repr_c implementation. SIMD users will have to use new functions described below.
  • Vector types gain a new API, cmpxx_native (or some other name), which signature is pub fn cmpxx_native(self, rhs: Self) -> $Vec<T::SameSizeUint> where T: vek::SimdElement + $ComparisonBounds.
    The signature is such that it can be implemented via the same SIMD intrinsics we've been using so far.
  • Integer vector types gain reduce_and() and reduce_or().
  • Boolean vector types: reduce_and() and reduce_or() are implemented "as-if" the vector was cast to a vector of u8. For repr_c, no change is necessary.
  • reduce_ne() should be deprecated because it makes no sense; and for types other than bool, the compiler rightfully complains that comparison operators cannot be chained.

I've started work on this but I'll resume later. I'll add an example, as you did, to showcase the new way of doing things.
Feel free to let me know what you think!

@xMAC94x
Copy link
Contributor Author

xMAC94x commented Apr 5, 2022

HI @yoanlcq , nice findings. I am by far not so into the topic but here are my 2 cents:

  • In general i like the overall idea to build a robust SIMD api.
  • Silently going to repr_ccan be bad for performance. Can we maybe do like you said in version 0.16 but get rid of the repr_c functions in 0.17 ? Or at least give the user some indecator that they are actually not using simd operations, even when they created the simd version ?
  • How will https://github.com/rust-lang/portable-simd fir in this image ? do they build a wrapper around that weird x64, arm behavior ?

@yoanlcq
Copy link
Owner

yoanlcq commented Apr 6, 2022

They use platform-intrinsics as I do (https://rust-lang.github.io/portable-simd/src/core_simd/intrinsics.rs.html), but it appears their source file is better documented, and I think they have some intrinsics I didn't know about.
In any case that's what they use for their SimdPartialOrd trait and friends, so it's equivalent to what we're doing; except that the signature of their functions match what the intrinsics expect (rather than Rust's convention to accept inputs by-ref for comparison operations, which I followed when initially writing the crate.)

So their comparison functions are like so :

           #[inline]
            fn simd_lt(self, other: Self) -> Self::Mask {
                // Safety: `self` is a vector, and the result of the comparison
                // is always a valid mask.
                unsafe { Mask::from_int_unchecked(intrinsics::simd_lt(self, other)) }
            }

I'm not sure how worth adding a Mask type is; the intrinsics actually use plain integers. Sure it adds type-safety, though, by preventing interpretation of the output as plain integers, I guess.
In any case it can make us confident that the intrinsics can be trusted to do the right thing, regardless of the CPU arch.
(But really, the only way to be sure is (as always) to look a the generated assembly in a release build (this can be done in https://rust.godbolt.org/ with -C opt-level=3 but it needs a minimal example written from scratch)).

Silently going to repr_c can be bad for performance

Yes but, it's better than "it doesn't compile"; though I agree with you that the function would be "lying", which is something that I don't like; but making it "good" would require changing the signature (accept owned values + return vector of ints/masks), which is a breaking change.
It would indeed be nice to have some mechanism to warn users that use the "wrong" functions with SIMD, but it also should not bother those who use repr_c. I'll have to think about how to do it.

@xMAC94x
Copy link
Contributor Author

xMAC94x commented Apr 24, 2022

Hi @yoanlcq , is there any update on the implementation you tried (:

yoanlcq added a commit that referenced this pull request Apr 24, 2022
@yoanlcq
Copy link
Owner

yoanlcq commented Apr 24, 2022

Hi, sorry to keep you waiting, I've just pushed a new commit to the main branch, as can be seen, which should address the issues.

I'm not yet fully confident about publishing to a new version right now; I mean, it was tested and all, but I'd feel better about it if you reviewed it, to confirm that I didn't forget anything and that it does solve your issue.

Thanks!

@Imberflur
Copy link
Contributor

Imberflur commented May 2, 2022

Hi, I looked through the changes and they appear to address the issue and provide the same functionality we were using. @xMAC94x also setup a branch patching in the latest vek from this repo and that passes our CI checks.

I made one note where I thought some helpful information could be added to the docs 3febca2#r72638972.

@yoanlcq
Copy link
Owner

yoanlcq commented May 3, 2022

All right, I've added a commit to address this, and with that, it looks like we're good to go! So I went ahead and published these changes as 0.15.8.

I'll close this PR because I assume we're done, but as usual, feel free to re-open if you encounter issues or have other remarks.
Thanks!

@yoanlcq yoanlcq closed this May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants