AVX512F instructions #3
Comments
Hi Manodeep,

Thanks for your feedback! The macros in … Regarding the …

Cheers, Pedro
Hi @manodeep,

First of all, thanks for the support! Regarding the … Masked loads with … Also, how do you support this functionality in …?

Thanks, James
Here's how my SIMD intrinsics work with … Copy-pasting the effective code (note that single and double precision are supported with the following):

/* Stuff in headers */
#include <stdint.h>
const uint16_t masks_per_misalignment_value_float[] = {
0b1111111111111111,
0b0000000000000001,
0b0000000000000011,
0b0000000000000111,
0b0000000000001111,
0b0000000000011111,
0b0000000000111111,
0b0000000001111111,
0b0000000011111111,
0b0000000111111111,
0b0000001111111111,
0b0000011111111111,
0b0000111111111111,
0b0001111111111111,
0b0011111111111111,
0b0111111111111111};
const uint8_t masks_per_misalignment_value_double[] = {
0b11111111,
0b00000001,
0b00000011,
0b00000111,
0b00001111,
0b00011111,
0b00111111,
0b01111111};
#ifdef DOUBLE_PREC
/* calculate in doubles */
#define DOUBLE double
#define AVX512_NVEC 8
#define AVX512_MASK __mmask8   /* mask type used by the kernel below */
#define AVX512_FLOATS __m512d
#define AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(MASK, X) _mm512_maskz_loadu_pd(MASK, X)
/* alias so the kernel picks the right remainder-mask table */
#define masks_per_misalignment_value_DOUBLE masks_per_misalignment_value_double
#else
/* calculate with floats */
#define DOUBLE float
#define AVX512_NVEC 16
#define AVX512_MASK __mmask16  /* mask type used by the kernel below */
#define AVX512_FLOATS __m512
#define AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(MASK, X) _mm512_maskz_loadu_ps(MASK, X)
/* alias so the kernel picks the right remainder-mask table */
#define masks_per_misalignment_value_DOUBLE masks_per_misalignment_value_float
#endif
/* end of stuff in headers */
/* Begin kernel code */
for(int64_t j=n_off; j<N1; j+=AVX512_NVEC) {
    AVX512_MASK m_mask_left = (N1 - j) >= AVX512_NVEC ? ~0 : masks_per_misalignment_value_DOUBLE[N1 - j];
    /* Perform a masked load -> does not touch any memory not explicitly set via the mask */
    const AVX512_FLOATS m_x1 = AVX512_MASKZ_LOAD_FLOATS_UNALIGNED(m_mask_left, localx1);
    ...
}

Of course such masked loads are not supported by …
Another set of new …
We could make use of masked loads in our code; however, we want to support … We use …
AFAICS, …
Hi,

First of all - thanks for creating (and open-sourcing) this swift code! Looks great!

I was looking through the SIMD wrappers for AVX512F in `vector.h` and I noticed a few wrappers that refer to non-existent intrinsics (at least in AVX512F) or have better implementations. In particular, `vec_and` maps to `_mm512_and_ps`, which does not exist (at least according to the Intel Intrinsics Guide). From the looks of it, all `and`/`or` operations are now only relevant for masks and not for individual data-types.

I also saw that `vec_fabs` is implemented via two intrinsics -- is the new `_mm512_abs_ps` intrinsic too slow?

I am also curious - I do not see any references to any `mask(z)_load`. I found those masks quite useful for staying in SIMD mode and eliminating the serial part of the code (dealing with remainder loops for array lengths not divisible by the SIMD width).

Once again, the performance gains look awesome!