Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected result of short integer SimdArray division #257

Open
kfjahnke opened this issue Dec 2, 2020 · 9 comments
Open

unexpected result of short integer SimdArray division #257

kfjahnke opened this issue Dec 2, 2020 · 9 comments

Comments

@kfjahnke
Copy link
Collaborator

kfjahnke commented Dec 2, 2020

Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU
1.4.1 | ubuntu 20.04 | g++ 9.3.0 | -Ofast | | Haswell core i5

Testcase

#include <Vc/Vc>

int main ( int argc , char * argv[] )
{
  typedef Vc::Vector < short > v_t ;

  v_t lhs ( 2 ) ;
  v_t rhs ( 1 ) ;

  auto result = lhs / rhs ;
  assert ( result [ 0 ] == 2 ) ;
}

Actual Results

when I compile this with
g++ -Ofast opdiv_bug.cc -lVc
And run it, I get:
$ ./a.out
a.out: opdiv_bug.cc:11: int main(int, char**): Assertion `result [ 0 ] == 2' failed.

Expected Results

I'd expect the assertion to hold. Compiling with -O3 does produce code which does not fail the assertion, and when using 'int' instead of 'short' it holds as well.

I'm not sure what to make of this, and it may not be a Vc problem, but rather an error of the optimizer. The problem first showed when processing C vectors of Vc::SimdArray with clang++, even though clang++ does not produce the error when used for the example above, so it does not seem to be occuring only with g++, but is easier to provoke with it.

Looking at the assembler code produced with -Ofast, there is no division instruction. fast math would use a multiplication with the reciprocal, and the assembler opcode to produce a reciprocal is indeed present, but the result should still be correct, because the reciprocal of one is one and multiplying with it should have no effect. Any ideas?

@mattkretz
Copy link
Member

Vectors of short are converted to vectors of float to use the (v)divps instruction for vectorized division. Thus, -ffast-math kicks in and may lead to surprising results. I made std-simd pass all tests with -ffast-math — and it's been a lot of surprises and headaches. I'll take a look what I did for division. But in any case Vc and -ffast-math are not well tested (and supported).

@mattkretz
Copy link
Member

@kfjahnke
Copy link
Collaborator Author

kfjahnke commented Dec 8, 2020

What really surprised me that division by one should produce an erroneous result. This is the one case where multiplication with the reciprocal should be fine, and correct for all numbers. Instead, with g++ and -Ofast, all values in the result vector are one less than they should be:

`#include <Vc/Vc>
#include

int main ( int argc , char * argv[] )
{
typedef Vc::SimdArray < short , 8 > v_t ;

v_t lhs { 2 , 3 , 4 , 5 , 6 , 7, 8 , 9} ;
v_t rhs ( 1 ) ;

auto result = lhs / rhs ;
std::cout << result << std::endl ;
assert ( result [ 0 ] == 2 ) ;
}
`
result:
[1, 2, 3, 4, 5, 6, 7, 8]
but it should be
[2, 3, 4, 5, 6, 7, 8, 9]
which clang++ correctly produces with -Ofast, so it may be a g++ issue.

@mattkretz
Copy link
Member

rcpps of 1.f is 0.999756f 🤷. And GCC translates xmm0 / xmm1 to:

  vrcpps xmm2, xmm1
  vmulps xmm1, xmm2, xmm1
  vmulps xmm1, xmm2, xmm1
  vaddps xmm2, xmm2, xmm2
  vsubps xmm2, xmm2, xmm1
  vmulps xmm0, xmm0, xmm2

I agree this is surprising.

@kfjahnke
Copy link
Collaborator Author

kfjahnke commented Dec 9, 2020

So my error was to assume that forming the reciprocal of 1.0f would yield 1.0f.
0.999756f explains the result: every multiplication produces a result which is just a tad below the full integer, and hence truncated. Thanks for the clarification!

But in any case Vc and -ffast-math are not well tested (and supported).

I use -Ofast throughout vspline and pv, and so far this has gone well and it's noticeably faster. Most of my code is floating point arithmetic, though. I only noticed this issue because I implemented my own SIMD type (see https://bitbucket.org/kfj/vspline/src/master/simd_type.h) which I use as a stand-in for Vc::SimdArray when the target can't use it, or for fundamentals Vc won't vectorize. So I wrote test code to make sure that all the operator functions work as expected. The test code can run with Vc SimdArrays as well, and that coughed up the problem.

@mattkretz
Copy link
Member

I guess I should prioritize gather & scatter for you so you can move to std-simd. ;-)

@kfjahnke
Copy link
Collaborator Author

kfjahnke commented Dec 9, 2020

Hey, you still remember me!
Yes, std::simd would be great, and I also need to be able to use clang++, it optimizes my code much better than g++.

@mattkretz
Copy link
Member

Sorry if you feel/felt ignored. I was occupied with stupid corner cases like from -ffast-math. 😉 I'm bad at working on more than one thing at a time...
Please report issues if recent clang isn't working with std-simd. Once it's in libstdc++ the expectation is that clang works

@kfjahnke
Copy link
Collaborator Author

kfjahnke commented Dec 9, 2020

Sorry if you feel/felt ignored. I was occupied with stupid corner cases like from -ffast-math. wink I'm bad at working on more than one thing at a time...

You have my sympathy... What I like about generic programming, like with my simd_type, is that one can get away without much of the nitty-gritty. But I suppose someone has to do it. And I'm also bad at multitasking.

So now your code does work with clang++ - I just checked! And, indeed, the test code I wrote a year or so back comes out some 30% faster than the g++ binary - BTW with -Ofast. When I last tried, clang++ just spat out lots of errors. Now this is good news!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants