Improve Windows threading performance scaling #4103
Conversation
Failed builds look related to PR #4092.
The AppVeyor build (using an old MinGW version of gcc) failed with an error that does seem related to your changes; the others are just timing out after getting scheduled on old and slow hardware.
Thank you, I will look into that.
This appears to be a deficiency in (MinGW) GCC <= 5.3. I've seen the GCC intrinsic __sync_val_compare_and_swap used as a replacement, e.g. in https://github.com/migueldeicaza/mono-wasm-mono/blob/master/mono/utils/atomic.h (the fallback could probably be made conditional on __GNUC__ being defined and less than 6).
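A version-gated fallback along those lines might look like the sketch below. This is my own illustration, not OpenBLAS code: the wrapper name `cas_val` is hypothetical, and the cutoff at GCC 6 is the guess mentioned above. Both paths return the value previously stored at `*p`, so a caller can test success with `returned == expected`.

```c
#include <stdint.h>

/* Hypothetical portable compare-and-swap wrapper (names are mine, not
 * OpenBLAS's).  On GCC older than 6 -- such as the MinGW gcc 5.3 on the
 * old AppVeyor image -- fall back to the legacy __sync intrinsic, which
 * that compiler does support; newer compilers use the __atomic builtin. */
#if defined(__GNUC__) && __GNUC__ < 6
static inline intptr_t cas_val(intptr_t *p, intptr_t expected, intptr_t desired)
{
    /* Returns the value that was actually in *p before the operation. */
    return __sync_val_compare_and_swap(p, expected, desired);
}
#else
static inline intptr_t cas_val(intptr_t *p, intptr_t expected, intptr_t desired)
{
    intptr_t old = expected;
    __atomic_compare_exchange_n(p, &old, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return old;  /* on failure, updated to the value observed in *p */
}
#endif
```

Either branch compiles to the same full-barrier CAS on x86, so the conditional only changes which spelling the compiler accepts.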
It looks like the MSYS2 runners using gcc are stuck indefinitely during one of the tests. Is that a deficiency in GCC from MSYS2, or is there something odd with this change?
I was hoping it was just the GitHub runners acting up, as I see no reason why the jobs would hang precisely there. Trying now with the stealth equivalent of #4102 reverted.
I wonder if it has to do with re-entrancy in `exec_blas_async`; I had flagged that as an open question. I will look. Can we get a call stack from the hang?
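One cheap way to confirm or rule out the re-entrancy hypothesis, even before a debugger is attached, is a thread-local depth counter around the dispatch entry point: if the function can be re-entered on the same thread, the counter exceeds one. The sketch below is a self-contained stand-in I wrote for illustration; `async_dispatch_demo` is not the real OpenBLAS entry point.

```c
/* Thread-local re-entrancy counter (C11 _Thread_local). */
static _Thread_local int dispatch_depth = 0;

/* Stand-in for an async dispatch entry point.  `reenter` levels of
 * nested calls simulate a callback that re-enters the dispatcher.
 * Returns nonzero if re-entrancy was detected anywhere in the call. */
static int async_dispatch_demo(int reenter)
{
    dispatch_depth++;
    int reentrant = (dispatch_depth > 1);
    if (reenter > 0)
        reentrant |= async_dispatch_demo(reenter - 1);
    dispatch_depth--;
    return reentrant;
}
```

In the real code one would log or assert when the counter goes above one, which pinpoints the re-entrant call path without needing to reproduce the full hang.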
Probably not from the workflow run; I guess I'll have to reproduce this locally.
It seems to get stuck in dgetrf (from TESTING/LIN/xlintstds < dstest.in):
with these threads:
(This was built with msys-mingw64 (MSYS2 version 20190524), gcc 11.3, on a quad-core Skylake-X (Xeon W-2123) under Windows 11.)
@martin-frbg Let me open a new branch on my side, with a PR to try to fix this. In optimizing away locks I removed one that may be questionable: a lock guarding a particular calling pattern, which I removed because I was not able to find a case where that pattern actually occurs. As a result, I dropped both the lock and the logic that tested for, and appended, a new work queue to an existing one when an async work-ahead call was made. I can put that code back and see if it fixes the problem.
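Appending a new work queue to one that is already being drained is inherently a multi-writer update, so removing its lock without a replacement invites a lost-update race. The sketch below is my own illustration (not OpenBLAS's actual blas_queue code) of the lock-free alternative: a CAS retry loop that keeps concurrent appends safe without a mutex.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue node -- a stand-in for a real work-queue entry. */
typedef struct node { struct node *next; } node;

static _Atomic(node *) head;

/* Append via a CAS retry loop: if another thread (say, a re-entrant
 * async caller) pushed between our load and our CAS, the CAS fails,
 * `old` is refreshed with the observed head, and we retry.  No node
 * is ever lost, which is exactly what a naive unlocked append cannot
 * guarantee. */
static void push(node *n)
{
    node *old = atomic_load(&head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}

enum { NTHREADS = 4, NPUSH = 1000 };
static node nodes[NTHREADS][NPUSH];

static void *worker(void *arg)
{
    node *mine = arg;
    for (int i = 0; i < NPUSH; i++)
        push(&mine[i]);
    return NULL;
}

static size_t list_length(void)
{
    size_t c = 0;
    for (node *n = atomic_load(&head); n != NULL; n = n->next)
        c++;
    return c;
}
```

With four threads pushing 1000 nodes each, the final list always contains all 4000 nodes; replace the CAS loop with a plain `n->next = head; head = n;` and nodes go missing under contention.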
It looks to me like that is the only place in the entire codebase where this can happen. I will put the protection for that case back in place and send a new PR so we can see if that fixes it.
See PR #4112 for a possible fix.
OK, so I don't think that fixed it. There must be an issue elsewhere.
I guess I'll revert your contribution for now, just to avoid having the problematic behavior end up in somebody's build (or even distribution) |
Yes, thank you and sorry. I will study the code further to see if I can come up with a better solution. Do you know if it is possible to run that particular test that was failing in isolation? I'd like to debug it to understand the calling pattern better. |
Sure - in lapack-netlib/TESTING, run LIN/xlintstds.exe with dstest.in as (standard) input. You can edit dstest.in to cut down the number of matrix sizes it tries, and to keep it from testing dposv as well. (I have not checked which matrix size shows the problem first, though - the backtrace suggests it is M=101.)