Try to fixed numpy ci test failures #4794
Thanks, I've come to the same conclusion (except that my version uses a changed dummy2 in the calls that do not originate from the interface, which is a bit less practical than your suggestion but comes from my starting with the last failure in the numpy log). I guess it would make more sense to use dummy2=1 as the flag rather than 12, or is there a significance that I do not realize?
Thanks for your reply. Using dummy2=12 has no special meaning (I was concerned that dummy2=1 might already be used as a flag). I will revise it to dummy2=1 and test it.
I also have an idea: identify the specific interfaces that shouldn't call SCAL_K and provide separate optimizations for them. The current approach is making the code less maintainable.
Yes, getting dummy2 into all the assembly code is not going to be fun, but the only alternative I can imagine is creating a second set of SCAL kernels so that one of them keeps the previous behaviour, and this would make the code even less maintainable. (On the other hand, it might be better for performance if we can avoid a lot of new conditionals on the strictly internal code paths that way.)
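For readers following the thread, the flag idea can be sketched in plain C. This is only an illustration under assumptions inferred from the comments here: `scal_sketch` and its `flag` parameter are hypothetical stand-ins (not OpenBLAS code), with `flag` playing the role of dummy2, where 1 marks a call that comes from the BLAS SCAL interface and 0 marks an internal caller.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCAL kernel; not the OpenBLAS implementation.
 * flag == 1: call from the SCAL interface, so alpha == 0 still multiplies
 *            and NaN/Inf values in x propagate.
 * flag == 0: internal caller, so alpha == 0 overwrites x with exact zeros. */
static void scal_sketch(size_t n, double alpha, double *x, size_t incx, int flag)
{
    assert(x != NULL || n == 0);
    if (alpha == 0.0 && flag == 0) {
        for (size_t i = 0; i < n; i++) x[i * incx] = 0.0;
        return;
    }
    for (size_t i = 0; i < n; i++) x[i * incx] *= alpha;
}

/* Returns 1 when both behaviours hold for a vector that contains NaN. */
static int scal_flag_demo(void)
{
    double a[3] = {1.0, NAN, 3.0};
    double b[3] = {1.0, NAN, 3.0};
    scal_sketch(3, 0.0, a, 1, 0); /* internal path: NaN is replaced by 0 */
    scal_sketch(3, 0.0, b, 1, 1); /* interface path: 0 * NaN stays NaN   */
    return a[1] == 0.0 && isnan(b[1]) && b[0] == 0.0;
}
```

A single flag keeps one set of kernels at the cost of one extra branch per call; the "second set of kernels" alternative mentioned above would remove that branch but duplicate the code.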
Currently, the calls to the revised SCAL_K in interface/gemv and driver/level3/syrk_k.c cause the NumPy CI tests to fail. The test code for the cblas_dgemv interface is as follows:
With
With
The syrk interface likely has the same general issue. For these two interfaces, when beta=0.0, the output needs to be all zeros.
One solution is to check if beta is 0.0 before calling SCAL_K in |
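The "check beta before scaling" idea might look like the sketch below. It is a hypothetical reference GEMV (row-major, no transpose, unit strides), not the code in interface/gemv; `gemv_sketch` and its argument layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference GEMV: y = alpha * A * x + beta * y,
 * with A stored row-major as an m-by-n array. */
static void gemv_sketch(size_t m, size_t n, double alpha, const double *a,
                        const double *x, double beta, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    /* Step 1: scale y. beta == 0 is handled before any multiplication,
     * so NaN or Inf already stored in y cannot leak into the result. */
    if (beta == 0.0) {
        for (size_t i = 0; i < m; i++) y[i] = 0.0;
    } else if (beta != 1.0) {
        for (size_t i = 0; i < m; i++) y[i] *= beta;
    }
    /* Step 2: accumulate alpha * A * x into y. */
    for (size_t i = 0; i < m; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++) acc += a[i * n + j] * x[j];
        y[i] += alpha * acc;
    }
}

/* Returns 1 when stale NaN/Inf in y is ignored for beta == 0. */
static int gemv_beta_zero_demo(void)
{
    const double a[4] = {1.0, 2.0, 3.0, 4.0};
    const double x[2] = {1.0, 1.0};
    double y[2] = {NAN, INFINITY};
    gemv_sketch(2, 2, 1.0, a, x, 0.0, y);
    return y[0] == 3.0 && y[1] == 7.0;
}
```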
The one remaining error with your fix is due to a small oversight in your revision of dscal.c: in the loop on "n1", the code should call dscal_kernel_8_zero when da == 0 && dummy2 == 0.
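A rough sketch of the branch described in that comment, assuming n1 is the part of n that is a multiple of 8 and the remainder is handled by a scalar tail loop. The kernel names below are hypothetical placeholders for dscal_kernel_8_zero and dscal_kernel_8, and the real dscal.c may be organised differently.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-ins for the unrolled kernels; n is a multiple of 8. */
static void kernel_8_zero_sketch(size_t n, double *x)
{
    for (size_t i = 0; i < n; i++) x[i] = 0.0;
}

static void kernel_8_sketch(size_t n, double da, double *x)
{
    for (size_t i = 0; i < n; i++) x[i] *= da;
}

/* Unit-stride scaling split into a blocked part (n1) and a scalar tail. */
static void dscal_unit_stride_sketch(size_t n, double da, double *x, int dummy2)
{
    assert(x != NULL || n == 0);
    size_t n1 = n & ~(size_t)7;                 /* largest multiple of 8 <= n */
    int zero_fill = (da == 0.0 && dummy2 == 0); /* the condition from the review */

    if (n1 > 0) {
        if (zero_fill) kernel_8_zero_sketch(n1, x);
        else kernel_8_sketch(n1, da, x);
    }
    /* The tail must apply the same rule as the blocked part. */
    for (size_t i = n1; i < n; i++)
        x[i] = zero_fill ? 0.0 : da * x[i];
}

/* Returns 1 when block and tail agree for both flag values. */
static int dscal_block_demo(void)
{
    double a[11], b[11];
    for (size_t i = 0; i < 11; i++) { a[i] = 1.0; b[i] = 1.0; }
    a[2] = a[9] = NAN; /* one NaN in the block, one in the tail */
    b[2] = b[9] = NAN;
    dscal_unit_stride_sketch(11, 0.0, a, 0);
    dscal_unit_stride_sketch(11, 0.0, b, 1);
    for (size_t i = 0; i < 11; i++)
        if (a[i] != 0.0) return 0;
    return isnan(b[2]) && isnan(b[9]) && b[0] == 0.0;
}
```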
Thank you, I've updated the code.
CodSpeed Performance Report: Merging #4794 will degrade performance by 10.97%.
It seems that the performance of gesdd has declined; further adjustments are needed.
gesdd has scal on its call graph (https://netlib.org/lapack/explore-html-3.6.1/d4/dca/group__real_g_esing_gaf60b27e77bfbeffe1ec63e0f360c4564_gaf60b27e77bfbeffe1ec63e0f360c4564_cgraph_org.svg), but there is also some codspeed-specific problem with this particular benchmark (#4776). Maybe your changes happened to fix the premature exit (at least until the assembly kernels are updated) and the "regression" is actually the benchmark running to completion instead of erroring out early.
I have fixes for x86_64 lined up (I still need to test on Windows to check that I read the correct argument there). I will try to do arm64, ppc and riscv today.
Okay, I can fix mips/mips64. I also plan to add unit tests for s/dgemv to ensure that it follows the correct branch of the SCAL_K interface.
Trying to use latest HEAD over at openblas-libs, I am seeing a failure in i686
This is using
inside a quay.io/pypa/manylinux2014_i686 docker image on a linux x86_64 host.
Hopefully fixed by #4817. I seem to have gotten the stack offset to the flag value wrong in the single-precision case.
Besides the S/D/C/ZSCAL interfaces in BLAS, many other interfaces also call SCAL_K. The fix for issue #4728 also changed the behavior of those interfaces, which caused some NumPy CI test failures. I made revisions on an AMD Ryzen 2600 to ensure that the behavior of the interfaces calling SCAL_K remains consistent with the previous behavior. The local NumPy (v1.26.0) CI test results are as follows:
With these revisions, the number of failed test cases is reduced to one (which still needs further investigation).