Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to fixed numpy ci test failures #4794

Merged
merged 5 commits into from
Jul 25, 2024

Conversation

XiWeiGu
Copy link
Contributor

@XiWeiGu XiWeiGu commented Jul 10, 2024

Besides the S/D/C/ZSCAL interfaces in BLAS, there are many other interfaces that also call SCAL_K. When fixing issue #4728 , the behavior of those interfaces that call SCAL_K was also changed, which caused some NUMPY CI test failures. I made revisions on an AMD Ryzen 2600 to ensure that the behavior of those interfaces calling SCAL_K remains consistent with before. The local NUMPY (v1.26.0)CI test results are as follows:

FAILED numpy/linalg/tests/test_linalg.py::TestEigvals::test_generalized_sq_cases - AssertionError: In test case: <LinalgCase: 8x8_tile213>
==================================== 1 failed, 37721 passed, 1990 skipped, 32 xfailed, 2 xpassed in 414.31s (0:06:54) ====================================

The number of failed test cases will be reduced to one (there is still one that needs further investigation).

@martin-frbg
Copy link
Collaborator

Thanks - I've come to the same conclusion (except that my version uses a changed dummy2 in the calls that do not originate from interface - which is a bit less practical than your suggestion but comes from me starting with the last failure in the numpy log). Guess it would make more sense to use dummy2=1 as the flag rather than 12, or is there a significance that I do not realize ?

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 10, 2024

Thanks for your reply. Using dummy=12 has no special meaning (I was concerned that dummy2=1 might already be used as a flag). I will revise it to dummy=1 and test it.
Optimization code for SCAL_K on other platforms also needs to be revised, which is quite a workload.

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 10, 2024

I also have an idea of identifying certain specific interfaces that shouldn't call SCAL_K and providing separate optimizations for them. The current approach is making the code less maintainable.

@martin-frbg
Copy link
Collaborator

Yes, getting dummy2 into all the assembly code is not going to be fun, but the only alternative I can imagine is creating a second set of SCAL kernels so that one keeps the previous behaviour - this would make the code even less maintainable. (On the other hand, it might be better for performance if we can avoid a lot of new conditionals on the strictly internal code paths that way)

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 12, 2024

Currently, the call to the revised SCAL_K in interface/gemv and driver/level3/syrk_k.c has caused the NUMPY CI test to fail. The test code for the cblas_dgemv interface is as follows:

#include <stdio.h>
#include <math.h>
#include <cblas.h>

int main() {
    int m = 3;
    int n = 2;

    double A[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    double x[2] = {1.0, 1.0};
    double y[3] = {NAN, INFINITY, 0.0};

    double alpha = 1.0, beta = 0.0;
    cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, A, n, x, 1, beta, y, 1);

    printf("Result vector y:\n");
    for(int i = 0; i < m; i++) {
        printf("%f\n", y[i]);
    }

    return 0;
}

With ATLAS 3.10.3:

Result vector y:
3.000000
7.000000
11.000000

With OpenBLAS latest code:

Result vector y:
nan
nan
11.000000

The syrk interface is likely a general issue as well. For these two interfaces, when beta=0.0, the output needs to be all zeros.

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 12, 2024

One solution is to check if beta is 0.0 before calling SCAL_K in gemv and syrk and implement a bypass if it is. (Should we only need to check for gemv and syrk? Actually, there are other interfaces that call SCAL_K. Should we investigate if their expected results when beta=0.0 are the same?)

@martin-frbg
Copy link
Collaborator

the one remaining error with your fix is due to a small oversight in your revision of dscal.c - in the loop on "n1" the code should call dscal_kernel_8_zero when da=0&&dummy2=0

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 12, 2024

the one remaining error with your fix is due to a small oversight in your revision of dscal.c - in the loop on "n1" the code should call dscal_kernel_8_zero when da=0&&dummy2=0

Thank you, I've updated the code.

Copy link

codspeed-hq bot commented Jul 12, 2024

CodSpeed Performance Report

Merging #4794 will degrade performances by 10.97%

Comparing XiWeiGu:Fixed_Numpy_CI_Test (34b80ce) with develop (e9f6aa4)

Summary

❌ 1 regressions
✅ 61 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark develop XiWeiGu:Fixed_Numpy_CI_Test Change
test_gesdd[mn1-s] 56.8 ms 63.8 ms -10.97%

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 12, 2024

It seems that the performance of gesadd has declined; further adjustments are needed

@martin-frbg
Copy link
Collaborator

gesdd has scal on its call graph https://netlib.org/lapack/explore-html-3.6.1/d4/dca/group__real_g_esing_gaf60b27e77bfbeffe1ec63e0f360c4564_gaf60b27e77bfbeffe1ec63e0f360c4564_cgraph_org.svg but there is also some codspeed-specific problem with this particular benchmark (#4776). Maybe your changes happened to fix the premature exit (at least until the assembly kernels are updated) and the "regression" is actually the benchmark running to completion instead of erroring out early.

@martin-frbg
Copy link
Collaborator

I have fixes for x86_64 lined up (still need to test on Windows to check I read the correct argument there). will try to do arm64, ppc and riscv today

@XiWeiGu
Copy link
Contributor Author

XiWeiGu commented Jul 15, 2024

I have fixes for x86_64 lined up (still need to test on Windows to check I read the correct argument there). will try to do arm64, ppc and riscv today

Okay, I can fix mips/mips64. I also plan to add unit tests for s/dgemv to ensure it can follow the correct branch of the SCAL_K interface.

@mattip
Copy link
Contributor

mattip commented Jul 26, 2024

Trying to use latest HEAD over at openblas-libs, I am seeing a failure in i686

 TEST 77/107 sscal:0_inf_inc_2 [FAIL]
   ERR: test_zscal.c:77  should be true
 TEST 78/107 sscal:0_inf [FAIL]
   ERR: test_zscal.c:65  should be true
 TEST 79/107 sscal:nan_0_inc_2 [OK]
 TEST 80/107 sscal:nan_0 [OK]
 TEST 81/107 sscal:0_nan_inc_2 [FAIL]
   ERR: test_zscal.c:31  should be true
 TEST 82/107 sscal:0_nan [FAIL]
   ERR: test_zscal.c:19  should be true

This is using

CFLAGS=' -fvisibility=protected -Wno-uninitialized'
make BUFFERSIZE=20 DYNAMIC_ARCH=1 USE_OPENMP=0 NUM_THREADS=64 BINARY=32 \
   OBJCONV=/io/objconv/objconv SYMBOLPREFIX=scipy_ LIBNAMEPREFIX=scipy_ \
   FIXED_LIBNAME=1 TARGET=PRESCOTT

inside a quay.io/pypa/manylinux2014_i686 docker image on a linux x86_64 host.

@martin-frbg
Copy link
Collaborator

hopefully fixed by #4817 - I seem to have gotten the stack offset to the flag value wrong in the single-precision case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants