bench_blas.py::test_gesdd in Codspeed CI job #4776
Hm... Actually, it did fail like this before: https://github.com/OpenMathLib/OpenBLAS/actions/workflows/codspeed-bench.yml
A few quick observations:
The benchmark is using OPENBLAS_NUM_THREADS=1, so it's not a thread-safety issue, but it's still deep in OpenBLAS.
All in all, this looks like a possibly genuine edge case in OpenBLAS (or reference LAPACK? Not sure where the single-precision gesdd kernel comes from). I'll try to smoke it out on CI without codspeed now. Meanwhile, maybe we should just remove the assertion for the time being. The assertion is not required for the benchmark itself; it's just generally nicer to benchmark correctly working code, they say. EDIT: #4777
Weird error. INFO=-4 from SLASCL would mean its fourth input argument, CFROM (the denominator of the scale factor), is either NaN or zero.
There is one call graph where this input argument is provided by SNRM2...
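To make the failure mode concrete, here is a minimal sketch of the argument check in question. It mimics only the CFROM test of SLASCL (zero or NaN is rejected as parameter number 4) and is not the LAPACK routine itself. The helper names `slascl_cfrom_info` and `nrm2` are hypothetical, and `nrm2` is a naive stand-in for SNRM2, not its actual algorithm.

```python
import math


def slascl_cfrom_info(cfrom: float) -> int:
    """Mimic only the CFROM check of SLASCL (hypothetical helper).

    Returns -4 when the fourth argument is zero or NaN, else 0.
    """
    if cfrom == 0.0 or math.isnan(cfrom):
        return -4
    return 0


def nrm2(x):
    """Naive Euclidean norm, a stand-in for SNRM2 (assumption, not its real algorithm)."""
    return math.sqrt(sum(v * v for v in x))


good = nrm2([3.0, 4.0])            # ordinary input gives a finite, nonzero norm
bad = nrm2([3.0, float("nan")])    # a single NaN poisons the norm
zero = nrm2([0.0, 0.0])            # an all-zero vector gives a zero norm

print(slascl_cfrom_info(good), slascl_cfrom_info(bad), slascl_cfrom_info(zero))
```

If a norm computed upstream comes back as NaN or zero and is passed on as the scale denominator, the check trips in exactly this way.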
This is indeed codspeed-specific: https://github.com/OpenMathLib/OpenBLAS/actions/runs/9765188405/job/26955309472?pr=4777 I asked on their Discord (https://discord.com/channels/1065233827569598464/1065686090452828251/threads/1257753281342738502 -- might need to join to see, no idea; will repost any insights here anyway). Meanwhile, do we want to disable the assertion so that the runs are uploaded and we can at least see the flamegraphs?
Guess it would make sense to disable the assertion, especially if the problem cannot be reproduced outside codspeed. This is unlikely to be CPU-specific, as there is only one assembly kernel for SNRM2 in use on all x86_64 targets (except the plain C "GENERIC" one). If it is, it would have to be due to compiler/assembler misbehaviour.
In OpenMathLib#4776 we're hitting ** On entry to SLASCL parameter number 4 had an illegal value on codspeed, but not outside (either locally or on github runners)
Okay, I repurposed #4777 to ignore the sgesdd failure.
The plot thickens ... this is almost certainly fallout from #4728 (where I fixed SSCAL to return NAN in certain corner cases like |
One check that the problem's fixed is to re-enable the disabled assertion:
$ git diff
diff --git a/benchmark/pybench/benchmarks/bench_blas.py b/benchmark/pybench/benchmarks/bench_blas.py
index 628c0cb2a..8127dd0c7 100644
--- a/benchmark/pybench/benchmarks/bench_blas.py
+++ b/benchmark/pybench/benchmarks/bench_blas.py
@@ -234,14 +234,10 @@ def test_gesdd(benchmark, mn, variant):
     gesdd = ow.get_func('gesdd', variant)
     u, s, vt, info = benchmark(run_gesdd, a, lwork, gesdd)
 
-    if variant != 's':
-        # On entry to SLASCL parameter number 4 had an illegal value
-        # under codspeed (cannot repro locally or on CI w/o codspeed)
-        # https://github.com/OpenMathLib/OpenBLAS/issues/4776
-        assert info == 0
-
-        atol = {'s': 1e-5, 'd': 1e-13}
-        np.testing.assert_allclose(u @ np.diag(s) @ vt, a, atol=atol[variant])
+    assert info == 0
+
+    atol = {'s': 1e-5, 'd': 1e-13}
+    np.testing.assert_allclose(u @ np.diag(s) @ vt, a, atol=atol[variant])
 
 
 # linalg.eigh
cc @art049 of codspeed fame, in case you've time to isolate the problem under codspeed, as discussed elsewhere. Would like to thank you publicly for building codspeed regardless.
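For anyone who wants to try the same reconstruction check outside the benchmark harness, a rough sketch follows. It is not the benchmark code: it assumes NumPy is available, uses `np.linalg.svd` in place of the benchmark's `ow.get_func('gesdd', variant)` wrapper, and the helper name `check_svd_roundtrip`, the matrix size and the seed are made up for illustration. Only the tolerances are taken from the diff above.

```python
import numpy as np


def check_svd_roundtrip(m, n, dtype, atol, seed=1234):
    """Factor a random m-by-n matrix and check that U @ diag(s) @ Vt reproduces it."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(m, n)).astype(dtype)
    # Reduced SVD so the three factors multiply back to an m-by-n matrix.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    np.testing.assert_allclose(u @ np.diag(s) @ vt, a, atol=atol)
    return s


# Same per-precision tolerances as in the benchmark's assertion.
atol = {'s': 1e-5, 'd': 1e-13}
dtypes = {'s': np.float32, 'd': np.float64}

for variant in 'sd':
    s = check_svd_roundtrip(9, 8, dtypes[variant], atol[variant])
```

Whether this exercises the same OpenBLAS code path depends on how the local NumPy was built, so a passing run here says little about the codspeed environment.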
This failure just showed up in CI, from this log:
This benchmark was introduced in gh-4678 one and a half months ago. I'm not sure it failed before like this. @ev-br you may want to look into this?