TEST: 1.21.x + blas variants #237

Closed
h-vetinari wants to merge 4 commits into conda-forge:numpy121 from h-vetinari:1.21_blas_vars

Conversation

@h-vetinari

Member

Continuing the analysis from #227 & #196. Should not be merged for the same reasons as #227.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari

Member Author

Update for 1.21.0

From 1 failure out of 64 for 1.20.3, there are now 4 (mostly flaky) failures.

Note: travis seems to be hanging across the board, but should pass once restarted. The same is expected for aarch, which is still stuck in a long queue. Will update this comment if necessary.

The bad news:

  • win+blis remains flaky
  • broken pipes reappeared on win

Details

| lib | before | after | updated |
|---|---|---|---|
| numpy | 1.20.3 | 1.21.0 | X |
| libblas | 3.9.0-9 | 3.9.0-9 | |
| blis | 0.8.1-0 | 0.8.1-0 | |
| openblas | 0.3.15-pthreads-1 | 0.3.15-pthreads-1 | |
| mkl | 2021.2-389 | 2021.2-389 | |
| netlib | 3.9.0-5 | 3.9.0-5 | |
| pypy | 7.3.4-4 | 7.3.4-4 | |

| variant | before | after |
|---|---|---|
| win + blis | 12 failures for py37-only | 12 failures for py39-only |
| win | Passed | Reappearing failures due to "The process tried to write to a nonexistent pipe." |

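| variant | blis | mkl | netlib | openblas | sum* |
|---|---|---|---|---|---|
| linux / x86 | ✔️ | ✔️ | ✔️ | ✔️ | - |
| linux / aarch | | | ✔️ | ✔️ | - |
| linux / ppc64le | | | ✔️ | ✔️ | - |
| osx / arm | | | ✔️ | ✔️ | - |
| osx / x86 | ✔️ | ✔️ | ✔️ | ✔️ | - |
| win / x86 | ✔️ / ❌ | ✔️ / ❌ | ✔️ / ❌ | ✔️ / ❌ | 4F |
| sum* | 1F | 1F | 1F | 1F | 4F |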

* sum of Failures (out of a total of 64 CI combinations being tested)

Build logs:
Azure
Drone
Travis

win + blis + cpython 3.9: 12 failures
=========================== short test summary info ===========================
FAILED core/tests/test_multiarray.py::TestMatmul::test_dot_equivalent[args4]
FAILED core/tests/test_multiarray.py::TestMatmul::test_matmul_object - Assert...
FAILED linalg/tests/test_linalg.py::TestSolve::test_sq_cases - AssertionError...
FAILED linalg/tests/test_linalg.py::TestSolve::test_generalized_sq_cases - As...
FAILED linalg/tests/test_linalg.py::TestInv::test_sq_cases - AssertionError: ...
FAILED linalg/tests/test_linalg.py::TestInv::test_generalized_sq_cases - Asse...
FAILED linalg/tests/test_linalg.py::TestPinv::test_generalized_sq_cases - Ass...
FAILED linalg/tests/test_linalg.py::TestPinv::test_generalized_nonsq_cases - ...
FAILED linalg/tests/test_linalg.py::TestDet::test_sq_cases - AssertionError: ...
FAILED linalg/tests/test_linalg.py::TestDet::test_generalized_sq_cases - Asse...
FAILED linalg/tests/test_linalg.py::TestMatrixPower::test_power_is_minus_one[dt13]
FAILED linalg/tests/test_linalg.py::TestCholesky::test_basic_property - Asser...
= 12 failed, 16017 passed, 354 skipped, 20 xfailed, 1 xpassed, 229 warnings in 622.85s (0:10:22) =

@h-vetinari

Member Author

Finally opened an issue for blis: flame/blis#514

@h-vetinari

Member Author

So I had to restart the CI because travis died (and I don't have restart rights). This has now led to the reappearance of numpy/numpy#19192, though exclusively for PyPy.

@mattip @r-devulap, is it possible that something is messing with the glibc-detection used in numpy/numpy#19209 on PyPy? Also, I really don't understand why this passed an hour before (with the exact same commit).

@mattip

mattip commented Jun 30, 2021

I really don't understand why this passed an hour before

It may be run on different machines, some with AVX512 some without

@mattip

mattip commented Jun 30, 2021

It seems NumPy master has extended numpy.show_config to show which CPU features are detected. It would be nice if we could see that here, it would allow reasoning about runs on different CI machines

Comment thread recipe/meta.yaml Outdated
@h-vetinari

Member Author

So I had to restart the CI because travis died (and I don't have restart rights). This has now led to the reappearance of numpy/numpy#19192, though exclusively for PyPy.

@mattip @r-devulap, is it possible that something is messing with the glibc-detection used in numpy/numpy#19209 on PyPy? Also, I really don't understand why this passed an hour before (with the exact same commit).

After following @mattip's tip for investigating the SIMD capabilities of the agents again, we're basically reconfirming numpy/numpy#19192 (failing runs have AVX512F? AVX512CD? AVX512_SKX?, passing runs have AVX512F* AVX512CD* AVX512_SKX*), but that just shows that the glibc-version-skip introduced in numpy/numpy#19209 does not properly work on PyPy for some reason.
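
One way to compare CI agents independently of NumPy's own reporting is to read the CPU flags straight from the kernel. The following is a minimal sketch, assuming a Linux x86 agent that exposes `/proc/cpuinfo`; it is a swapped-in technique, not the check used in this PR, and the helper names `avx512_flags` and `read_cpuinfo` are made up for illustration.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text):
    """Return the sorted AVX512-related flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # On x86 Linux, each processor entry carries a "flags" line.
        if key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, or return an empty string where it does not exist."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    flags = avx512_flags(read_cpuinfo())
    print("AVX512 flags:", " ".join(flags) if flags else "none detected")
```

Printing this at the start of a test run would make it possible to tell, from the build log alone, whether two runs of the same commit landed on machines with different SIMD capabilities.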

@h-vetinari

Member Author

Finally opened an issue for blis: flame/blis#514

However, the SIMD check did help to verify that the blis failures are not actually flaky, but happen in the presence of AVX512.

@h-vetinari

Member Author

[...] that just shows that the glibc-version-skip introduced in numpy/numpy#19209 does not properly work on PyPy for some reason.

@mattip, is ver = os.confstr('CS_GNU_LIBC_VERSION').rsplit(' ')[1] supposed to work on PyPy? Or perhaps, what would be the correct way to pick up the system glibc version?

@r-devulap

Hi @h-vetinari, it looks like ver = os.confstr('CS_GNU_LIBC_VERSION').rsplit(' ')[1] doesn't work on pypy3. It throws a ValueError: unrecognized configuration name. This explains why you are seeing the error again. I am not sure how to make it work on pypy3; still looking into it.

@h-vetinari

Member Author

This explains why you are seeing the error again. I am not sure how to make it work on pypy3, still looking into it..

Thanks a lot for investigating!

@mattip

mattip commented Jul 1, 2021

PyPy does not implement that value. I opened an issue in PyPy and a corresponding one in NumPy numpy/numpy#19385. Note that packaging.tags uses a ctypes workaround, but that seems like overkill for a problem that PyPy should solve.
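
For illustration, the two approaches mentioned in this thread can be combined into a detection routine that degrades gracefully. This is a minimal sketch under stated assumptions, not the code from numpy/numpy#19209 or from packaging.tags: the function names are hypothetical, and the ctypes path only follows the general idea of asking the already-loaded C library for `gnu_get_libc_version`.

```python
import ctypes
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.17' or '2.17' into (2, 17); None if unparseable."""
    parts = text.split()
    if not parts:
        return None
    try:
        major, minor = parts[-1].split(".")[:2]
        return int(major), int(minor)
    except ValueError:
        return None


def glibc_version_confstr():
    """First attempt: os.confstr, reported above as raising ValueError on PyPy."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: platform without os.confstr; ValueError: unknown name.
        return None
    return parse_glibc_version(text) if text else None


def glibc_version_ctypes():
    """Fallback: ask the C library loaded into the current process."""
    try:
        libc = ctypes.CDLL(None)
        func = libc.gnu_get_libc_version
    except (OSError, TypeError, AttributeError):
        # Not glibc (e.g. musl, macOS, Windows): the symbol is unavailable.
        return None
    func.restype = ctypes.c_char_p
    raw = func()
    return parse_glibc_version(raw.decode("ascii")) if raw else None


def detect_glibc_version():
    """Return (major, minor) for the running glibc, or None if it cannot be determined."""
    return glibc_version_confstr() or glibc_version_ctypes()


if __name__ == "__main__":
    print("glibc version:", detect_glibc_version())
```

A caller that wants to skip a test on old glibc would then have to decide what `None` means; treating "unknown" as "do not skip" is what allowed the failure to resurface on PyPy here.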

@h-vetinari

Member Author

PyPy does not implement that value. I opened an issue in PyPy and a corresponding one in NumPy numpy/numpy#19385.

Thanks! :)

@mattip

mattip commented Jul 1, 2021

PyPy issue is fixed. Is it worth making a patch and releasing a new pypy7.3.5 build? The next PyPy release will probably be a few months coming, and I don't know how common it is to use PyPy + centos7

@h-vetinari

Member Author

PyPy issue is fixed. Is it worth making a patch and releasing a new pypy7.3.5 build?

I think that would be worthwhile, carrying a patch is not a big deal IMO.

The next PyPy release will probably be a few months coming, and I don't know how common it is to use PyPy + centos7

Maybe I misunderstand, but CentOS 6/7 are just stand-ins for linux here, where PyPy usage is highest.

@mattip

mattip commented Jul 1, 2021

I misstated the failing combination above. It is PyPy + glibc 2.12, which is found on CentOS 6, not CentOS 7. CentOS 6 has been EOL since Nov 2020. But for some reason the conda environment uses it.

@h-vetinari

Member Author

Centos6 is EOL since Nov 2020. But for some reason the conda environment uses it.

See here: conda-forge/conda-forge.github.io#1436

@isuruf

isuruf commented Jul 1, 2021

Member

But for some reason the conda environment uses it.

It's the same reason that numpy supports manylinux2010 (which is glibc 2.12). 😉

@mattip

mattip commented Jul 2, 2021

NumPy uses manylinux2010 not to support the outdated CentOS6, but because it still supports older linux versions that may not have pip v20. I am not sure conda has the same problem.

@h-vetinari

Member Author

I am not sure conda has the same problem.

A very similar one - once conda moves off of CentOS 6, the packages built for linux are not usable on older distros anymore. As can be seen from the issue I linked, a move away from this is on the horizon, but that wasn't realistic or desirable until quite recently.

@rgommers

Contributor

very nice!

@h-vetinari

h-vetinari commented Dec 28, 2021

Member Author

Update for numpy 1.21.5: all green except PPC (as before)

Because the sys.exit wrapper around numpy.test was missing, some failures went unreported. In particular, the PPC builds had all been failing already. With that in mind: after 8 failures (PPC-only) out of 68 runs for 1.21.3, we are now at 10 failures (PPC-only) out of 86 runs (added python 3.10 everywhere).
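
To illustrate why the wrapper matters: numpy.test() reports its result through a boolean return value, so a test command that discards that value exits with status 0 even when tests fail. The sketch below shows the pattern with a hypothetical `run_tests` stand-in (always failing) so that it runs without NumPy installed; the snippet strings and the `run` helper are illustrative, not taken from the recipe.

```python
import subprocess
import sys

# Hypothetical stand-in for a runner that, like numpy.test(), returns True or False.
UNWRAPPED = "run_tests = lambda: False; run_tests()"
WRAPPED = "import sys; run_tests = lambda: False; sys.exit(not run_tests())"


def run(snippet):
    """Run a snippet in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


if __name__ == "__main__":
    # Without the wrapper the failing result is discarded and the process exits 0.
    print("without wrapper:", run(UNWRAPPED))
    # With the wrapper, sys.exit(True) turns the failure into exit status 1.
    print("with wrapper:", run(WRAPPED))
```

With NumPy itself, one common form of the same idea is `python -c "import numpy, sys; sys.exit(not numpy.test())"`; the exact arguments used in this feedstock are not shown in the thread.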

Notable

Details

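| lib | before | after | updated version | updated build |
|---|---|---|---|---|
| numpy | 1.21.3 | 1.21.5 | X | |
| libblas | 3.9.0-12 | 3.9.0-12 | | |
| blis | 0.8.1-1 | 0.8.1-1 | | |
| openblas | 0.3.18-pthreads-0 | 0.3.18-pthreads-0 | | |
| mkl | 2021.4.0-729 | 2021.4.0-729 | | |
| netlib | 3.9.0-5 | 3.9.0-5 | | |
| pypy | 7.3.5-9 | 7.3.7-3 | X | |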

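| variant | blis | mkl | netlib | openblas | sum* |
|---|---|---|---|---|---|
| linux / x86 | ✔️ | ✔️ | ✔️ | ✔️ | - |
| linux / aarch | | | ✔️ | ✔️ | - |
| linux / ppc64le | | | ✖️ | ✖️ | 10F |
| osx / arm | | | ✔️ | ✔️ | - |
| osx / x86 | ✔️ | ✔️ | ✔️ | ✔️ | - |
| win / x86 | ✔️ | ✔️ | ✔️ | ✔️ | - |
| sum* | - | - | 5F | 5F | 10F |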

* sum of Failures (out of a total of 86 CI combinations being tested)

Build logs:
Azure

@mattip

mattip commented Dec 28, 2021

Nice. A heads-up that there is apparently a problem with the recently released OpenBLAS 0.3.19 and NumPy: see numpy/numpy#20660

@h-vetinari

Member Author

Not expecting a 1.21.6 release, so closing this.

@h-vetinari h-vetinari closed this Feb 5, 2022
@h-vetinari h-vetinari reopened this Apr 12, 2022
@h-vetinari h-vetinari changed the base branch from master to numpy121 April 12, 2022 22:24
@h-vetinari

Member Author

Update for 1.21.6 (+ new PyPy builds and BLAS updates): all green except PPC (as before)

Turns out I guessed wrong about:

Not expecting a 1.21.6 release, so closing this.

Also, due to the rebuilds for pypy3.8/3.9, not to mention several relevant BLAS (& infrastructure) changes, it makes sense to do an update here.

From 10 failures (PPC-only) out of 86 runs, we're now at 12 failures (PPC-only) out of 108 runs.

Notable

  • Added accelerate BLAS flavour on osx
  • Testing against PyPy 3.8 and 3.9 added everywhere but for osx-arm
  • Version bumps for openblas, blis & MKL
  • Switched to running the full test suite; emulation keeps running only label='fast' tests.

Details

| variant | before | after |
|---|---|---|
| linux + ppc | test failures due to emulation problems | as before |

| lib | before | after | updated version | updated build |
|---|---|---|---|---|
| numpy | 1.21.5 | 1.21.6 | X | |
| libblas | 3.9.0-12 | 3.9.0-14 | | X |
| blis | 0.8.1-1 | 0.9.0-0 | X | |
| openblas | 0.3.18-pthreads-1 | 0.3.20-pthreads-0 | X | |
| mkl | 2021.4.0-729 | 2022.0.1-803 | X | |
| netlib | 3.9.0-5 | 3.9.0-5 | | |
| pypy | 7.3.7-3 | 7.3.9-1 | X | |
| qemu-user-static | ? | 6.1.0-8 | | |

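| variant | accelerate | blis | mkl | netlib | openblas | sum* |
|---|---|---|---|---|---|---|
| linux / x86 | | ✔️ | ✔️ | ✔️ | ✔️ | - |
| linux / aarch | | | | ✔️ | ✔️ | - |
| linux / ppc64le | | | | ✖️ | ✖️ | 12F |
| osx / arm | ✔️ | | | ✔️ | ✔️ | - |
| osx / x86 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | - |
| win / x86 | | ✔️ | ✔️ | ✔️ | ✔️ | - |
| sum* | - | - | - | 6F | 6F | 12F |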

* sum of Failures (out of a total of 108 CI combinations being tested)

Build logs:
Azure

6 participants