Looks like the Conda test for Python 3.10 on ARM had a segfault on CI. Snippet of the error below:
test_pynvjitlink_api.py::test_add_fatbin_with_cubin_error PASSED [ 91%]
Fatal Python error: Segmentation fault
Current thread 0x0000ffff86c547e0 (most recent call first):
File "/opt/conda/envs/test/lib/python3.10/site-packages/pynvjitlink/api.py", line 53 in add_data
File "/opt/conda/envs/test/lib/python3.10/site-packages/pynvjitlink/api.py", line 77 in add_fatbin
File "/__w/pynvjitlink/pynvjitlink/pynvjitlink/tests/test_pynvjitlink_api.py", line 92 in test_duplicate_symbols_cubin_and_fatbin
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/python.py", line 1627 in runtest
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 242 in <lambda>
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 241 in call_and_report
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 132 in runtestprotocol
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/main.py", line 337 in _main
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/main.py", line 283 in wrap_session
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/test/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/config/__init__.py", line 175 in main
File "/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/config/__init__.py", line 201 in console_main
File "/opt/conda/envs/test/lib/python3.10/site-packages/pytest/__main__.py", line 9 in <module>
File "/opt/conda/envs/test/lib/python3.10/runpy.py", line 86 in _run_code
File "/opt/conda/envs/test/lib/python3.10/runpy.py", line 196 in _run_module_as_main
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, numba.core.typeconv._typeconv, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, numba.mviewbuf, numba.core.typing.cmathdecl.cmath, pynvjitlink._nvjitlinklib, numba.types.itertools, numba.cpython.numbers.math, numba.cpython.hashing.math, numba.cpython.hashing.sys, numba.cpython.mathimpl.math, numba.cpython.mathimpl.sys (total: 29)
ci/test_conda.sh: line 58: 1479 Segmentation fault (core dumped) python -m pytest --cache-clear --junitxml="${RAPIDS_TESTS_DIR}/junit-pynvjitlink.xml" -v
test_pynvjitlink_api.py::test_duplicate_symbols_cubin_and_fatbin /__w/pynvjitlink/pynvjitlink
Edit: Seeing the same error in the Conda test for Python 3.13 on |
|
Also seeing the following error in the wheel builds on CI:
However, I am able to download these URLs locally. I think we need a retry workflow for |
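To illustrate the retry idea, here is a minimal sketch in Python of retrying a flaky download with exponential backoff. It is not the project's CI code, which is driven by shell scripts and shared workflows; the helper names `retry` and `download`, the attempt count, and the delays are all assumptions made for illustration.

```python
import time
import urllib.request
from urllib.error import URLError


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep,
          retry_on=(URLError, TimeoutError)):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between attempts, and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


def download(url, timeout=30):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Usage (the URL is a placeholder, not one from the failing job):
# data = retry(lambda: download("https://example.invalid/artifact.whl"))
```

Only network-style errors are retried here, so a genuine bug such as a bad URL scheme still fails immediately instead of being hidden by the retry loop.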
|
Rerunning to see if the wheel CI failures clear |
|
I suspect libnvjitlink 12.9 is not handling errors as gracefully as previous versions have done (there have been similar problems in the past). |
|
There is some misbehaviour inside nvjitlink when an error occurs in |
With CUDA 12.9 this leads to invalid reads within nvjitlink.
|
Skipping the test in question removes the invalid reads - I've pushed a commit with the skip that I think should enable us to move forward (the test was previously xfailed anyway). |
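For readers unfamiliar with the mechanism, a skip of this kind can be sketched with pytest's `skip` marker. This is an assumption about the general shape only: the reason string and the placeholder body are hypothetical, and the actual commit may differ.

```python
import pytest

# Hypothetical reason text; the wording used in the real commit is not shown here.
SKIP_REASON = "nvjitlink in CUDA 12.9 performs invalid reads on this error path"


@pytest.mark.skip(reason=SKIP_REASON)
def test_duplicate_symbols_cubin_and_fatbin():
    # Placeholder body: with the marker in place pytest reports the test as
    # skipped and never executes it, so the crashing call is not reached.
    raise AssertionError("should not run while the skip marker is present")
```

Unlike `xfail`, which by default still runs the test and only changes how a failure is reported, `skip` prevents the body from executing at all. That matters here because a segfault kills the whole interpreter before pytest can record an expected failure.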
gmarkall
left a comment
I'm comfortable moving forward with this, and dealing with the test issue separately / later (I see no practical resolution within a reasonable timeframe for a CUDA 12.9-based release of pynvjitlink).
- Update to CUDA 12.9 (rapidsai#138)
- feat(conda): port conda recipe to rattler-build (rapidsai#137)
- Download build artifacts from Github for CI (rapidsai#136)
- Moving wheel builds to specified location and uploading build artifacts to Github (rapidsai#135)
- Use mainline shared-workflows again (rapidsai#134)
|
Thanks Graham and Bradley! 🙏 |
Now that CUDA 12.9 is out, update pynvjitlink to CUDA 12.9.
Part of issue: rapidsai/build-planning#173