Conversation

bdice (Contributor) commented Mar 6, 2025

This PR adds Python 3.13 to CI, and updates the workflows to align with updates from RAPIDS' shared-workflows upstream.

bdice commented Mar 7, 2025

@gmarkall It'd be great to have your eyes on some of these failures. Are any of them expected?

gmarkall (Contributor) commented Mar 7, 2025

@bdice Thanks for the PR! These are not expected, but they all seem to have a common cause - I'll take a look.

gmarkall added the "4 - Waiting on reviewer" label Mar 7, 2025
bdice changed the title from "Drop Python 3.9, add Python 3.13, and update shared-workflows." to "Add Python 3.13, and update shared-workflows." Mar 7, 2025
bdice commented Mar 7, 2025

We will need rapidsai/pynvjitlink#131 and a pynvjitlink release to unblock Python 3.13 tests. That PR is ready and just needs a final review, so I think a new release might be able to ship today. If the other segfault issues on this PR are worked out before that release, we can temporarily roll those jobs back to Python 3.12.

bdice commented Mar 7, 2025

I fixed up the CI matrix and made sure we have supported image tags across the matrix.

I think there are only two root causes of the CI failures to address.

The conda CUDA 11.4 job is showing:

======================================================================
FAIL: test_linking_cu_error (numba.cuda.tests.cudadrv.test_linker.TestLinker)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/test/lib/python3.9/site-packages/numba_cuda/numba/cuda/tests/cudadrv/test_linker.py", line 197, in test_linking_cu_error
    self.assertIn('NVRTC Compilation failure', msg)
AssertionError: 'NVRTC Compilation failure' not found in 'Failed to call nvrtcCompileProgram: NVRTC_ERROR_INVALID_OPTION'

----------------------------------------------------------------------
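The assertion expects the wrapped NVRTC failure message, but on the CUDA 11.4 job the raw driver-style error code surfaces instead. A minimal sketch of a version-tolerant check (a hypothetical helper, not the actual fix in numba-cuda):

```python
import re

# Hypothetical sketch: accept either message form -- the wrapped
# "NVRTC Compilation failure" string or the raw "NVRTC_ERROR_*" code
# seen on the CUDA 11.4 job. The pattern is an assumption, not a
# message contract that numba-cuda guarantees.
NVRTC_ERROR = re.compile(r"NVRTC Compilation failure|NVRTC_ERROR_\w+")

def is_nvrtc_failure(msg: str) -> bool:
    """Return True if msg looks like an NVRTC compilation failure."""
    return NVRTC_ERROR.search(msg) is not None
```

The test could then assert `is_nvrtc_failure(msg)` rather than matching one exact substring.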

The CUDA 12.8 / Python 3.13 jobs (for conda and wheels and pynvjitlink) are showing a segfault in test_get_const_mem_specialized:

Current thread 0x00007f7ba498e740 (most recent call first):
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 326 in safe_cuda_api_call
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 2901 in add_ptx
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/codegen.py", line 189 in _link_all
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/codegen.py", line 227 in get_cubin
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/codegen.py", line 248 in get_cufunc
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 268 in bind
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 1020 in compile
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/decorators.py", line 134 in _jit
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba_cuda/numba/cuda/tests/cudapy/test_dispatcher.py", line 556 in test_get_const_mem_specialized
  File "/opt/conda/envs/test/lib/python3.13/unittest/case.py", line 606 in _callTestMethod
  File "/opt/conda/envs/test/lib/python3.13/unittest/case.py", line 651 in run
  File "/opt/conda/envs/test/lib/python3.13/unittest/case.py", line 707 in __call__
  File "/opt/conda/envs/test/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/conda/envs/test/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/conda/envs/test/lib/python3.13/unittest/runner.py", line 240 in run
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/main.py", line 169 in run
  File "/opt/conda/envs/test/lib/python3.13/unittest/main.py", line 270 in runTests
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/main.py", line 361 in run_tests_real
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/main.py", line 376 in runTests
  File "/opt/conda/envs/test/lib/python3.13/unittest/main.py", line 104 in __init__
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/main.py", line 204 in __init__
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/__init__.py", line 54 in run_tests
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/testing/_runtests.py", line 25 in _main
  File "/opt/conda/envs/test/lib/python3.13/site-packages/numba/runtests.py", line 9 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, numba.core.typeconv._typeconv, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, numba.mviewbuf, numba.core.typing.cmathdecl.cmath, numba.types.itertools, _cffi_backend, numba.cpython.mathimpl.math, numba.cpython.mathimpl.sys, numba.cpython.numbers.math, numba.cpython.hashing.math, numba.cpython.hashing.sys, numba.np.arraymath.math, numba.core.typing.mathdecl.math (total: 31)
ci/test_conda.sh: line 52:   972 Segmentation fault      (core dumped) python -m numba.runtests numba.cuda.tests -v
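For anyone reproducing this locally: the "Current thread" dump above comes from Python's `faulthandler`; enabling it explicitly keeps that native-crash traceback available outside the CI harness (a generic debugging sketch, not part of this PR):

```python
import faulthandler

# Generic debugging sketch: with faulthandler enabled, a segfault in a
# native CUDA driver call still dumps the Python frames, like the
# "Current thread" traceback above.
faulthandler.enable()
assert faulthandler.is_enabled()

# The failing test module could then be run in isolation, e.g.:
#   python -X faulthandler -m numba.runtests \
#       numba.cuda.tests.cudapy.test_dispatcher -v
```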

bdice commented Mar 7, 2025

The failing tests are net-new coverage in the test suite. I would like to propose breaking this down so each issue can be solved separately and atomically:

  1. Merge this PR with only the currently-succeeding jobs.
  2. Open a separate PR for CUDA 11.4 testing.
  3. Open a separate PR for Python 3.13 testing.

bdice commented Mar 7, 2025

Update: the segfault still occurs with Python 3.12, so it may not be due to Python 3.13 after all; it could instead be related to the OS (Ubuntu 24.04) or the CUDA version (12.8.0). I will try other combinations of matrix options to isolate the cause.

bdice commented Mar 8, 2025

pynvjitlink 0.5.1 is released, with Python 3.13 support. There is one test failure remaining on the conda pynvjitlink job:

test_managed_alloc_driver_undersubscribe (numba.cuda.tests.cudadrv.test_managed_alloc.TestManagedAlloc.test_managed_alloc_driver_undersubscribe) ... ci/test_conda_pynvjitlink.sh: line 73:  1148 Killed                  NUMBA_CUDA_ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
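"Killed" (rather than a Python traceback) usually means the kernel OOM killer terminated the test process during the managed-allocation tests. A small sketch for checking memory pressure from within the process (a hypothetical helper, not a numba-cuda API):

```python
import resource

# Hypothetical helper: report this process's peak resident set size.
# On Linux, ru_maxrss is in kibibytes; an unexpectedly large value
# around the managed-alloc tests would support the OOM hypothesis.
def peak_rss_mib() -> float:
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss / 1024.0
```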

I am trying to rerun the job but I suspect this will need further investigation.

bdice commented Apr 1, 2025

I have narrowed the failures a bit.

Python 3.13 and Ubuntu 24.04 do not appear to have any issues, so my earlier commits were a bit misleading.

I recommend we merge this as-is, so that we are more aligned with the (wider) RAPIDS build/test matrix and have the actions updates that we need in this repo. Then we can work through the failures in follow-up PRs.

brandon-b-miller (Contributor) left a comment
Thanks @bdice , just to confirm my understanding, we're currently passing by not spawning jobs in the buggy configurations, right? If so, can we just make sure we have an issue documenting what those configurations are and what errors they produce?

bdice commented Apr 1, 2025

@brandon-b-miller Yes -- that's exactly my plan. I have filed issues based on these findings. I will open PRs that demonstrate each failure separately after merging this.

@bdice bdice merged commit 1185852 into NVIDIA:main Apr 1, 2025
35 checks passed
jiel-nv pushed a commit to jiel-nv/numba-cuda that referenced this pull request Apr 10, 2025
* Drop Python 3.9, add Python 3.13, and update shared-workflows.

* Add back Python 3.9.

* Fix requires-python.

* Correct test matrix.

* Use supported CUDA 12.2.2 Ubuntu version.

* Fix lack of CUDA 11.4 images for citestwheel.

* Fix pynvjitlink support matrix.

* Temporarily disable Python 3.13 and CUDA 11.4 tests.

* Try CUDA 12.5 and Ubuntu 24.04 separately.

* Skip CUDA 12.8, but test Python 3.13 and Ubuntu 24.04.

* Test older Ubuntu version.

* Try with Python 3.11.

* Use l4 GPU.

* Try newer Python and OS again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
