Implement alignment support for local and shared arrays. #143

tpn · 2025-03-03T03:05:38Z

This PR adds support for an alignment=N keyword argument to the cuda.local.array() and cuda.shared.array() helpers, which are used within JIT'd CUDA kernels (i.e. functions annotated with @numba.cuda.jit).
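As a rough illustration of the keyword described above, the sketch below requests 16-byte alignment for one shared and one local array inside a kernel. The kernel name `reverse_block`, the wrapper `run_demo`, the array sizes and the alignment value are all made up for this example. It assumes a numba-cuda build that includes this PR and a CUDA-capable device. Without them, the wrapper simply returns None.

```python
try:
    import numpy as np
    from numba import cuda
    HAVE_CUDA = bool(cuda.is_available())
except Exception:
    # numba / numpy missing or CUDA driver unusable: skip the demo.
    np = None
    cuda = None
    HAVE_CUDA = False


def run_demo():
    """Hypothetical demo; returns a host array, or None without a GPU."""
    if not HAVE_CUDA:
        return None

    @cuda.jit
    def reverse_block(out):
        # alignment= is the keyword argument added by this PR.
        tile = cuda.shared.array(64, dtype=np.float32, alignment=16)
        tmp = cuda.local.array(4, dtype=np.float32, alignment=16)
        i = cuda.threadIdx.x
        tmp[0] = i
        tile[i] = tmp[0]
        cuda.syncthreads()
        # Each thread reads the element written by its mirror thread.
        out[i] = tile[63 - i]

    out = cuda.device_array(64, dtype=np.float32)
    reverse_block[1, 64](out)
    return out.copy_to_host()
```

Calling `run_demo()` on a machine with a supported GPU launches a single block of 64 threads. The alignment request should not change the values the kernel computes, only where the arrays are placed in memory.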

tpn · 2025-03-03T21:56:25Z

I've removed the dependency on altering the underlying types.Array in numba; it wasn't necessary, as @gmarkall pointed out.

numba_cuda/numba/cuda/cudaimpl.py

gmarkall

Thanks for the PR! I think this is a good start with some functionality working as expected - however there are some other cases to cover and a few observations on the diff. We might need another iteration afterwards once things have shaped up a bit (and the docs might need an update if they didn't get generated from the source, I will have to check).

numba_cuda/numba/cuda/cudadecl.py

numba_cuda/numba/cuda/cudaimpl.py

numba_cuda/numba/cuda/stubs.py

numba_cuda/numba/cuda/tests/cudapy/test_array_alignment.py

numba_cuda/numba/cuda/cudaimpl.py

gmarkall

Thanks for the fixups! I think there are a couple of changes that are needed:

- The ptx_lmem_alloc_array() function needs deleting now, as it is duplicated.
- I think there's still no test of passing an invalid type for the alignment (e.g. 1.0, or "1"). It would be good to check that we correctly error out rather than silently doing the wrong thing.

The other comments are thoughts / informational.
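To make the invalid-type concern concrete, here is a minimal pure-Python sketch of the kind of check being asked for. The helper `validate_alignment` is hypothetical and is not the function in cudaimpl.py. The rule that an alignment must be a positive power of two is an assumption made for this example, not something stated in this thread.

```python
def validate_alignment(alignment):
    """Hypothetical validator; not the helper used in cudaimpl.py."""
    if alignment is None:
        # No alignment requested: leave the default in place.
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(alignment, bool) or not isinstance(alignment, int):
        raise TypeError("alignment must be an integer")
    # Assumption for this sketch: only positive powers of two are valid.
    if alignment <= 0 or (alignment & (alignment - 1)) != 0:
        raise ValueError("alignment must be a positive power of two")
    return alignment


def rejects(value):
    """Return the exception type raised for value, or None if accepted."""
    try:
        validate_alignment(value)
    except (TypeError, ValueError) as exc:
        return type(exc)
    return None
```

With this sketch, both `1.0` and `"1"` are rejected with a TypeError instead of being silently coerced, which is the behaviour the review asks the tests to pin down.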

tpn · 2025-03-24T20:56:58Z

@gmarkall, I added some tests for invalid alignment types and tweaked the tests to use a common set of DTYPES that also includes a bunch of record types (with and without alignment).

numba_cuda/numba/cuda/cudaimpl.py

tpn · 2025-04-29T18:01:29Z

Hi @gmarkall, I believe this one is ready to go with all requested changes made.

ZzEeKkAa · 2025-04-30T21:25:43Z

These changes conflict with #176, but they are more important for nvmath-python. I'll fix my branch after this one is merged.

numba_cuda/numba/cuda/tests/cudapy/test_array_alignment.py

gmarkall

Thanks for the updates - there are a couple of minor questions on the added test - I don't think they necessarily need to be addressed for this to be merged, so just let me know what you want to do (i.e. merge as-is or modify based on the comments).

Remove erroneous alignment= kwarg to types.Array().

Cosmetic: fix docstring typo.

PR Feedback: Clarify pointer size. We don't support 32-bit x86.

PR Feedback: Improve alignment handling in Cuda_array_decl.
Co-authored-by: Graham Markall <gmarkall@nvidia.com>

PR Feedback: Improve alignment handling in cudaimpl.
- Reduce fiddly boilerplate code in each array helper routine with a single call to `_try_extract_and_validate_alignment`.
- Simplify the decorators using `types.BaseTuple` where possible.

Add missing `cuda_local_array_tuple` implementation. This ensures multi-dim shapes can be handled by `cuda.local.array`.

PR Feedback: Improve comment.
Co-authored-by: Graham Markall <gmarkall@nvidia.com>

PR Feedback: Improve alignment tests.
- Verify the align attribute in the LLVM IR.
- Add multi-dimensional tests.
- Remove dead code.

PR Feedback: Remove ptx_lmem_alloc_array. This functionality is now provided by cuda_local_array_tuple, whose name better fits with the other three cuda_(local|shared)_array_(tuple|integer) routines.

COSMETIC: Relocate `test_invalid_alignments()`. No code changes are in this commit. I'm relocating the function in anticipation of some refactoring in the next commit. It makes sense to have the `_do_test()` implementation come immediately after the three test functions that use it (`test_array_alignment_[123]d()`).

PR Feedback: Add tests for invalid alignment types.

Add some record dtypes to the alignment tests.

PR Feedback: Improve _try_extract_and_validate_alignment() docstring.

Fix test failures on CI. CI was showing error messages containing ANSI color codes, e.g.: RequireLiteralValue: \x1b[1malignment must be a constant integer\x1b[0m\x1b[0m\n
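The ANSI color codes in the CI error message quoted above can trip up exact string comparisons in tests. The sketch below shows one possible way to deal with them, not necessarily the fix made in this PR: strip the escape sequences before comparing, or match with a regex search on the message text alone. The names `ANSI_ESCAPE` and `strip_ansi` are made up for this example.

```python
import re

# Matches SGR color/style sequences such as "\x1b[1m" and "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI color/style escape sequences from text."""
    return ANSI_ESCAPE.sub("", text)


# The message as quoted in the commit description above.
raw = (
    "RequireLiteralValue: "
    "\x1b[1malignment must be a constant integer\x1b[0m\x1b[0m\n"
)

clean = strip_ansi(raw)

# A regex search on the plain message text matches even when the
# surrounding color codes are left in place.
found = re.search(r"alignment must be a constant integer", raw) is not None
```

unittest's `assertRaisesRegex` uses a regex search on the exception text, so a pattern that contains only the message body is not affected by color codes placed before or after it.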

tpn · 2025-05-05T22:34:37Z

> Thanks for the updates - there are a couple of minor questions on the added test - I don't think they necessarily need to be addressed for this to be merged, so just let me know what you want to do (i.e. merge as-is or modify based on the comments).

Fixes were easy, all good on my side for the merge!

tpn force-pushed the 140-array-alignment branch from 721f88c to 12ea962 Compare March 3, 2025 03:09

This was referenced Mar 3, 2025

Implement support for array alignment numba/numba#9942

Closed

[FEA] Add support for alignment to cuda array helpers #140

Closed

tpn force-pushed the 140-array-alignment branch 3 times, most recently from 24a02ef to b44dc91 Compare March 3, 2025 21:55

tpn mentioned this pull request Mar 3, 2025

Add support for alignment to cuda array helpers NVIDIA/cccl#3922

Closed

vyasr mentioned this pull request Mar 3, 2025

Upload wheels to PyPI from GitHub-hosted runner #142

Merged

gmarkall reviewed Mar 4, 2025

gmarkall added the "2 - In Progress" label Mar 4, 2025

tpn added this to CCCL Mar 5, 2025

github-project-automation bot moved this to Todo in CCCL Mar 5, 2025

tpn force-pushed the 140-array-alignment branch from b44dc91 to 84d10c0 Compare March 6, 2025 19:55

gmarkall added the "4 - Waiting on author" label and removed the "2 - In Progress" label Mar 7, 2025

tpn force-pushed the 140-array-alignment branch from 84d10c0 to 96e0d81 Compare March 9, 2025 23:26

gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label Mar 10, 2025

tpn commented Mar 17, 2025

gmarkall requested changes Mar 18, 2025

github-project-automation bot moved this from Todo to In Progress in CCCL Mar 18, 2025

gmarkall added the "4 - Waiting on author" label and removed the "4 - Waiting on reviewer" label Mar 18, 2025

tpn force-pushed the 140-array-alignment branch from 5703a87 to 833a9ce Compare March 21, 2025 17:28

gmarkall reviewed Mar 24, 2025

gmarkall reviewed Mar 24, 2025

gmarkall requested changes Mar 24, 2025

gmarkall reviewed Mar 24, 2025

gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label Mar 24, 2025

tpn force-pushed the 140-array-alignment branch from 8fd0b1a to c33931d Compare April 1, 2025 22:26

tpn requested a review from gmarkall April 1, 2025 22:26

tpn force-pushed the 140-array-alignment branch from c33931d to fa9ecb9 Compare April 2, 2025 23:30

tpn force-pushed the 140-array-alignment branch 2 times, most recently from 384a505 to 9eaca8b Compare April 29, 2025 00:15

gmarkall reviewed May 1, 2025

gmarkall reviewed May 1, 2025

gmarkall reviewed May 1, 2025

gmarkall added the "4 - Waiting on author" label and removed the "4 - Waiting on reviewer" label May 1, 2025

tpn added 2 commits May 5, 2025 14:59

PR Feedback: use two kernels for invalid test; use assertRaisesRegex.

faebec3

tpn force-pushed the 140-array-alignment branch from 9eaca8b to faebec3 Compare May 5, 2025 22:33

gmarkall added the "5 - Ready to merge" label and removed the "4 - Waiting on author" label May 6, 2025

gmarkall approved these changes May 6, 2025

gmarkall merged commit 2653de2 into NVIDIA:main May 6, 2025
37 checks passed

github-project-automation bot moved this from In Progress to Done in CCCL May 6, 2025

tpn deleted the 140-array-alignment branch May 6, 2025 20:43

brandon-b-miller mentioned this pull request May 9, 2025

Bump version to 0.11.0 #250

Merged

tpn mentioned this pull request May 16, 2025

Add temp storage alignment awareness NVIDIA/cccl#4726

Closed

tpn mentioned this pull request May 27, 2025

[BUG]: Design a fix for temporary storage alignment in cuda.cooperative module NVIDIA/cccl#2558

Closed
