
[NFC] Vendor in Dispatcher as CUDADispatcher to allow for future CUDA-specific customization #338

Merged
gmarkall merged 3 commits into NVIDIA:main from VijayKandiah:vk/cuda_dispatcher
Jul 29, 2025
Conversation

@VijayKandiah
Contributor

This PR vendors in the necessary pieces of the Dispatcher class into CUDADispatcher to allow for future CUDA-specific changes, making the latter inherit directly from _DispatcherBase. It also vendors in supporting classes such as _MemoMixin and _FunctionCompiler for future customizations. This is an NFC (no functional change), so no additional unit tests are needed.
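
As a rough orientation, here is a minimal sketch of the layout this ends up with. The base-class order and the class bodies are illustrative assumptions, not the exact code in the PR:

```python
# Hedged sketch of the vendored structure described above; bodies elided.
from numba.core import serialize
from numba.core.dispatcher import _DispatcherBase


class _MemoMixin:
    """Vendored supporting mixin, copied in for future CUDA-specific changes."""


class _FunctionCompiler:
    """Vendored compiler helper used by the dispatcher."""


class CUDADispatcher(_MemoMixin, serialize.ReduceMixin, _DispatcherBase):
    """Inherits directly from _DispatcherBase instead of the core Dispatcher,
    leaving room for CUDA-specific customization."""
```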

@copy-pr-bot

copy-pr-bot bot commented Jul 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@VijayKandiah
Contributor Author

/ok to test 04efabd

@VijayKandiah VijayKandiah added the 2 - In Progress Currently a work in progress label Jul 22, 2025
@copy-pr-bot

copy-pr-bot bot commented Jul 24, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@VijayKandiah
Contributor Author

/ok to test c288169

@gmarkall gmarkall added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jul 28, 2025


class CUDADispatcher(Dispatcher, serialize.ReduceMixin):
class _MemoMixin:
Contributor


I don't think this is used / relevant to the CUDA target - if it is removed, does everything still appear to work?

Contributor


Actually, don't worry about this comment too much - I think it will show up when we run coverage if it's not needed, so let's not think about it now.

Contributor Author


Sounds good!

Comment on lines 893 to 894
locals: dict, optional
Mapping of local variable names to Numba types. Used to override
Contributor


There is no locals parameter.

functools.update_wrapper(self, py_func)

self.targetoptions = targetoptions
self.locals = locals
Contributor


It took me a while to figure out how this could possibly be working; then I realised this is assigning the built-in locals function to self.locals. Although it doesn't seem to have disrupted the tests, this seems dangerous - can you remove the locals assignment, please?

(locals is an option that's not supported by the CUDA target).
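
A minimal self-contained illustration of the pitfall (not project code): inside a method where nothing shadows the name, a bare locals refers to the builtin function rather than a dictionary of options.

```python
# Illustrative only - shows why `self.locals = locals` is dangerous.
class Example:
    def __init__(self, targetoptions):
        self.targetoptions = targetoptions
        self.locals = locals  # binds the builtin locals() function, not a dict


e = Example({})
print(callable(e.locals))  # True: it is the builtin function
print(e.locals is locals)  # True: not a mapping of names to Numba types
```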

Contributor Author


I removed locals from CUDADispatcher and _FunctionCompiler. We are now passing locals={} to compiler.compile_extra, similar to how we invoke compiler.compile_extra in numba/cuda/compiler.py.
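
For reference, a hedged sketch of what the fixed call might look like. The method shape and attribute names follow numba.core.dispatcher._FunctionCompiler and numba/cuda/compiler.py and are assumptions about the vendored copy, not the PR's exact code:

```python
from numba.core import compiler


class _FunctionCompiler:
    # ... vendored initialisation elided ...

    def _compile_core(self, impl, args, return_type, flags):
        # Pass an explicit empty locals mapping instead of a self.locals
        # attribute; the locals option is not supported by the CUDA target.
        return compiler.compile_extra(
            self.targetdescr.typing_context,
            self.targetdescr.target_context,
            impl,
            args=args,
            return_type=return_type,
            flags=flags,
            locals={},
            pipeline_class=self.pipeline_class,
        )
```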

Contributor

@gmarkall gmarkall left a comment


There is some stuff I think is not needed in this PR, but I think it will be easier to have a large-scale cleanup following a run of the tests with coverage reporting, after other PRs have gone in too.

The only thing that really needs to change here is the assignment of the CUDADispatcher's locals to the locals builtin function, which seems likely to have unintended and hard-to-diagnose consequences.

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Jul 28, 2025
@VijayKandiah
Contributor Author

/ok to test b02356d

@VijayKandiah VijayKandiah added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Jul 28, 2025
@VijayKandiah
Contributor Author

/ok to test fbcd2d9

@gmarkall gmarkall merged commit 4a781e1 into NVIDIA:main Jul 29, 2025
39 checks passed
@VijayKandiah VijayKandiah self-assigned this Jul 29, 2025
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Jul 31, 2025
- [NFC] FileCheck tests check all overloads (NVIDIA#354)
- [REVIEW][NFC] Vendor in serialize to allow for future CUDA-specific refactoring and changes (NVIDIA#349)
- Vendor in usecases used in testing (NVIDIA#359)
- Add thirdparty tests of numba extensions (NVIDIA#348)
- Support running tests in parallel (NVIDIA#350)
- Add more debuginfo tests (NVIDIA#358)
- [REVIEW][NFC] Vendor in the Cache, CacheImpl used by CUDACache and CUDACacheImpl to allow for future CUDA-specific refactoring and changes (NVIDIA#334)
- [NFC] Vendor in Dispatcher as CUDADispatcher to allow for future CUDA-specific customization (NVIDIA#338)
- Vendor in BaseNativeLowering and BaseLower for CUDA-specific customizations (NVIDIA#329)
- [REVIEW] Vendor in the CompilerBase used by CUDACompiler to allow for future CUDA-specific refactoring and changes (NVIDIA#322)
- Vendor in Codegen and CodeLibrary for CUDA-specific customization (NVIDIA#327)
- Disable tests that deadlock due to NVIDIA#317 (NVIDIA#356)
- FIX: Add type check for shape elements in DeviceNDArrayBase constructor (NVIDIA#352)
- Merge pull request NVIDIA#265 from lakshayg/fp16-support
- Add performance warning
- Fix tests
- Create and register low++ bindings for float16
- Create typing/target registries for float16
- Replace Numbast generated lower_casts
- Replace Numbast generated operators
- Alias __half to numba.core.types.float16
- Generate fp16 bindings using numbast
- Remove existing fp16 logic
- [REVIEW][NFC] Vendor in the utils and cgutils to allow for future CUDA-specific refactoring and changes (NVIDIA#340)
- [RFC,TESTING] Add filecheck test infrastructure (NVIDIA#342)
- Migrate test infra to pytest (NVIDIA#347)
- Add .vscode to gitignore (NVIDIA#344)
- [NFC] Add dev dependencies to project config (NVIDIA#341)
- Allow Inspection of Link-Time Optimized PTX (NVIDIA#326)
- [NFC] Vendor in DIBuilder used by CUDADIBuilder (NVIDIA#332)
- Add guidance on setting up pre-commit (NVIDIA#339)
- [Refactor][NFC] Vendor in MinimalCallConv (NVIDIA#333)
- [Refactor][NFC] Vendor in BaseCallConv (NVIDIA#324)
- [REVIEW] Vendor in CompileResult as CUDACompileResult to allow for future CUDA-specific customizations (NVIDIA#325)
@gmarkall gmarkall mentioned this pull request Jul 31, 2025
gmarkall added a commit that referenced this pull request Jul 31, 2025
- [NFC] FileCheck tests check all overloads (#354)
- [REVIEW][NFC] Vendor in serialize to allow for future CUDA-specific refactoring and changes (#349)
- Vendor in usecases used in testing (#359)
- Add thirdparty tests of numba extensions (#348)
- Support running tests in parallel (#350)
- Add more debuginfo tests (#358)
- [REVIEW][NFC] Vendor in the Cache, CacheImpl used by CUDACache and CUDACacheImpl to allow for future CUDA-specific refactoring and changes (#334)
- [NFC] Vendor in Dispatcher as CUDADispatcher to allow for future CUDA-specific customization (#338)
- Vendor in BaseNativeLowering and BaseLower for CUDA-specific customizations (#329)
- [REVIEW] Vendor in the CompilerBase used by CUDACompiler to allow for future CUDA-specific refactoring and changes (#322)
- Vendor in Codegen and CodeLibrary for CUDA-specific customization (#327)
- Disable tests that deadlock due to #317 (#356)
- FIX: Add type check for shape elements in DeviceNDArrayBase constructor (#352)
- Merge pull request #265 from lakshayg/fp16-support
- Add performance warning
- Fix tests
- Create and register low++ bindings for float16
- Create typing/target registries for float16
- Replace Numbast generated lower_casts
- Replace Numbast generated operators
- Alias __half to numba.core.types.float16
- Generate fp16 bindings using numbast
- Remove existing fp16 logic
- [REVIEW][NFC] Vendor in the utils and cgutils to allow for future CUDA-specific refactoring and changes (#340)
- [RFC,TESTING] Add filecheck test infrastructure (#342)
- Migrate test infra to pytest (#347)
- Add .vscode to gitignore (#344)
- [NFC] Add dev dependencies to project config (#341)
- Allow Inspection of Link-Time Optimized PTX (#326)
- [NFC] Vendor in DIBuilder used by CUDADIBuilder (#332)
- Add guidance on setting up pre-commit (#339)
- [Refactor][NFC] Vendor in MinimalCallConv (#333)
- [Refactor][NFC] Vendor in BaseCallConv (#324)
- [REVIEW] Vendor in CompileResult as CUDACompileResult to allow for future CUDA-specific customizations (#325)
gmarkall added a commit that referenced this pull request Aug 7, 2025
We had initially removed the `locals` attribute from `_FunctionCompiler` while vendoring it in for CUDA-specific changes in #338. This seemed to have caused an `AttributeError` in an nvmath-python test:
https://github.com/NVIDIA/nvmath-python/blob/main/tests/nvmath_tests/device/test_cublasdx_numba.py.
This PR restores the `locals` attribute and initializes it to `{}`.

---------

Co-authored-by: Graham Markall <gmarkall@nvidia.com>
gmarkall added a commit that referenced this pull request Aug 8, 2025
We had initially removed the `locals` attribute from `_FunctionCompiler` while vendoring it in for CUDA-specific changes in #338. This seemed to have caused an `AttributeError` in an nvmath-python test:
https://github.com/NVIDIA/nvmath-python/blob/main/tests/nvmath_tests/device/test_cublasdx_numba.py.
This PR restores the `locals` attribute and initializes it to `{}`.

---------

Co-authored-by: Graham Markall <gmarkall@nvidia.com>
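
A minimal sketch of the restoration described in these commits, assuming the vendored _FunctionCompiler's constructor mirrors numba.core.dispatcher._FunctionCompiler (the parameter names here are illustrative):

```python
class _FunctionCompiler:
    def __init__(self, py_func, targetdescr, targetoptions, pipeline_class):
        self.py_func = py_func
        self.targetdescr = targetdescr
        self.targetoptions = targetoptions
        self.pipeline_class = pipeline_class
        # Restored: downstream consumers such as the nvmath-python tests
        # expect this attribute to exist, so keep it present but empty.
        self.locals = {}
```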