feat: add set_shared_memory_carveout #629
Conversation
Force-pushed from d4d96c4 to 81d16af
/ok to test

@gmarkall, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

/ok to test 1697fc7
Thanks for the PR! I'll circle back to this when I get a chance (hopefully soon). Happy to chat about the design as well.

/ok to test 93a149e
gmarkall left a comment
I do want to confirm with @gmarkall about this design (and it would be nice to chat about it).
I think the design for the part of the problem that this PR solves looks great - I can think of no improvement.
An alternative is to combine this with `cache_config` in some way. However, I do think separating the two makes more sense, mainly because it makes the intent clearer for the user (`cache_config` sets preference hints while `set_shared_memory_carveout` sets explicit percentages).
I agree, keeping it separate makes sense to me too.
I'm happy to chat if you have unresolved questions, but I'm also happy for this to be merged - let me know how you'd like to proceed.
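For reference, the "explicit percentages" design follows the CUDA driver's carveout semantics: the value is an integer percentage of shared memory capacity in [0, 100], or -1 (`CU_SHAREDMEM_CARVEOUT_DEFAULT`) to restore the driver's default split. A minimal sketch of that value check; `validate_carveout` is a hypothetical helper for illustration, not code from the PR:

```python
# Sketch of the value semantics behind an explicit carveout API.
# These constants mirror the CUDA driver's CUshared_carveout values.
CU_SHAREDMEM_CARVEOUT_DEFAULT = -1      # let the driver choose the split
CU_SHAREDMEM_CARVEOUT_MAX_L1 = 0        # maximize L1 cache
CU_SHAREDMEM_CARVEOUT_MAX_SHARED = 100  # maximize shared memory


def validate_carveout(carveout: int) -> int:
    """Check a carveout value before handing it to cuFuncSetAttribute.

    Accepts -1 (driver default) or a percentage in [0, 100];
    rejects anything else. Hypothetical helper, not part of the PR.
    """
    if not isinstance(carveout, int):
        raise TypeError(
            f"carveout must be an int, got {type(carveout).__name__}"
        )
    if carveout != CU_SHAREDMEM_CARVEOUT_DEFAULT and not 0 <= carveout <= 100:
        raise ValueError(f"carveout must be -1 or in [0, 100], got {carveout}")
    return carveout


print(validate_carveout(50))                           # → 50
print(validate_carveout(CU_SHAREDMEM_CARVEOUT_DEFAULT))  # → -1
```

Note the carveout is still a hint to the driver, but unlike a cache-config preference it names a concrete percentage rather than a coarse "prefer L1" / "prefer shared" choice.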
Force-pushed from 93a149e to d91f8f4
@gmarkall Thank you for the review! Good to hear that the design makes sense. Just wanted to make sure that I am interpreting the description from the issue properly :) Would be nice to re-run CI and merge this.
Greptile Summary
This PR adds `set_shared_memory_carveout`.
Key Issue Found:
Confidence Score: 3/5
Additional Comments (1)
- numba_cuda/numba/cuda/cudadrv/driver.py, line 2400 (link), logic: `cuFuncSetAttribute` is not defined in `API_PROTOTYPES` in `drvapi.py`. If `CtypesFunction` is ever used, this will fail with `CudaDriverError: Driver missing function: cuFuncSetAttribute`. While the default code path uses `CudaPythonFunction`, the `CtypesFunction` path should either be removed or have the API prototype added to `drvapi.py`.
2 files reviewed, 1 comment
This is dead code, so it won't be hit. We need to remove it at some point.
/ok to test 683e774 |
|
Many thanks @kaeun97! |
v0.23.0
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
This adds `set_shared_memory_carveout` using `cuFuncSetAttribute`, as requested here. I do want to confirm with @gmarkall about this design (and it would be nice to chat about it). An alternative is to combine this with `cache_config` in some way. However, I do think separating the two makes more sense, mainly because it makes the intent clearer for the user (`cache_config` sets preference hints while `set_shared_memory_carveout` sets explicit percentages).