feat: users can pass shared_memory_carveout to @cuda.jit
#642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gmarkall I didn't know this was still a draft 😅 would appreciate your review! Also, is there anything in particular that I can look into next? Excited to see what's next!
Greptile Summary: This PR adds support for configuring shared memory carveout via the `shared_memory_carveout` argument to `@cuda.jit`.

The implementation follows the existing patterns in the codebase and properly integrates with the kernel compilation and binding flow. Previous review comments identified validation issues (boolean types being incorrectly accepted) and error message inconsistencies that should be addressed.

Confidence Score: 3/5
Additional Comments (1)
- `numba_cuda/numba/cuda/decorators.py`, line 33 (link). style: Missing documentation for the new `shared_memory_carveout` parameter in the docstring. Should add:

  Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
4 files reviewed, 1 comment
@kaeun97 Apologies for the delay in getting to this. It was not because it was a draft that it didn't receive a review, but just that things have been a bit heavy (and I had a little PTO) over the last few days.
/ok to test
@gmarkall, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test bbdda61
gmarkall left a comment
Many thanks again for your efforts here!
I've had a quick skim - apologies for the delay. Some initial thoughts:
- The API design looks good to me.
- We should probably test with and without signatures to the `@cuda.jit` decorator, because the paths taken through the `jit` decorator are different when `func_or_sig` is a function and when it is a signature.
- As this is a user-facing feature, we should add it to the docstring for `jit`. I think the documentation is then derived from this docstring, so there shouldn't need to be a separate documentation change.
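To illustrate why both paths need coverage, here is a minimal pure-Python sketch (not the numba-cuda implementation; all names beyond `shared_memory_carveout` and `func_or_sig` are illustrative) of how a decorator like `jit` takes different paths depending on whether it is applied bare or called with a signature:

```python
# Hypothetical sketch: a decorator that behaves differently when
# func_or_sig is a function (bare @jit) versus a signature string
# (@jit("void()", ...)). This mirrors the two code paths the review
# asks to test, without depending on CUDA.

def jit(func_or_sig=None, shared_memory_carveout=None):
    def configure(func):
        # Stand-in for kernel compilation: just record the option.
        func.shared_memory_carveout = shared_memory_carveout
        return func

    if callable(func_or_sig):
        # Path 1: used as a bare decorator, e.g. @jit
        return configure(func_or_sig)
    else:
        # Path 2: called with a signature and/or keyword options,
        # e.g. @jit("void()", shared_memory_carveout=50)
        return configure


@jit
def kernel_a():
    pass


@jit("void()", shared_memory_carveout=50)
def kernel_b():
    pass
```

A test suite exercising only one of these two decoration styles would miss bugs on the other path, which is the concern raised above.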
Additional Comments (1)
- `numba_cuda/numba/cuda/dispatcher.py`, lines 479-481 (link). logic: The carveout may not be applied when the kernel runs on multiple devices. The carveout is only set in `bind()`, but `bind()` is called once during compilation. When `get_cufunc()` creates a function for a different device (lines 333-335 in codegen.py cache per device), the carveout won't be applied.
6 files reviewed, 1 comment
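The multi-device hazard above can be sketched without CUDA. In this illustrative model (class and key names are hypothetical, not numba-cuda's API), the fix is to apply the setting on every per-device cache miss rather than only once at bind time:

```python
# Sketch of the per-device caching concern: get_cufunc-style lookup
# creates one function object per device. If the carveout were applied
# only once (e.g. in bind()), functions materialized later for other
# devices would silently miss it; applying it inside the cache-miss
# branch covers every device.

class Kernel:
    def __init__(self, carveout=None):
        self.carveout = carveout
        self._per_device = {}  # device_id -> function-like object

    def get_cufunc(self, device_id):
        func = self._per_device.get(device_id)
        if func is None:
            func = {"device": device_id, "carveout": None}
            if self.carveout is not None:
                # Apply the attribute for *every* device, not just the
                # one active at compile time.
                func["carveout"] = self.carveout
            self._per_device[device_id] = func
        return func


k = Kernel(carveout=50)
f0 = k.get_cufunc(0)
f1 = k.get_cufunc(1)
```

With this structure, both `f0` and `f1` carry the carveout, whereas a bind-time-only application would leave the second device's function unconfigured.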
5 files reviewed, 4 comments
Thank you for the quick response and feedback after your PTO (+ hope you got some chance to rest and have fun)! Much appreciated.
Note that as I was testing with signatures, I realized that the

Also, would there be some other issue I can work on after this task?
Hi @gmarkall, it would be much appreciated if we can get this merged :)
/ok to test 373b7c2
3 files reviewed, 3 comments
```python
valid_strings = ["default", "maxl1", "maxshared"]
if carveout.lower() not in valid_strings:
    raise ValueError(
        f"Invalid carveout value: {carveout}. "
        f"Must be -1 to 100 or one of {valid_strings}"
    )
```
The error message displays the lowercase string options `["default", "maxl1", "maxshared"]`, but the documentation (lines 100-101) and tests (test_dispatcher.py line 851) use the capitalized forms `"MaxL1"`, `"MaxShared"`, `"default"`. This inconsistency will confuse users who follow the documentation.

Consider updating the error message to show the capitalized forms that users should actually use:
```python
valid_strings = ["default", "MaxL1", "MaxShared"]
if carveout.lower() not in [s.lower() for s in valid_strings]:
    raise ValueError(
        f"Invalid carveout value: {carveout}. "
        f"Must be -1 to 100 or one of {valid_strings}"
    )
```
This way the error message displays the same capitalization as the documentation, while still accepting case-insensitive input.
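Wrapped in a helper, the suggested pattern looks like this. This is a self-contained sketch (the function name `check_carveout_string` is illustrative, not the PR's actual helper), showing case-insensitive acceptance with documented capitalization in the error:

```python
# Sketch: accept any capitalization, but echo the documented forms
# ("default", "MaxL1", "MaxShared") in the error message so users who
# follow the docs recognize the valid options.

def check_carveout_string(carveout: str) -> str:
    valid_strings = ["default", "MaxL1", "MaxShared"]
    if carveout.lower() not in [s.lower() for s in valid_strings]:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )
    return carveout
```

With this, `check_carveout_string("maxshared")` is accepted, while an invalid value raises a `ValueError` whose message lists the documented capitalized options.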
```python
elif isinstance(carveout, int):
    if not (-1 <= carveout <= 100):
        raise ValueError("Carveout must be between -1 and 100")
```
The `isinstance(carveout, int)` check will also accept boolean values, since `bool` is a subclass of `int` in Python. This means `shared_memory_carveout=True` would be treated as 1 and `shared_memory_carveout=False` as 0, which is likely unintended behavior.

Consider explicitly rejecting boolean types:
```python
elif isinstance(carveout, bool):
    raise TypeError(
        f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
    )
elif isinstance(carveout, int):
    if not (-1 <= carveout <= 100):
        raise ValueError("Carveout must be between -1 and 100")
```
Note: The `bool` check must come before the `int` check since `bool` is a subclass of `int`.
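The pitfall and the fixed ordering can be demonstrated in isolation. This is a minimal standalone sketch (the `validate` helper is illustrative and only covers the branches discussed here, omitting the string branch):

```python
# bool is a subclass of int, so isinstance(True, int) is True and
# True would pass a -1..100 range check as the value 1. Checking bool
# first closes that hole.

def validate(carveout):
    if isinstance(carveout, bool):
        # Must come before the int branch, since bool subclasses int.
        raise TypeError(
            f"shared_memory_carveout must be str or int, "
            f"got {type(carveout).__name__}"
        )
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")


# The root cause: booleans satisfy integer isinstance checks.
assert isinstance(True, int)
```

If the branches were swapped, `validate(True)` would fall into the `int` branch and be silently accepted as 1.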
```python
if shared_memory_carveout is not None:
    _validate_shared_memory_carveout(shared_memory_carveout)
```
The `shared_memory_carveout` parameter is validated here, but there is no check for whether `device=True` is also set. Since shared memory carveout only applies to kernel launches (not device functions), using this parameter with `device=True` should raise an error rather than being silently ignored.

Consider adding validation:
```python
if shared_memory_carveout is not None:
    if device:
        raise ValueError(
            "shared_memory_carveout cannot be used with device=True. "
            "This parameter only applies to kernels, not device functions."
        )
    _validate_shared_memory_carveout(shared_memory_carveout)
```
This provides clearer user feedback when the parameter is misused.
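A self-contained version of this check (the function name `check_kernel_options` is hypothetical, chosen just for this sketch) shows the intended behavior in isolation:

```python
# Sketch of the suggested guard: reject shared_memory_carveout on
# device functions up front, before any value validation runs.

def check_kernel_options(shared_memory_carveout=None, device=False):
    if shared_memory_carveout is not None:
        if device:
            raise ValueError(
                "shared_memory_carveout cannot be used with device=True. "
                "This parameter only applies to kernels, not device "
                "functions."
            )
        # Value validation (range / string checks) would follow here.
```

Passing the option to a kernel is fine; combining it with `device=True` fails fast with a message that names the misused parameter.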
/ok to test e5ba748
3 files reviewed, 3 comments
```python
elif isinstance(carveout, int):
    if not (-1 <= carveout <= 100):
        raise ValueError("Carveout must be between -1 and 100")
```
The validation accepts boolean values because `bool` is a subclass of `int` in Python. This means `shared_memory_carveout=True` and `shared_memory_carveout=False` are incorrectly accepted.

`True` evaluates to 1 and `False` evaluates to 0, both within the valid range [-1, 100], but this is likely unintended behavior that could confuse users.

To fix this, check for boolean types explicitly before checking for `int`:
```python
elif isinstance(carveout, bool):
    raise TypeError(
        f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
    )
elif isinstance(carveout, int):
    if not (-1 <= carveout <= 100):
        raise ValueError("Carveout must be between -1 and 100")
```
```python
valid_strings = ["default", "maxl1", "maxshared"]
if carveout.lower() not in valid_strings:
    raise ValueError(
        f"Invalid carveout value: {carveout}. "
        f"Must be -1 to 100 or one of {valid_strings}"
    )
```
The error message displays the lowercase strings `["default", "maxl1", "maxshared"]`, but the documentation (lines 100-101) shows the capitalized forms `"MaxL1"`, `"MaxShared"`, `"default"`. This inconsistency will confuse users when they see an error.

The validation correctly accepts any case (due to `carveout.lower()` on line 320), but the error message should match what's documented.
```python
valid_strings = ["default", "maxl1", "maxshared"]
if carveout.lower() not in valid_strings:
    raise ValueError(
        f"Invalid carveout value: {carveout}. "
        f"Must be -1 to 100 or one of ['default', 'MaxL1', 'MaxShared']"
    )
```
```python
def test_shared_memory_carveout_invalid_values(self):
    """Test that invalid carveout values raise appropriate errors"""
    test_cases = [
        (150, ValueError, "must be between -1 and 100"),
        (-2, ValueError, "must be between -1 and 100"),
        (101, ValueError, "must be between -1 and 100"),
        ("InvalidOption", ValueError, "Invalid carveout value"),
    ]

    for carveout, exc_type, msg_pattern in test_cases:
        with self.subTest(carveout=carveout):
            # without signature
            with self.assertRaisesRegex(exc_type, msg_pattern):
```
The test coverage for invalid values should include boolean types, which are currently incorrectly accepted due to the validation bug in decorators.py (line 325).

Consider adding test cases for:

- Boolean values: `True`, `False` (currently accepted but shouldn't be)
- Float values: `50.5`, `0.0` (should be rejected with TypeError)
- Other invalid types: `[]`, `{}`, `()` (should be rejected with TypeError)
Example addition to `test_cases`:

```python
test_cases = [
    (150, ValueError, "must be between -1 and 100"),
    (-2, ValueError, "must be between -1 and 100"),
    (101, ValueError, "must be between -1 and 100"),
    ("InvalidOption", ValueError, "Invalid carveout value"),
    (True, TypeError, "must be str or int"),   # Currently fails - booleans are accepted
    (False, TypeError, "must be str or int"),  # Currently fails - booleans are accepted
    (50.5, TypeError, "must be str or int"),
]
```
/ok to test 1be8209
@kaeun97 Thanks for this, and apologies for the delay here. Things got a bit backlogged over the last month and the holiday period, and I'm aiming to get a bit more back on track again now.
- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)
This PR requires rebase after merging this: #629.
This PR allows users to pass a new argument `shared_memory_carveout` to the `@cuda.jit` decorator as described here.

An alternative to consider is to use an `enum` for `cudaSharedCarveout`, but I think the current implementation is cleaner.