
Conversation


@kaeun97 kaeun97 commented Dec 8, 2025

This PR requires a rebase after #629 is merged.

This PR allows users to pass a new argument shared_memory_carveout to the @cuda.jit decorator as described here.

An alternative to consider is using an enum for cudaSharedCarveout, but I think the current implementation is cleaner.


copy-pr-bot bot commented Dec 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaeun97 kaeun97 changed the title Kaeun97/jit shared memory carveout feat: users can pass shared_memory_carveout to @cuda.jit Dec 8, 2025
@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Dec 9, 2025
@kaeun97 kaeun97 marked this pull request as ready for review December 16, 2025 17:36

kaeun97 commented Dec 16, 2025

@gmarkall I didn't know this was still a draft 😅 would appreciate your review! Also, is there anything in particular that I can look into next? Excited to see what's next!


greptile-apps bot commented Dec 16, 2025

Greptile Summary

This PR adds support for configuring shared memory carveout via the shared_memory_carveout parameter to @cuda.jit, implementing the feature requested in issue #617. The implementation validates input, parses string values to integers, and applies the configuration during kernel binding.

  • Adds validation logic in decorators.py that accepts string values ("default", "MaxL1", "MaxShared") or integer values (-1 to 100)
  • Implements parsing and application in dispatcher.py via _parse_carveout and bind() methods
  • Includes serialization support to preserve carveout settings across kernel caching
  • Adds test coverage for both valid and invalid carveout values

The implementation follows the existing patterns in the codebase and integrates properly with the kernel compilation and binding flow. Previous review comments identified validation issues (boolean values being incorrectly accepted) and error-message inconsistencies that should be addressed.

Confidence Score: 3/5

  • Safe to merge with minor validation improvements recommended
  • The implementation is functionally sound with proper integration into the kernel compilation flow. However, the validation logic has edge cases (boolean acceptance) and usability issues (error message inconsistency) that are already identified in previous review threads and should be addressed for better code quality and user experience.
  • Pay attention to numba_cuda/numba/cuda/decorators.py for validation improvements

Important Files Changed

Filename Overview
numba_cuda/numba/cuda/decorators.py Adds shared_memory_carveout parameter to @cuda.jit with validation logic. Has validation issues with boolean types and error message inconsistency.
numba_cuda/numba/cuda/dispatcher.py Implements carveout parsing and application logic. Properly handles serialization and applies carveout during kernel binding.
numba_cuda/numba/cuda/tests/cudapy/test_dispatcher.py Adds test coverage for valid and invalid carveout values. Missing tests for boolean and float types that are incorrectly accepted/not explicitly tested.


@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/decorators.py, line 33 (link)

    style: Missing documentation for the new shared_memory_carveout parameter in the docstring. Should add:

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment


@gmarkall

@kaeun97 Apologies for the delay in getting to this. It wasn't overlooked because it was a draft; things have just been a bit heavy (and I had a little PTO) over the last few days.

@gmarkall

/ok to test


copy-pr-bot bot commented Dec 16, 2025

/ok to test

@gmarkall, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gmarkall

/ok to test bbdda61


@gmarkall gmarkall left a comment


Many thanks again for your efforts here!

I've had a quick skim - apologies for the delay. Some initial thoughts:

  • The API design looks good to me.
  • We should probably test with and without signatures to the @cuda.jit decorator, because the paths taken through the jit decorator are different when func_or_sig is a function and when it is a signature.
  • As this is a user-facing feature, we should add it to the docstring for jit. I think the documentation is then derived from this docstring, so there shouldn't need to be a separate documentation change.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels Dec 16, 2025
@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from bbdda61 to cb34ba3 Compare December 17, 2025 22:39

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/dispatcher.py, line 479-481 (link)

logic: carveout may not be applied when the kernel runs on multiple devices

The carveout is only set in bind(), but bind() is called once during compilation. When get_cufunc() creates a function for a different device (lines 333-335 in codegen.py cache per device), the carveout won't be applied.
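The caching concern raised here can be illustrated with a plain-Python toy model. All names below (FakeFunction, Kernel, get_cufunc's shape) are hypothetical stand-ins for illustration and do not correspond to the real dispatcher or codegen.py API:

```python
# Toy model of per-device function caching: applying the carveout on every
# per-device function creation, rather than once in bind(), ensures each
# device's function receives the setting. Names here are hypothetical.

class FakeFunction:
    def __init__(self, device):
        self.device = device
        self.carveout = None  # attribute not yet applied

class Kernel:
    def __init__(self, carveout):
        self._carveout = carveout
        self._cache = {}  # device id -> FakeFunction

    def get_cufunc(self, device):
        # If the attribute were applied only once at bind() time, a function
        # created later for a second device would miss it. Applying it at
        # the point each cached per-device function is created closes that gap.
        fn = self._cache.get(device)
        if fn is None:
            fn = FakeFunction(device)
            if self._carveout is not None:
                fn.carveout = self._carveout
            self._cache[device] = fn
        return fn

k = Kernel(carveout=50)
assert k.get_cufunc(0).carveout == 50
assert k.get_cufunc(1).carveout == 50  # the second device also gets the setting
```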

6 files reviewed, 1 comment


@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from 0329edf to cb34ba3 Compare December 17, 2025 22:44

@greptile-apps greptile-apps bot left a comment


5 files reviewed, 4 comments


@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from 0b70f28 to cb34ba3 Compare December 17, 2025 22:51

kaeun97 commented Dec 17, 2025

Thank you for the quick response and feedback after your PTO (+ hope you got some chance to rest and have fun)! Much appreciated.

  • add docstring for shared_memory_carveout
  • add test with signature

Note that while testing with signatures, I realized the shared_memory_carveout value was checked at different times in the previous implementation (without a signature, the error was raised at compilation time; with a signature, at decoration time), so I made it consistent by checking at decoration time.

Also, would there be some other issue I can work on after this task?

@kaeun97 kaeun97 requested a review from gmarkall December 19, 2025 15:59

kaeun97 commented Jan 12, 2026

Hi @gmarkall, would be much appreciated if we can get this merged :)

@gmarkall

/ok to test 373b7c2


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 3 comments


Comment on lines +319 to +324
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

The error message displays lowercase string options ["default", "maxl1", "maxshared"], but the documentation (lines 100-101) and tests (test_dispatcher.py line 851) use the capitalized forms "MaxL1", "MaxShared", "default". This inconsistency will confuse users who follow the documentation.

Consider updating the error message to show the capitalized forms that users should actually use:

Suggested change
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

becomes:

    valid_strings = ["default", "MaxL1", "MaxShared"]
    if carveout.lower() not in [s.lower() for s in valid_strings]:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

This way the error message displays the same capitalization as the documentation, while still accepting case-insensitive input.

Comment on lines +325 to +327
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

The isinstance(carveout, int) check will also accept boolean values since bool is a subclass of int in Python. This means shared_memory_carveout=True would be treated as 1 and shared_memory_carveout=False as 0, which is likely unintended behavior.

Consider explicitly rejecting boolean types:

Suggested change
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

becomes:

    elif isinstance(carveout, bool):
        raise TypeError(
            f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
        )
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

Note: The bool check must come before the int check since bool is a subclass of int.
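The Python subtlety behind this ordering can be confirmed directly; the helper name below is illustrative only, not part of the PR:

```python
# bool is a subclass of int in Python, so a naive isinstance(x, int) check
# lets True/False slip through as 1/0.
assert isinstance(True, int)
assert True == 1 and False == 0
assert -1 <= True <= 100  # True would unintentionally pass the range check

# Rejecting bool explicitly before the int branch closes the gap
# (hypothetical helper, for illustration):
def is_valid_int_carveout(value):
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and -1 <= value <= 100
    )

assert is_valid_int_carveout(50)
assert not is_valid_int_carveout(True)
assert not is_valid_int_carveout(101)
```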

Comment on lines +128 to +129
    if shared_memory_carveout is not None:
        _validate_shared_memory_carveout(shared_memory_carveout)

The shared_memory_carveout parameter is validated here but doesn't check if device=True is also set. Since shared memory carveout only applies to kernel launches (not device functions), using this parameter with device=True should raise an error rather than being silently ignored.

Consider adding validation:

Suggested change
    if shared_memory_carveout is not None:
        _validate_shared_memory_carveout(shared_memory_carveout)

becomes:

    if shared_memory_carveout is not None:
        if device:
            raise ValueError(
                "shared_memory_carveout cannot be used with device=True. "
                "This parameter only applies to kernels, not device functions."
            )
        _validate_shared_memory_carveout(shared_memory_carveout)

This provides clearer user feedback when the parameter is misused.

@gmarkall

/ok to test e5ba748


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 3 comments


Comment on lines +325 to +327
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

The validation accepts boolean values because bool is a subclass of int in Python. This means shared_memory_carveout=True and shared_memory_carveout=False are incorrectly accepted.

True evaluates to 1 and False evaluates to 0, both within the valid range [-1, 100], but this is likely unintended behavior that could confuse users.

To fix this, check for boolean types explicitly before checking for int:

Suggested change
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

becomes:

    elif isinstance(carveout, bool):
        raise TypeError(
            f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
        )
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

Comment on lines +319 to +323
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"

The error message displays lowercase strings ["default", "maxl1", "maxshared"], but the documentation (lines 100-101) shows capitalized forms "MaxL1", "MaxShared", "default". This inconsistency will confuse users when they see an error.

The validation correctly accepts any case (due to carveout.lower() on line 320), but the error message should match what's documented.

Suggested change
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"

becomes:

    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of ['default', 'MaxL1', 'MaxShared']"
        )

Comment on lines +821 to +833
    def test_shared_memory_carveout_invalid_values(self):
        """Test that invalid carveout values raise appropriate errors"""
        test_cases = [
            (150, ValueError, "must be between -1 and 100"),
            (-2, ValueError, "must be between -1 and 100"),
            (101, ValueError, "must be between -1 and 100"),
            ("InvalidOption", ValueError, "Invalid carveout value"),
        ]

        for carveout, exc_type, msg_pattern in test_cases:
            with self.subTest(carveout=carveout):
                # without signature
                with self.assertRaisesRegex(exc_type, msg_pattern):

The test coverage for invalid values should include boolean types, which are currently incorrectly accepted due to the validation bug in decorators.py (line 325).

Consider adding test cases for:

  • Boolean values: True, False (currently accepted but shouldn't be)
  • Float values: 50.5, 0.0 (should be rejected with TypeError)
  • Other invalid types: [], {}, () (should be rejected with TypeError)

Example addition to test_cases:

test_cases = [
    (150, ValueError, "must be between -1 and 100"),
    (-2, ValueError, "must be between -1 and 100"),
    (101, ValueError, "must be between -1 and 100"),
    ("InvalidOption", ValueError, "Invalid carveout value"),
    (True, TypeError, "must be str or int"),  # Currently fails - booleans are accepted
    (False, TypeError, "must be str or int"),  # Currently fails - booleans are accepted
    (50.5, TypeError, "must be str or int"),
]

@gmarkall

/ok to test 1be8209


greptile-apps bot commented Jan 22, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@gmarkall gmarkall enabled auto-merge (squash) January 22, 2026 13:50
@gmarkall gmarkall added 4 - Waiting on CI Waiting for a CI run to finish successfully and removed 4 - Waiting on author Waiting for author to respond to review 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Jan 22, 2026
@gmarkall gmarkall merged commit e1a2496 into NVIDIA:main Jan 22, 2026
105 checks passed
@gmarkall

@kaeun97 Thanks for this, and apologies for the delay here. Things got a bit backlogged over the last month and the holiday period, and I'm aiming to get a bit more back on track again now.

gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Jan 27, 2026
- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)
@gmarkall gmarkall mentioned this pull request Jan 27, 2026
kkraus14 pushed a commit that referenced this pull request Jan 28, 2026
