
Conversation


@kaeun97 kaeun97 commented Dec 8, 2025

This PR requires a rebase after #629 is merged.

This PR allows users to pass a new argument shared_memory_carveout to the @cuda.jit decorator as described here.

An alternative to consider is using an enum for cudaSharedCarveout, but I think the current implementation is cleaner.


copy-pr-bot bot commented Dec 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaeun97 kaeun97 changed the title Kaeun97/jit shared memory carveout feat: users can pass shared_memory_carveout to @cuda.jit Dec 8, 2025
@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Dec 9, 2025
@kaeun97 kaeun97 marked this pull request as ready for review December 16, 2025 17:36

kaeun97 commented Dec 16, 2025

@gmarkall I didn't know this was still a draft 😅 would appreciate your review! Also, is there anything in particular that I can look into next? Excited to see what's next!


greptile-apps bot commented Dec 16, 2025

Greptile Summary

This PR adds support for configuring shared memory carveout via the shared_memory_carveout parameter to @cuda.jit, implementing the feature requested in issue #617. The implementation validates input, parses string values to integers, and applies the configuration during kernel binding.

  • Adds validation logic in decorators.py that accepts string values ("default", "MaxL1", "MaxShared") or integer values (-1 to 100)
  • Implements parsing and application in dispatcher.py via _parse_carveout and bind() methods
  • Includes serialization support to preserve carveout settings across kernel caching
  • Adds test coverage for both valid and invalid carveout values

The implementation follows the existing patterns in the codebase and integrates properly with the kernel compilation and binding flow. Previous review comments identified validation issues (boolean values being incorrectly accepted) and error-message inconsistencies that should be addressed.

Confidence Score: 3/5

  • Safe to merge with minor validation improvements recommended
  • The implementation is functionally sound with proper integration into the kernel compilation flow. However, the validation logic has edge cases (boolean acceptance) and usability issues (error message inconsistency) that are already identified in previous review threads and should be addressed for better code quality and user experience.
  • Pay attention to numba_cuda/numba/cuda/decorators.py for validation improvements

Important Files Changed

Filename Overview
numba_cuda/numba/cuda/decorators.py Adds shared_memory_carveout parameter to @cuda.jit with validation logic. Has validation issues with boolean types and error message inconsistency.
numba_cuda/numba/cuda/dispatcher.py Implements carveout parsing and application logic. Properly handles serialization and applies carveout during kernel binding.
numba_cuda/numba/cuda/tests/cudapy/test_dispatcher.py Adds test coverage for valid and invalid carveout values. Missing tests for boolean and float types that are incorrectly accepted/not explicitly tested.


@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/decorators.py, line 33 (link)

    style: Missing documentation for the new shared_memory_carveout parameter in the docstring. Should add:

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment


@gmarkall

@kaeun97 Apologies for the delay in getting to this. It wasn't overlooked because it was a draft; things have just been a bit heavy (and I had a little PTO) over the last few days.

@gmarkall

/ok to test


copy-pr-bot bot commented Dec 16, 2025

/ok to test

@gmarkall, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gmarkall

/ok to test bbdda61


@gmarkall gmarkall left a comment


Many thanks again for your efforts here!

I've had a quick skim - apologies for the delay. Some initial thoughts:

  • The API design looks good to me.
  • We should probably test with and without signatures to the @cuda.jit decorator, because the paths taken through the jit decorator are different when func_or_sig is a function and when it is a signature.
  • As this is a user-facing feature, we should add it to the docstring for jit. I think the documentation is then derived from this docstring, so there shouldn't need to be a separate documentation change.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels Dec 16, 2025
@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from bbdda61 to cb34ba3 Compare December 17, 2025 22:39

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/dispatcher.py, line 479-481 (link)

logic: carveout may not be applied when the kernel runs on multiple devices

The carveout is only set in bind(), but bind() is called once during compilation. When get_cufunc() creates a function for a different device (lines 333-335 in codegen.py cache per device), the carveout won't be applied.
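The caching concern raised here can be illustrated with a plain-Python toy model. All names below (FakeFunction, Kernel, get_cufunc's shape) are hypothetical stand-ins for illustration and do not correspond to the real dispatcher or codegen.py API:

```python
# Toy model of per-device function caching: applying the carveout on every
# per-device function creation, rather than once in bind(), ensures each
# device's function receives the setting. Names here are hypothetical.

class FakeFunction:
    def __init__(self, device):
        self.device = device
        self.carveout = None  # attribute not yet applied

class Kernel:
    def __init__(self, carveout):
        self._carveout = carveout
        self._cache = {}  # device id -> FakeFunction

    def get_cufunc(self, device):
        # If the attribute were applied only once at bind() time, a function
        # created later for a second device would miss it. Applying it at
        # the point each cached per-device function is created closes that gap.
        fn = self._cache.get(device)
        if fn is None:
            fn = FakeFunction(device)
            if self._carveout is not None:
                fn.carveout = self._carveout
            self._cache[device] = fn
        return fn

k = Kernel(carveout=50)
assert k.get_cufunc(0).carveout == 50
assert k.get_cufunc(1).carveout == 50  # the second device also gets the setting
```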

6 files reviewed, 1 comment


@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from 0329edf to cb34ba3 Compare December 17, 2025 22:44

@greptile-apps greptile-apps bot left a comment


5 files reviewed, 4 comments


@kaeun97 kaeun97 force-pushed the kaeun97/jit-shared-memory-carveout branch from 0b70f28 to cb34ba3 Compare December 17, 2025 22:51

kaeun97 commented Dec 17, 2025

Thank you for the quick response and feedback after your PTO (+ hope you got some chance to rest and have fun)! Much appreciated.

  • add docstring for shared_memory_carveout
  • add test with signature

Note that while testing with signatures, I realized the shared_memory_carveout value was checked at different times in the previous implementation (without a signature, the error was raised at compilation time; with a signature, at decoration time), so I made it consistent by checking at decoration time.

Also, would there be some other issue I can work on after this task?

@kaeun97 kaeun97 requested a review from gmarkall December 19, 2025 15:59

kaeun97 commented Jan 12, 2026

Hi @gmarkall, would be much appreciated if we can get this merged :)

@gmarkall

/ok to test 373b7c2


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 3 comments


Comment on lines +319 to +324
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

The error message displays lowercase string options ["default", "maxl1", "maxshared"], but the documentation (lines 100-101) and tests (test_dispatcher.py line 851) use the capitalized forms "MaxL1", "MaxShared", "default". This inconsistency will confuse users who follow the documentation.

Consider updating the error message to show the capitalized forms that users should actually use:

Suggested change
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

becomes:

    valid_strings = ["default", "MaxL1", "MaxShared"]
    if carveout.lower() not in [s.lower() for s in valid_strings]:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"
        )

This way the error message displays the same capitalization as the documentation, while still accepting case-insensitive input.

Comment on lines +325 to +327
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

The isinstance(carveout, int) check will also accept boolean values since bool is a subclass of int in Python. This means shared_memory_carveout=True would be treated as 1 and shared_memory_carveout=False as 0, which is likely unintended behavior.

Consider explicitly rejecting boolean types:

Suggested change
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

becomes:

    elif isinstance(carveout, bool):
        raise TypeError(
            f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
        )
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

Note: The bool check must come before the int check since bool is a subclass of int.
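The Python subtlety behind this ordering can be confirmed directly; the helper name below is illustrative only, not part of the PR:

```python
# bool is a subclass of int in Python, so a naive isinstance(x, int) check
# lets True/False slip through as 1/0.
assert isinstance(True, int)
assert True == 1 and False == 0
assert -1 <= True <= 100  # True would unintentionally pass the range check

# Rejecting bool explicitly before the int branch closes the gap
# (hypothetical helper, for illustration):
def is_valid_int_carveout(value):
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and -1 <= value <= 100
    )

assert is_valid_int_carveout(50)
assert not is_valid_int_carveout(True)
assert not is_valid_int_carveout(101)
```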

Comment on lines +128 to +129
    if shared_memory_carveout is not None:
        _validate_shared_memory_carveout(shared_memory_carveout)

The shared_memory_carveout parameter is validated here but doesn't check if device=True is also set. Since shared memory carveout only applies to kernel launches (not device functions), using this parameter with device=True should raise an error rather than being silently ignored.

Consider adding validation:

Suggested change
    if shared_memory_carveout is not None:
        _validate_shared_memory_carveout(shared_memory_carveout)

becomes:

    if shared_memory_carveout is not None:
        if device:
            raise ValueError(
                "shared_memory_carveout cannot be used with device=True. "
                "This parameter only applies to kernels, not device functions."
            )
        _validate_shared_memory_carveout(shared_memory_carveout)

This provides clearer user feedback when the parameter is misused.

@gmarkall

/ok to test e5ba748


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 3 comments


Comment on lines +325 to +327
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

The validation accepts boolean values because bool is a subclass of int in Python. This means shared_memory_carveout=True and shared_memory_carveout=False are incorrectly accepted.

True evaluates to 1 and False evaluates to 0, both within the valid range [-1, 100], but this is likely unintended behavior that could confuse users.

To fix this, check for boolean types explicitly before checking for int:

Suggested change
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

becomes:

    elif isinstance(carveout, bool):
        raise TypeError(
            f"shared_memory_carveout must be str or int, got {type(carveout).__name__}"
        )
    elif isinstance(carveout, int):
        if not (-1 <= carveout <= 100):
            raise ValueError("Carveout must be between -1 and 100")

Comment on lines +319 to +323
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"

The error message displays lowercase strings ["default", "maxl1", "maxshared"], but the documentation (lines 100-101) shows capitalized forms "MaxL1", "MaxShared", "default". This inconsistency will confuse users when they see an error.

The validation correctly accepts any case (due to carveout.lower() on line 320), but the error message should match what's documented.

Suggested change
    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of {valid_strings}"

becomes:

    valid_strings = ["default", "maxl1", "maxshared"]
    if carveout.lower() not in valid_strings:
        raise ValueError(
            f"Invalid carveout value: {carveout}. "
            f"Must be -1 to 100 or one of ['default', 'MaxL1', 'MaxShared']"
        )

Comment on lines +821 to +833
    def test_shared_memory_carveout_invalid_values(self):
        """Test that invalid carveout values raise appropriate errors"""
        test_cases = [
            (150, ValueError, "must be between -1 and 100"),
            (-2, ValueError, "must be between -1 and 100"),
            (101, ValueError, "must be between -1 and 100"),
            ("InvalidOption", ValueError, "Invalid carveout value"),
        ]

        for carveout, exc_type, msg_pattern in test_cases:
            with self.subTest(carveout=carveout):
                # without signature
                with self.assertRaisesRegex(exc_type, msg_pattern):

The test coverage for invalid values should include boolean types, which are currently incorrectly accepted due to the validation bug in decorators.py (line 325).

Consider adding test cases for:

  • Boolean values: True, False (currently accepted but shouldn't be)
  • Float values: 50.5, 0.0 (should be rejected with TypeError)
  • Other invalid types: [], {}, () (should be rejected with TypeError)

Example addition to test_cases:

test_cases = [
    (150, ValueError, "must be between -1 and 100"),
    (-2, ValueError, "must be between -1 and 100"),
    (101, ValueError, "must be between -1 and 100"),
    ("InvalidOption", ValueError, "Invalid carveout value"),
    (True, TypeError, "must be str or int"),  # Currently fails - booleans are accepted
    (False, TypeError, "must be str or int"),  # Currently fails - booleans are accepted
    (50.5, TypeError, "must be str or int"),
]

@gmarkall

/ok to test 1be8209


greptile-apps bot commented Jan 22, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@gmarkall gmarkall enabled auto-merge (squash) January 22, 2026 13:50
@gmarkall gmarkall added 4 - Waiting on CI Waiting for a CI run to finish successfully and removed 4 - Waiting on author Waiting for author to respond to review 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Jan 22, 2026
@gmarkall gmarkall merged commit e1a2496 into NVIDIA:main Jan 22, 2026
105 checks passed
@gmarkall

@kaeun97 Thanks for this, and apologies for the delay here. Things got a bit backlogged over the last month and the holiday period, and I'm aiming to get a bit more back on track again now.

gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Jan 27, 2026
- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)
@gmarkall gmarkall mentioned this pull request Jan 27, 2026
kkraus14 pushed a commit that referenced this pull request Jan 28, 2026
