Conversation

@kaeun97
Contributor

@kaeun97 kaeun97 commented Dec 3, 2025

This adds set_shared_memory_carveout using cuFuncSetAttribute requested here.

I do want to confirm this design with @gmarkall (and it would be nice to chat about it). An alternative is to combine this with cache_config in some way. However, I think separating the two makes more sense, mainly because it makes the intent clearer for the user (cache_config sets preference hints, while set_shared_memory_carveout sets explicit percentages).
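For illustration, the intent split can be sketched with a stand-in class (all names and semantics here are hypothetical and follow the description above; this is not the actual numba-cuda Function implementation):

```python
# Stand-in sketch of the intent split: cache_config gives a categorical
# *hint* the driver may ignore, while set_shared_memory_carveout makes an
# *explicit* percentage request.

class FunctionSketch:
    def __init__(self):
        self.cache_preference = None   # advisory hint, or None if unset
        self.shared_carveout = -1      # -1 means "driver default"

    def cache_config(self, prefer_shared=False, prefer_l1=False,
                     prefer_equal=False):
        # Preference hint: categorical, advisory only.
        if prefer_shared:
            self.cache_preference = "prefer_shared"
        elif prefer_l1:
            self.cache_preference = "prefer_l1"
        elif prefer_equal:
            self.cache_preference = "prefer_equal"

    def set_shared_memory_carveout(self, carveout):
        # Explicit request: percentage of the unified L1/shared storage
        # to devote to shared memory, or -1 to restore the driver default.
        if not (-1 <= carveout <= 100):
            raise ValueError("carveout must be between -1 and 100")
        self.shared_carveout = carveout
```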

@copy-pr-bot

copy-pr-bot bot commented Dec 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaeun97 kaeun97 changed the title feat: add set_shared-memory_carveout feat: add set_shared_memory_carveout Dec 3, 2025
@kaeun97 kaeun97 force-pushed the kaeun97/add-set-shared-memory-carveout branch from d4d96c4 to 81d16af Compare December 3, 2025 01:45
@gmarkall
Contributor

gmarkall commented Dec 3, 2025

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Dec 3, 2025

/ok to test

@gmarkall, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gmarkall
Contributor

gmarkall commented Dec 3, 2025

/ok to test 1697fc7

@gmarkall
Contributor

gmarkall commented Dec 3, 2025

Thanks for the PR! I'll circle back to this when I get a chance (hopefully soon). Happy to chat about the design as well.

@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Dec 3, 2025
@gmarkall
Contributor

gmarkall commented Dec 4, 2025

/ok to test 93a149e

Contributor

@gmarkall gmarkall left a comment


> Do want to confirm with @gmarkall about this design (and would be nice to chat about it).

I think the design for the part of the problem that this PR solves looks great - I can think of no improvement.

> An alternative is to combine this with cache_config in some way. However, I do think separating the two makes more sense mainly because it makes the intent clearer for the user (cache_config sets preference hints while set_shared_memory_carveout sets explicit percentages).

I agree, keeping it separate makes sense to me too.

I'm happy to chat if you have unresolved questions, but I'm also happy for this to be merged - let me know how you'd like to proceed.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Dec 4, 2025
@kaeun97 kaeun97 force-pushed the kaeun97/add-set-shared-memory-carveout branch from 93a149e to d91f8f4 Compare December 8, 2025 21:36
@kaeun97
Contributor Author

kaeun97 commented Dec 8, 2025

@gmarkall Thank you for the review! Good to hear that the design makes sense. Just wanted to make sure that I am interpreting the description from the issue properly :) Would be nice to re-run CI and merge this.

@kaeun97 kaeun97 marked this pull request as ready for review December 8, 2025 21:40
@greptile-apps
Contributor

greptile-apps bot commented Dec 8, 2025

Greptile Overview

Greptile Summary

This PR adds set_shared_memory_carveout functionality to configure the shared memory carveout percentage for CUDA kernels using cuFuncSetAttribute. The implementation:

  • Adds abstract method to Function base class and concrete implementations in both CtypesFunction and CudaPythonFunction
  • Validates carveout values are in the range -1 to 100 (where -1 means default)
  • Includes comprehensive test coverage for valid/invalid values and kernel execution
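The range check described in the summary could be sketched as follows (the helper name is hypothetical; the -1/0..100 semantics are as stated above, with -1 meaning the driver default):

```python
# Hypothetical helper mirroring the described validation.

def check_carveout(carveout):
    # Valid values: -1 (use the driver default) through 100 (devote all
    # of the unified L1/shared storage to shared memory).
    if not (-1 <= carveout <= 100):
        raise ValueError(
            "carveout must be between -1 and 100, got %r" % (carveout,)
        )
    return carveout
```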

Key Issue Found:

  • The CtypesFunction implementation calls driver.cuFuncSetAttribute(), but this function is not defined in API_PROTOTYPES in drvapi.py. While the default code path uses CudaPythonFunction (which accesses the API via cuda.bindings), the CtypesFunction path will fail if ever used. This should be addressed by either removing the dormant ctypes code path or adding the missing API prototype.

Confidence Score: 3/5

  • This PR is generally safe to merge with one critical issue that needs attention
  • Score reflects that while the main implementation path (CudaPythonFunction) appears correct and well-tested, there's a logical error in the CtypesFunction implementation that calls an undefined API function. This would cause runtime failures if the ctypes code path is ever used. The default code path uses CudaPythonFunction, so this may not impact most users immediately, but it's a correctness issue that should be addressed.
  • Pay special attention to numba_cuda/numba/cuda/cudadrv/driver.py - the CtypesFunction implementation references an undefined API function

Important Files Changed

File Analysis

Filename Score Overview
numba_cuda/numba/cuda/cudadrv/driver.py 3/5 Added set_shared_memory_carveout method to Function classes with validation, but CtypesFunction implementation references undefined API function
numba_cuda/numba/cuda/tests/cudadrv/test_cuda_driver.py 5/5 Comprehensive test added covering valid/invalid carveout values and kernel execution after configuration

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/cudadrv/driver.py, line 2400 (link)

    logic: cuFuncSetAttribute is not defined in API_PROTOTYPES in drvapi.py. If CtypesFunction is ever used, this will fail with CudaDriverError: Driver missing function: cuFuncSetAttribute. While the default code path uses CudaPythonFunction, the CtypesFunction path should either be removed or have the API prototype added to drvapi.py:

2 files reviewed, 1 comment


@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Dec 9, 2025
@gmarkall
Contributor

gmarkall commented Dec 9, 2025

> 1. `numba_cuda/numba/cuda/cudadrv/driver.py`, line 2400 ([link](/nvidia/numba-cuda/blob/683e774faebe7254a870818c648c556f24554cf5/numba_cuda/numba/cuda/cudadrv/driver.py#L2400))
>    **logic:** `cuFuncSetAttribute` is not defined in `API_PROTOTYPES` in `drvapi.py`. If `CtypesFunction` is ever used, this will fail with `CudaDriverError: Driver missing function: cuFuncSetAttribute`. While the default code path uses `CudaPythonFunction`, the `CtypesFunction` path should either be removed or have the API prototype added to `drvapi.py`.

This is dead code, so it won't be hit. We need to remove it at some point.
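If the prototype route were ever taken instead, the missing entry might look roughly like this. This is a sketch only: the signature comes from the CUDA driver API (`CUresult cuFuncSetAttribute(CUfunction hfunc, CUfunction_attribute attrib, int value)`), and the tuple layout assumes the `(return type, argument types...)` convention used by `API_PROTOTYPES` in `drvapi.py`:

```python
from ctypes import c_int, c_void_p

# drvapi.py represents driver handles as opaque pointers.
cu_function = c_void_p

# Hypothetical API_PROTOTYPES entry for the missing function:
# CUresult cuFuncSetAttribute(CUfunction hfunc,
#                             CUfunction_attribute attrib, int value)
API_PROTOTYPES = {
    "cuFuncSetAttribute": (c_int, cu_function, c_int, c_int),
}
```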

@gmarkall
Contributor

gmarkall commented Dec 9, 2025

/ok to test 683e774

@gmarkall gmarkall merged commit 2d151db into NVIDIA:main Dec 9, 2025
141 of 142 checks passed
@gmarkall
Contributor

gmarkall commented Dec 9, 2025

Many thanks @kaeun97!

gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 17, 2025
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
@gmarkall gmarkall mentioned this pull request Dec 17, 2025
gmarkall added a commit that referenced this pull request Dec 17, 2025
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
ZzEeKkAa added a commit to ZzEeKkAa/numba-cuda that referenced this pull request Jan 8, 2026
v0.23.0
