-
Notifications
You must be signed in to change notification settings - Fork 54
Add DWARF variant part support for polymorphic variables in CUDA debug info #544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 387f756 |
|
/ok to test 98e1283 |
WalkthroughThis PR introduces polymorphic debug information support for CUDA debugging. It adds version detection for llvmlite support, enhanced debug metadata generation for UnionModel payloads using DWARF5 variant_part structures, and corresponding storage/retrieval logic for polymorphic variables with discriminant handling in the lowering layer. Changes
Sequence DiagramsequenceDiagram
participant Config as Configuration
participant DebugInfo as DebugInfo Builder
participant Lowering as CUDA Lowering
participant Runtime as Runtime Execution
Config->>DebugInfo: Initialize CUDA_DEBUG_POLY flag
Note over DebugInfo: _check_polymorphic_debug_info_support()<br/>detects llvmlite version & capabilities
alt CUDA_DEBUG_POLY enabled
Lowering->>Lowering: _alloca_var: allocate 2-slot aggregate<br/>(discriminant + data)
Lowering->>Runtime: storevar: write discriminant at offset 0
Lowering->>Runtime: storevar: write data at offset sizeof(fetype)
Runtime->>Lowering: getvar: read from offset sizeof(fetype)
DebugInfo->>DebugInfo: _var_type: generate variant_part<br/>wrapper struct metadata
else CUDA_DEBUG_POLY disabled
Lowering->>Lowering: _alloca_var: allocate single slot
Lowering->>Runtime: storevar: standard path
Runtime->>Lowering: getvar: simple bitcast
DebugInfo->>DebugInfo: _var_type: emit legacy union<br/>(DICompositeType)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes introduce new detection logic, multiple conditional paths in core debug metadata generation and variable storage/retrieval, and span three interconnected files with varying logic density. The polymorphic variant handling in Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/ok to test a7f5393 |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
numba_cuda/numba/cuda/tests/cudapy/test_debuginfo.py (1)
467-506: Tighten FileCheck and clean up unused noqa markers
- Patterns look correct and adapt to typed-const vs node-ref. LGTM.
- Static analysis flags unused “# noqa: F841” on Lines 483–485. Remove them; these assignments are already intentionally unused in a test context.
Apply this minimal cleanup:
- foo = 3.14 # noqa: F841 - foo = True # noqa: F841 - foo = np.int32(42) # noqa: F841 + foo = 3.14 + foo = True + foo = np.int32(42)numba_cuda/numba/cuda/lowering.py (2)
1863-1867: Alloc size choice matches test expectations; consider documenting alignment rationaleDoubling max element size to house [discriminant + data] matches the 128‑bit wrapper check. A brief comment on why data lives at offset sizeof(fetype) (not a fixed base) would help future maintainers reason about padding/alignment and the DWARF offsets.
1920-1939: Offset read mirrors store; ensure index type width is intentionalUsing i32 indices in GEP on i8* works, but LLVM permits any integer width. If targeting 64‑bit consistently, consider IntType(64) to avoid future assertions in newer llvmlite.
- [llvm_ir.Constant(llvm_ir.IntType(32), sizeof_bytes)], + [llvm_ir.Constant(llvm_ir.IntType(64), sizeof_bytes)],numba_cuda/numba/cuda/debuginfo.py (2)
61-70: Config mutation at import timeSetting config defaults here is convenient but surprising. Consider centralizing config defaults in a single module (or guard behind explicit env/config read) to avoid hidden side‑effects on import.
679-794: DWARF variant_part emission looks consistent; add two clarifying comments
- Offsets: you set each variant member’s offset to its own width (bits). This aligns with the storage scheme (data at sizeof(fetype)). Add a short comment to tie the metadata to the layout decision.
- Discriminator: mark as artificial and 8 bits — good. Consider clarifying that wrapper size = 2*maxwidth to reserve [discriminant+padding]+[max variant] as implemented in lowering.
Add comments near metadata_dict["offset"] and wrapper_struct_size explaining the memory/layout rationale.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
numba_cuda/numba/cuda/debuginfo.py(3 hunks)numba_cuda/numba/cuda/lowering.py(3 hunks)numba_cuda/numba/cuda/tests/cudapy/test_debuginfo.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
numba_cuda/numba/cuda/debuginfo.py
27-27: Consider moving this statement to an else block
(TRY300)
28-28: Do not catch blind exception: Exception
(BLE001)
54-54: Consider moving this statement to an else block
(TRY300)
56-56: Do not catch blind exception: Exception
(BLE001)
numba_cuda/numba/cuda/tests/cudapy/test_debuginfo.py
483-483: Unused noqa directive (unused: F841)
Remove unused noqa directive
(RUF100)
484-484: Unused noqa directive (unused: F841)
Remove unused noqa directive
(RUF100)
485-485: Unused noqa directive (unused: F841)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (3)
numba_cuda/numba/cuda/tests/cudapy/test_debuginfo.py (2)
10-10: Import of config is appropriateNeeded for gating on CUDA_DEBUG_POLY. No issues.
407-409: Good guard for legacy-union testSkipping union-format test when variant_part is active avoids false failures.
numba_cuda/numba/cuda/lowering.py (1)
1692-1735: Polymorphic store bypasses parent decref—low risk today, but verify NRT remains disabledThe early return at line 1735 does skip the parent's
decref(fetype, old)call (line 1580 in Lower.storevar). This is harmless today because NRT defaults to disabled—decref becomes a no-op—but ifCUDA_ENABLE_NRTis ever toggled, old values would leak.NRT is disabled by default in CUDA targets via the
enable_nrtproperty, which returnsgetattr(config, "CUDA_ENABLE_NRT", False), and bothincrefanddecrefguard withif not self.context.enable_nrt: return. The bypass is therefore safe under current defaults.To defensively prevent future issues, consider guarding the early return with a check to ensure it only bypasses when NRT is disabled, and add a comment explaining why parent logic must run if NRT is ever enabled.
e521ac0 to
9fb6f3a
Compare
|
/ok to test 9927acb |
|
/ok to test f64dddc |
|
/ok to test 2388791 |
|
/ok to test d1f1dc3 |
|
/ok to test c8dc398 |
|
/ok to test 9932955 |
|
@gmarkall Could you please take another look at this PR? |
gmarkall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small nits on the diff. I'm going to merge main, fix these up and then run CI, because I think this will then be good to merge.
numba_cuda/numba/cuda/debuginfo.py
Outdated
| def _get_llvmlite_version(): | ||
| """Get llvmlite version as tuple (major, minor).""" | ||
| try: | ||
| import llvmlite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import can be at the top level - llvmlite will always be available in a working Numba-CUDA installation.
numba_cuda/numba/cuda/debuginfo.py
Outdated
| major = int(parts[0]) | ||
| minor = int(parts[1]) | ||
| return (major, minor) | ||
| except (ImportError, AttributeError, ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the import at the top level an ImportError should not be expected here. If the version splits into less than two parts (e.g. a single integer for some reason), we could also get an IndexError:
| except (ImportError, AttributeError, ValueError): | |
| except (IndexError, AttributeError, ValueError): |
|
/ok to test |
|
(Set to auto merge) |
- Add support for cache-hinted load and store operations (NVIDIA#587) - Add more thirdparty tests (NVIDIA#586) - Add sphinx-lint to pre-commit and fix errors (NVIDIA#597) - Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544) - chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598) - chore(docs): format types docs (NVIDIA#596) - refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579) - Fix freezing in of constant arrays with negative strides (NVIDIA#589) - Update tests to accept variants of generated PTX (NVIDIA#585) - refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581) - Move frontend tests to `cudapy` namespace (NVIDIA#558) - Generalize the concurrency group for main merges (NVIDIA#582) - ci: move pre-commit checks to pre commit action (NVIDIA#577) - chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574) - ci: ensure that python version in ci matches matrix (NVIDIA#575) - Fix the `cuda.is_supported_version()` API (NVIDIA#571) - Fix checks on main (NVIDIA#576) - feat: add `math.nextafter` (NVIDIA#543) - ci: replace conda testing with pixi (NVIDIA#554) - [CI] Run PR workflow on merge to main (NVIDIA#572) - Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569) - test: enable fail-on-warn and clean up resulting failures (NVIDIA#529) - [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565) - Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566) - [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561) - test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550) - test: revert back to ipc futures that await each iteration (NVIDIA#564) - chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551) - [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534) - Remove dependencies on target_extension for CUDA target (NVIDIA#555) - Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559) - [WIP] Port numpy reduction tests to CUDA (NVIDIA#523) - ci: add timeout to avoid blocking the job queue (NVIDIA#556) - Handle `cuda.core.Stream` in driver operations (NVIDIA#401) - feat: add support for `math.exp2` (NVIDIA#541) - Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533) - refactor: cleanup device constructor (NVIDIA#548) - bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547) - perf: cache dimension computations (NVIDIA#542) - perf: remove duplicated size computation (NVIDIA#537) - chore(perf): add torch to benchmark (NVIDIA#539) - test: speed up ipc tests by ~6.5x (NVIDIA#527) - perf: speed up kernel launch (NVIDIA#510) - perf: remove context threading in various pointer abstractions (NVIDIA#536) - perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538) - refactor: remove unnecessary custom map and set implementations (NVIDIA#530) - [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513) - test: add benchmarks for kernel launch for reproducibility (NVIDIA#528) - test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522) - refactor: fully remove `USE_NV_BINDING` (NVIDIA#525) - Draft: Vendor in the IR module (NVIDIA#439) - pyproject.toml: add search path for Pyrefly (NVIDIA#524) - Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473) - Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497) - [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479) - Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502) - build: allow parallelization of nvcc testing builds (NVIDIA#521) - chore(dev-deps): add pixi (NVIDIA#505) - Vendor the imputils module for CUDA refactoring (NVIDIA#448) - Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519) - Switch back to stable cuDF release in thirdparty tests (NVIDIA#518) - Updating .gitignore with binaries in the `testing` folder (NVIDIA#516) - Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507) - Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512) - Vendor in typeconv for future CUDA-specific changes (NVIDIA#499) - [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493) - [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494) - Make the CUDA target the default for CUDA overload decorators (NVIDIA#511) - Remove C extension loading hacks (NVIDIA#506) - Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437) - [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433) - Fix Bf16 Test OB Error (NVIDIA#509) - Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498) - [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373) - [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488) - Improve debug value range coverage (NVIDIA#461) - Add `compile_all` API (NVIDIA#484) - Vendor in core.registry for CUDA-specific changes (NVIDIA#485) - [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457) - Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476) - [test] Remove dependency on cpu_target (NVIDIA#490) - Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475) - [test] Use numpy's tolerance for float16 (NVIDIA#491) - [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466) - [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
- Add support for cache-hinted load and store operations (#587) - Add more thirdparty tests (#586) - Add sphinx-lint to pre-commit and fix errors (#597) - Add DWARF variant part support for polymorphic variables in CUDA debug info (#544) - chore: clean up dead workaround for unavailable `lru_cache` (#598) - chore(docs): format types docs (#596) - refactor: decouple `Context` from `Stream` and `Event` objects (#579) - Fix freezing in of constant arrays with negative strides (#589) - Update tests to accept variants of generated PTX (#585) - refactor: replace device functionality with `cuda.core` APIs (#581) - Move frontend tests to `cudapy` namespace (#558) - Generalize the concurrency group for main merges (#582) - ci: move pre-commit checks to pre commit action (#577) - chore(pixi): set up doc builds; remove most `build-conda` dependencies (#574) - ci: ensure that python version in ci matches matrix (#575) - Fix the `cuda.is_supported_version()` API (#571) - Fix checks on main (#576) - feat: add `math.nextafter` (#543) - ci: replace conda testing with pixi (#554) - [CI] Run PR workflow on merge to main (#572) - Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (#569) - test: enable fail-on-warn and clean up resulting failures (#529) - [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (#565) - Fix registration with Numba, vendor MakeFunctionToJITFunction tests (#566) - [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (#561) - test: refactor process-based tests to use concurrent futures in order to simplify tests (#550) - test: revert back to ipc futures that await each iteration (#564) - chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (#551) - [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (#534) - Remove dependencies on target_extension for CUDA target (#555) - Relax the pinning to `cuda-core` to allow it floating across minor releases (#559) - [WIP] Port numpy reduction tests to CUDA (#523) - ci: add timeout to avoid blocking the job queue (#556) - Handle `cuda.core.Stream` in driver operations (#401) - feat: add support for `math.exp2` (#541) - Vendor in types and datamodel for CUDA-specific changes (#533) - refactor: cleanup device constructor (#548) - bench: add cupy to array constructor kernel launch benchmarks (#547) - perf: cache dimension computations (#542) - perf: remove duplicated size computation (#537) - chore(perf): add torch to benchmark (#539) - test: speed up ipc tests by ~6.5x (#527) - perf: speed up kernel launch (#510) - perf: remove context threading in various pointer abstractions (#536) - perf: reduce the number of `__cuda_array_interface__` accesses (#538) - refactor: remove unnecessary custom map and set implementations (#530) - [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (#513) - test: add benchmarks for kernel launch for reproducibility (#528) - test(pixi): update pixi testing command to work with the new `testing` directory (#522) - refactor: fully remove `USE_NV_BINDING` (#525) - Draft: Vendor in the IR module (#439) - pyproject.toml: add search path for Pyrefly (#524) - Vendor in numba.core.typing for CUDA-specific changes (#473) - Use numba.config when available, otherwise use numba.cuda.config (#497) - [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (#479) - Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (#502) - build: allow parallelization of nvcc testing builds (#521) - chore(dev-deps): add pixi (#505) - Vendor the imputils module for CUDA refactoring (#448) - Don't use `MemoryLeakMixin` for tests that don't use NRT (#519) - Switch back to stable cuDF release in thirdparty tests (#518) - Updating .gitignore with binaries in the `testing` folder (#516) - Remove some unnecessary uses of ContextResettingTestCase (#507) - Vendor in _helperlib cext for CUDA-specific changes (#512) - Vendor in typeconv for future CUDA-specific changes (#499) - [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (#493) - [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (#494) - Make the CUDA target the default for CUDA overload decorators (#511) - Remove C extension loading hacks (#506) - Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (#437) - [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (#433) - Fix Bf16 Test OB Error (#509) - Vendor in components from numba.core.runtime for CUDA-specific changes (#498) - [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (#373) - [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (#488) - Improve debug value range coverage (#461) - Add `compile_all` API (#484) - Vendor in core.registry for CUDA-specific changes (#485) - [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (#457) - Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (#476) - [test] Remove dependency on cpu_target (#490) - Change dangling imports of numba.core.lowering to numba.cuda.lowering (#475) - [test] Use numpy's tolerance for float16 (#491) - [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (#466) - [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (#478) <!-- Thank you for contributing to numba-cuda :) Here are some guidelines to help the review process go smoothly. 1. Please write a description in this text box of the changes that are being made. 2. Please ensure that you have written units tests for the changes made/features added. 3. If you are closing an issue please use one of the automatic closing words as noted here: https://help.github.com/articles/closing-issues-using-keywords/ 4. If your pull request is not ready for review but you want to make use of the continuous integration testing facilities please label it with `[WIP]`. 5. If your pull request is ready to be reviewed without requiring additional work on top of it, then remove the `[WIP]` label (if present) and replace it with `[REVIEW]`. If assistance is required to complete the functionality, for example when the C/C++ code of a feature is complete but Python bindings are still required, then add the label `[HELP-REQ]` so that others can triage and assist. The additional changes then can be implemented on top of the same PR. If the assistance is done by members of the rapidsAI team, then no additional actions are required by the creator of the original PR for this, otherwise the original author of the PR needs to give permission to the person(s) assisting to commit to their personal fork of the project. If that doesn't happen then a new PR based on the code of the original PR can be opened by the person assisting, which then will be the PR that will be merged. 6. Once all work has been done and review has taken place please do not add features or make changes out of the scope of those requested by the reviewer (doing this just add delays as already reviewed code ends up having to be re-reviewed/it is hard to tell what is new etc!). Further, please do not rebase your branch on main/force push/rewrite history, doing any of these causes the context of any comments made by reviewers to be lost. If conflicts occur against main they should be resolved by merging main into the branch used for making the pull request. Many thanks in advance for your cooperation! -->
Overview
This PR implements proper DWARF debug information generation for polymorphic variables in Numba CUDA kernels using
DW_TAG_variant_part(discriminated unions). This enables debuggers like cuda-gdb to correctly inspect polymorphic variables that can hold different types throughout their lifetime.Key Changes
1. Enhanced Polymorphic Variable Lowering (
lowering.py)CUDA_DEBUG_POLYenabled:sizeof(fetype)[discriminant: 8 bits][padding][data: varies by type]2. DWARF Variant Part Metadata Generation (
debuginfo.py)DW_TAG_variant_partwith discriminator fieldextraDatafield containing discriminant valuesThe final DWARF structure: after this change would look like the following e.g.,
extraDataemission:i8 N) for newer llvmlite versions (> 0.45)CUDA_DEBUG_POLY_USE_TYPED_CONSTconfig flag3. Test Coverage (
test_debuginfo.py)test_poly_variant_part()to verify DWARF output:CUDA_DEBUG_POLYconfiguration flagCUDA_DEBUG_POLY_USE_TYPED_CONST4. Backward Compatibility
CUDA_DEBUG_POLY: Falls back to existing behavior, i.e. user mentally choose runtime type from a union like types wrapper in a debugging session.test_union_poly_types) preserved and conditionally skippedThis change implements issue#520.
Summary by CodeRabbit
New Features
Tests