Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo #591
One test still fails because the C ABI wrapper generator emits no debug info, and with separate compilation NVVM appears not to generate a debug section for the wrapper. This should probably be addressed by generating debug info for the C ABI wrapper.
Greptile Overview

Greptile Summary

This PR fixes Issue #588 by implementing separate compilation of NVVM IR modules when generating debug information. The change reverts to pre-PR#8594 behavior specifically for debug mode compilation while maintaining the optimized combined compilation for lineinfo mode.

The core modification adds a new […]. This addresses NVVM's official requirement that debug compilation should only have a single debug compile unit, as combining multiple modules with debug info can produce invalid PTX with duplicate debug sections that the linker rejects.
Confidence score: 2/5
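The strategy described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `compile_modules`, `compile_ir`, and the `debug` flag are hypothetical names, and `fake_compile_ir` is a stand-in that records what it was asked to compile instead of invoking NVVM.

```python
from typing import Callable, List, Sequence


def compile_modules(
    modules: Sequence[str],
    compile_ir: Callable[..., str],
    debug: bool = False,
    **options,
) -> List[str]:
    """Compile IR modules separately in debug mode, together otherwise."""
    if debug:
        # One compilation per module, so each output carries at most one
        # debug compile unit.
        return [compile_ir([m], debug=True, **options) for m in modules]
    # Non-debug (e.g. lineinfo) builds keep the single combined compilation.
    return [compile_ir(list(modules), debug=False, **options)]


def fake_compile_ir(mods, debug=False, **options):
    # Hypothetical stand-in for a real NVVM invocation.
    return f"ptx({'+'.join(mods)}; debug={debug}; opts={sorted(options.items())})"


# Debug mode yields one output per module; the default mode yields one in total.
debug_out = compile_modules(
    ["kernel", "helper"], fake_compile_ir, debug=True, arch="sm_80"
)
combined_out = compile_modules(
    ["kernel", "helper"], fake_compile_ir, debug=False, arch="sm_80"
)
```

In this sketch the separately compiled outputs would still need to be linked afterwards; that step is omitted here.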
2 files reviewed, 2 comments
- logic: missing `**options` parameter - `arch` and other compilation options won't be passed when compiling with debug info
- syntax: missing space in error message

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
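The missing `**options` issue noted in this commit message is a common keyword-forwarding pitfall. A minimal illustration, assuming hypothetical helpers (`compile_one`, `compile_debug_buggy`, and `compile_debug_fixed` are not names from the PR):

```python
def compile_one(module, **options):
    # Hypothetical stand-in: report which options reached the compiler.
    return {"module": module, "options": dict(options)}


def compile_debug_buggy(modules, **options):
    # Bug: **options is accepted but never forwarded, so arch is lost.
    return [compile_one(m) for m in modules]


def compile_debug_fixed(modules, **options):
    # Fix: forward every option (such as arch) to each compilation.
    return [compile_one(m, **options) for m in modules]


buggy = compile_debug_buggy(["a", "b"], arch="sm_80")
fixed = compile_debug_fixed(["a", "b"], arch="sm_80")
```

With the buggy variant each per-module compilation silently falls back to whatever defaults the compiler uses, which is why the omission matters only on the debug path.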
3 files reviewed, no comments
v0.23.0

- Capture global device arrays in kernels and device functions (NVIDIA#666)
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
Fixes #588.
Fixes NVBugs: 5196888, 5227483, 5639364.