@gmarkall gmarkall commented Nov 17, 2025

One test still fails, because the C ABI wrapper generator generates no debug info, and the separate compilation seems to lead NVVM to not generate a debug section for it. This should probably be addressed by generating debug info for the C ABI wrapper.

Fixes #588.
Fixes NVBugs: 5196888, 5227483, 5639364.

@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Nov 17, 2025
copy-pr-bot bot commented Nov 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@gmarkall
Contributor Author

/ok to test

greptile-apps bot commented Nov 20, 2025

Greptile Overview

Greptile Summary

This PR fixes Issue #588 by implementing separate compilation of NVVM IR modules when generating debug information. The change reverts to pre-PR#8594 behavior specifically for debug mode compilation while maintaining the optimized combined compilation for lineinfo mode.

The core modification adds a new get_asm_strs() method that returns a list of PTX strings instead of a single concatenated string. When the "g" debug option is present, each IR module compiles separately to produce one PTX file per module. For lineinfo mode (no "g" option), the existing combined compilation approach is preserved for performance. The linkage type also changes from linkonce_odr to weak_odr in debug mode to handle symbol resolution across separately compiled modules.
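The branching described above can be sketched as follows. This is a minimal illustration, not the code from `codegen.py`: `compile_ir` is a hypothetical stand-in for the NVVM compile step, `linkage_for` is a hypothetical helper, the free-function form of `get_asm_strs` is a simplification, and `arch="compute_80"` is only an example value. It assumes the debug check is a test for the presence of the `"g"` key in the options.

```python
from typing import List


def compile_ir(irs: List[str], **options) -> str:
    """Hypothetical stand-in for the NVVM compile step; returns a fake PTX string."""
    flags = " ".join(sorted(options))
    return f"// PTX for {len(irs)} module(s) [{flags}]"


def linkage_for(options: dict) -> str:
    """Pick the linkage named in the summary: weak_odr in debug mode."""
    return "weak_odr" if "g" in options else "linkonce_odr"


def get_asm_strs(irs: List[str], **options) -> List[str]:
    """Return one PTX string per module in debug mode, else one combined string."""
    if "g" in options:
        # Debug mode: compile each IR module on its own, so every PTX
        # output carries a single debug compile unit.
        return [compile_ir([ir], **options) for ir in irs]
    # Lineinfo or plain mode: keep the combined compilation.
    return [compile_ir(irs, **options)]


modules = ["module_a", "module_b", "module_c"]
debug_ptx = get_asm_strs(modules, g=None, arch="compute_80")
lineinfo_ptx = get_asm_strs(modules, lineinfo=None, arch="compute_80")
print(len(debug_ptx), len(lineinfo_ptx))  # prints: 3 1
```

Returning a list in both branches lets the caller loop over the PTX outputs the same way whether compilation was combined or separate.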

This addresses NVVM's official requirement that debug compilation should only have a single debug compile unit, as combining multiple modules with debug info can produce invalid PTX with duplicate debug sections that the linker rejects.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| numba_cuda/numba/cuda/codegen.py | 2/5 | Refactors PTX compilation to support separate compilation in debug mode but contains critical bug missing `**options` parameter on line 225 |
| numba_cuda/numba/cuda/compiler.py | 4/5 | Updates compilation process to handle multiple PTX codes using new `get_asm_strs()` method with proper error handling |
| numba_cuda/numba/cuda/tests/cudapy/test_compiler.py | 4/5 | Marks test as expected failure with detailed documentation explaining why C ABI wrapper debug info is no longer generated |

Confidence score: 2/5

  • This PR contains a critical bug that will likely cause compilation failures in debug mode
  • Score reflects the missing **options parameter in codegen.py line 225 which will drop architecture and fastmath options during separate IR compilation
  • Pay close attention to numba_cuda/numba/cuda/codegen.py - the bug on line 225 needs to be fixed before merging
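The class of bug flagged here can be shown with a small sketch. All names are hypothetical (`compile_module`, `compile_separately_buggy`, `compile_separately_fixed`), and the option values are examples only. It shows how keyword options are silently lost when a per-module loop does not forward `**options`.

```python
from typing import Dict, List


def compile_module(ir: str, **options) -> Dict[str, object]:
    """Hypothetical compile step that records which options reached it."""
    return {"ir": ir, "options": dict(options)}


def compile_separately_buggy(irs: List[str], **options) -> List[Dict[str, object]]:
    # Bug: options are accepted but never forwarded, so every module is
    # compiled with defaults (no arch, no fastmath, and so on).
    return [compile_module(ir) for ir in irs]


def compile_separately_fixed(irs: List[str], **options) -> List[Dict[str, object]]:
    # Fix: forward the keyword options to every per-module compile call.
    return [compile_module(ir, **options) for ir in irs]


opts = {"arch": "compute_80", "fastmath": True, "g": None}
buggy = compile_separately_buggy(["module_a", "module_b"], **opts)
fixed = compile_separately_fixed(["module_a", "module_b"], **opts)
print(buggy[0]["options"])  # prints: {}
print(fixed[0]["options"] == opts)  # prints: True
```

Because the buggy variant raises no error, a check on the received options (as in the last two lines) is one way to catch this kind of regression in a test.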

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments


gmarkall and others added 2 commits November 20, 2025 13:19
- logic: missing `**options` parameter - `arch` and other compilation options won't be passed when compiling with debug info
- syntax: missing space in error message

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@brandon-b-miller
Contributor

/ok to test 34c4fb1

@gmarkall gmarkall added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 16, 2025
@gmarkall
Contributor Author

/ok to test

@gmarkall gmarkall marked this pull request as ready for review December 16, 2025 20:47
copy-pr-bot bot commented Dec 16, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@gmarkall gmarkall changed the title [WIP] Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo Dec 16, 2025
@greptile-apps greptile-apps bot left a comment

3 files reviewed, no comments


@brandon-b-miller brandon-b-miller added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Dec 16, 2025
@brandon-b-miller brandon-b-miller merged commit dd396c8 into NVIDIA:main Dec 16, 2025
71 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 17, 2025
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
@gmarkall gmarkall mentioned this pull request Dec 17, 2025
gmarkall added a commit that referenced this pull request Dec 17, 2025
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
ZzEeKkAa added a commit to ZzEeKkAa/numba-cuda that referenced this pull request Jan 8, 2026
v0.23.0

- Capture global device arrays in kernels and device functions (NVIDIA#666)
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)

Development

Successfully merging this pull request may close these issues.

[BUG] Compilation for debug requires separate translation of each NVVM IR module
