Add DWARF address class support for shared memory arrays#594

Merged
gmarkall merged 10 commits into NVIDIA:main from jiel-nv:dwarf-address-class
Dec 2, 2025

Conversation

@jiel-nv
Contributor

@jiel-nv jiel-nv commented Nov 18, 2025

This change adds the "dwarfAddressSpace" attribute to the debug metadata for CUDA shared memory pointers, enabling debuggers to correctly identify the memory location of variables.

I chose to add address space tracking in the lowering phase, rather than modifying the underlying typing infrastructure (ArrayModel, PointerModel), for the following reasons:

  1. There is an ongoing effort to decouple from Numba's typing system, but the default behavior still redirects to Numba;
  2. There is a WIP PR #236 introducing a CUDAArray type and implementation with address space information.

When either of the above is completed, there will be a cleaner way to update this patch.

So in this change:

  1. Detection is added in CUDALower for Numba ir.Call nodes to find cuda.shared.array() calls; a flag is set so that the subsequent storevar() records the name / addrspace mapping; the address space map is later referenced when emitting debug info.
  2. A mapping from NVVM address space to DWARF address class is added in order to emit "dwarfAddressSpace" on the DIDerivedType for the pointer member "data" of the CUDA array descriptor.
  3. A new test is added to make sure shared arrays and regular local arrays are distinguished.

This fixes nvbug#5643016.
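The NVVM-to-DWARF mapping described in point 2 can be sketched as below. This is a hypothetical illustration, not the actual numba-cuda code: the value SHARED = 8 and the LOCAL (0x06) / REGISTER (0x02) classes come from the discussion in this PR, while the helper name `get_dwarf_address_class` and the exact shape of the mapping are assumptions.

```python
from enum import IntEnum

# NVVM address spaces as defined by the NVVM IR specification.
ADDRSPACE_GENERIC = 0
ADDRSPACE_SHARED = 3
ADDRSPACE_LOCAL = 5


class DwarfAddressClass(IntEnum):
    # DWARF address classes referenced in this PR's discussion.
    REGISTER = 0x02
    LOCAL = 0x06
    SHARED = 0x08


# Only shared memory is mapped explicitly in this PR; other address
# spaces are left to the backend (libnvvm) to classify.
_NVVM_TO_DWARF = {
    ADDRSPACE_SHARED: DwarfAddressClass.SHARED,
}


def get_dwarf_address_class(nvvm_addrspace):
    # Return the DWARF address class for a tracked NVVM address space,
    # or None so the backend keeps its default handling.
    return _NVVM_TO_DWARF.get(nvvm_addrspace)
```

Returning None for untracked spaces mirrors the PR's stated intent of making a minimal change: local and register storage keep their existing backend-driven classification.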

@copy-pr-bot

copy-pr-bot bot commented Nov 18, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jiel-nv
Contributor Author

jiel-nv commented Nov 18, 2025

/ok to test 4b90a71

@jiel-nv
Contributor Author

jiel-nv commented Nov 18, 2025

/ok to test 4b29d34

@jiel-nv jiel-nv added the 2 - In Progress Currently a work in progress label Nov 18, 2025
@jiel-nv jiel-nv changed the title from "Add DWARF Address Class Support for Shared Memory Arrays" to "[WIP] Add DWARF Address Class Support for Shared Memory Arrays" Nov 18, 2025
@jiel-nv jiel-nv marked this pull request as draft November 18, 2025 16:30
@copy-pr-bot

copy-pr-bot bot commented Nov 18, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jiel-nv
Contributor Author

jiel-nv commented Nov 18, 2025

/ok to test 78332c7

@jiel-nv
Contributor Author

jiel-nv commented Nov 20, 2025

/ok to test f5ffd5a

@jiel-nv
Contributor Author

jiel-nv commented Nov 20, 2025

/ok to test 062e64a

@jiel-nv jiel-nv marked this pull request as ready for review November 20, 2025 01:31
@jiel-nv jiel-nv requested a review from gmarkall November 20, 2025 01:40
@jiel-nv jiel-nv added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Nov 20, 2025
@gmarkall
Contributor

@coderabbitai review

@greptile-apps
Contributor

greptile-apps bot commented Nov 20, 2025

Greptile Overview

Greptile Summary

This PR adds DWARF address space debug metadata for CUDA shared memory arrays, enabling debuggers to correctly identify where variables are stored. The implementation tracks cuda.shared.array() calls during lowering and annotates the resulting pointer debug metadata with dwarfAddressSpace: 8.

Key changes:

  • Added DwarfAddressClass enum mapping NVVM address spaces to DWARF address classes
  • Modified CUDALower to detect cuda.shared.array() calls and track variable address spaces
  • Enhanced CUDADIBuilder._var_type() to emit dwarfAddressSpace attribute for pointer members
  • Added comprehensive tests verifying shared arrays get address class 8 while local arrays don't

The implementation deliberately tracks only shared arrays as a temporary solution until the WIP CUDAArray type (PR #236) provides native address space support in the type system.
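The metadata-emission step summarized above can be illustrated with a small sketch of how a DIBuilder might attach `dwarfAddressSpace` to the "data" pointer member of an array descriptor. All names here (`pointer_member_operands`, the operand keys as plain dict entries) are illustrative stand-ins, not the real CUDADIBuilder API.

```python
ADDRSPACE_SHARED = 3        # NVVM shared memory address space
DWARF_CLASS_SHARED = 8      # DWARF address class for shared memory

_ADDRSPACE_TO_DWARF = {ADDRSPACE_SHARED: DWARF_CLASS_SHARED}


def pointer_member_operands(base_type_ref, addrspace=None):
    # Build the operands of a DIDerivedType with tag DW_TAG_pointer_type
    # for the descriptor's "data" member. The dwarfAddressSpace operand
    # is only added when the NVVM address space maps to a DWARF class,
    # preserving the default (backend-handled) behavior otherwise.
    operands = {
        "tag": "DW_TAG_pointer_type",
        "baseType": base_type_ref,
        "size": 64,
    }
    dwarf_class = _ADDRSPACE_TO_DWARF.get(addrspace)
    if dwarf_class is not None:
        operands["dwarfAddressSpace"] = dwarf_class
    return operands
```

For a shared array, `pointer_member_operands("!11", addrspace=3)` includes `"dwarfAddressSpace": 8`; for a local array, the operand is simply absent.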

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations about incomplete coverage
  • The implementation is correct and well-tested for shared memory arrays. The score reflects that only cuda.shared.array() is tracked, while cuda.local.array() could also benefit from address space tracking (LOCAL maps to DWARF class 0x06), as noted in previous review comments. The approach is explicitly acknowledged as temporary, pending PR #236 ([WIP] Add CUDAArray type and implementation with address space information).
  • No files require special attention - implementation is clean and focused

Important Files Changed

File Analysis

  • numba_cuda/numba/cuda/lowering.py (4/5): Added tracking for cuda.shared.array() calls to record address space in _addrspace_map for debug info
  • numba_cuda/numba/cuda/debuginfo.py (5/5): Added DwarfAddressClass enum, address space mapping logic, and DWARF metadata emission for pointer types
  • numba_cuda/numba/cuda/tests/cudapy/test_debuginfo.py (5/5): Added comprehensive tests verifying shared arrays get dwarfAddressSpace: 8 while local arrays don't

Sequence Diagram

sequenceDiagram
    participant User as CUDA Kernel Code
    participant Lower as CUDALower
    participant DI as CUDADIBuilder
    participant LLVM as LLVM Debug Metadata

    User->>Lower: cuda.shared.array(32, dtype)
    Lower->>Lower: _lower_call_normal()
    Lower->>Lower: _is_shared_array_call() checks typing_key
    Lower->>Lower: Set _pending_shared_store = True
    
    Lower->>Lower: storevar(value, "shared_arr")
    Lower->>Lower: _addrspace_map["shared_arr"] = ADDRSPACE_SHARED (3)
    Lower->>Lower: Reset _pending_shared_store = False
    
    Lower->>DI: _set_addrspace_map(_addrspace_map)
    DI->>DI: Store _var_addrspace_map
    
    DI->>DI: mark_variable("shared_arr", ...)
    DI->>DI: _addrspace = _var_addrspace_map.get("shared_arr") = 3
    
    DI->>DI: _var_type() with _addrspace=3
    DI->>DI: Check if struct with datamodel and addrspace != 0
    DI->>DI: Find "data" field (pointer type)
    DI->>DI: get_dwarf_address_class(3) -> DwarfAddressClass.SHARED (8)
    
    DI->>LLVM: Create DIDerivedType with dwarfAddressSpace: 8
    LLVM-->>User: Debug metadata enables debugger to identify shared memory
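The lowering-side flow in the diagram above can be sketched as a small stateful class in plain Python. The names (`_pending_shared_store`, `_addrspace_map`) mirror the diagram; this is a stand-alone illustration, not the actual CUDALower implementation.

```python
ADDRSPACE_SHARED = 3  # NVVM shared memory address space


class LoweringSketch:
    def __init__(self):
        # Flag set when a cuda.shared.array() call has just been lowered.
        self._pending_shared_store = False
        # Variable name -> NVVM address space, later handed to the DIBuilder.
        self._addrspace_map = {}

    def lower_call(self, is_shared_array_call):
        # Step 1: detect cuda.shared.array() and flag the next store.
        if is_shared_array_call:
            self._pending_shared_store = True

    def storevar(self, name):
        # Step 2: the storevar() following the flagged call records the
        # name-to-address-space mapping, then clears the flag.
        if self._pending_shared_store:
            self._addrspace_map[name] = ADDRSPACE_SHARED
            self._pending_shared_store = False


lower = LoweringSketch()
lower.lower_call(is_shared_array_call=True)
lower.storevar("shared_arr")      # recorded with address space 3
lower.lower_call(is_shared_array_call=False)
lower.storevar("local_arr")       # not recorded; backend handles it
```

After this sequence, `lower._addrspace_map` holds only `{"shared_arr": 3}`, matching the diagram's behavior of tracking shared arrays exclusively.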

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


@jiel-nv jiel-nv changed the title from "[WIP] Add DWARF Address Class Support for Shared Memory Arrays" to "Add DWARF address class support for shared memory arrays" Nov 20, 2025
@jiel-nv
Copy link
Contributor Author

jiel-nv commented Nov 21, 2025

@greptileai Detecting only shared arrays, not local arrays (and other local storage variables), was intentional. It preserves the existing behavior: the back end (libnvvm) handles and emits the DWARF address class for the default LOCAL stack storage (0x06), and some variables will be promoted to REGISTER storage (0x02). So this PR makes a minimal change on top of the existing behavior by adding debug support for shared memory.

Contributor

@greptile-apps greptile-apps bot left a comment


I understand - you're making a targeted improvement to debug support specifically for shared memory arrays, while intentionally preserving the existing behavior for local arrays and other variables. This is a sensible approach for several reasons:

  1. Minimal disruption: By only handling shared memory explicitly, you avoid changing the established debug metadata generation for local variables that already works correctly through libnvvm.

  2. Backend delegation: You're letting the NVVM backend continue handling the address space classification for local storage (0x06) and register promotion (0x02), which it already does well.

  3. Focused fix: The PR addresses the specific gap in shared memory debug support (nvbug#5643016) without over-engineering or risking regressions in other areas.

Your implementation correctly detects cuda.shared.array() calls in the lowering phase and tracks the address space mapping only for those variables, leaving local arrays to be handled by the existing infrastructure. The test confirms this behavior by verifying that shared arrays get dwarfAddressSpace: 8 while local arrays don't have this explicit annotation (delegating to backend handling).
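A test along these lines can be sketched by scanning the textual LLVM IR for the attribute. The IR strings and the helper name below are hypothetical stand-ins; the real test in test_debuginfo.py may inspect the compiled kernel differently.

```python
import re

# Stand-in debug-metadata lines, representing IR produced for the
# "data" pointer member with and without the shared address class.
IR_SHARED = (
    '!10 = !DIDerivedType(tag: DW_TAG_pointer_type, '
    'baseType: !11, size: 64, dwarfAddressSpace: 8)'
)
IR_LOCAL = (
    '!12 = !DIDerivedType(tag: DW_TAG_pointer_type, '
    'baseType: !11, size: 64)'
)


def has_shared_address_space(ir_text):
    # True if any debug-metadata line carries dwarfAddressSpace: 8,
    # i.e. the pointer is annotated as pointing into shared memory.
    return re.search(r'dwarfAddressSpace:\s*8\b', ir_text) is not None
```

With these inputs, `has_shared_address_space(IR_SHARED)` is true while `has_shared_address_space(IR_LOCAL)` is false, capturing the shared-vs-local distinction the test verifies.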

This is a well-reasoned approach that balances fixing the immediate issue with maintaining stability of the existing debug infrastructure.

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


@jiel-nv
Contributor Author

jiel-nv commented Nov 26, 2025

/ok to test 12354ed

@gmarkall
Contributor

gmarkall commented Dec 2, 2025

/ok to test

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 2 comments


Test with additional types to ensure that the code generation is OK
with:

- Scalars
- Struct models (the complex type)
- Records
@gmarkall
Contributor

gmarkall commented Dec 2, 2025

/ok to test

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


Contributor

@gmarkall gmarkall left a comment


I think this is OK to merge as-is; I do think it couples together things that would be better decoupled: the shared state between lowering and debuginfo, the "pending shared store" state in lowering, and the "current address space" in the DIBuilder.

I would like to have a try at decoupling some of these things, but as I'm not familiar enough with the thinking behind these changes, I'm not sure I can get it right without collapsing the abstractions you've built to the point that the bigger picture becomes harder to understand. I'll merge this, then post a follow-up PR with my attempts, for your feedback.

@gmarkall gmarkall merged commit 9e0a986 into NVIDIA:main Dec 2, 2025
71 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 2, 2025
- Add DWARF address class support for shared memory arrays (NVIDIA#594)
@gmarkall gmarkall mentioned this pull request Dec 2, 2025
gmarkall added a commit that referenced this pull request Dec 3, 2025
- Add DWARF address class support for shared memory arrays (#594)

Labels

3 - Ready for Review Ready for review by team
