Conversation

@gmarkall (Contributor)

The original linking implementation for linkable code in device declarations did not consider calls inside callees; this change recurses through the typing to find all calls requiring linkable code.
@gmarkall added the "2 - In Progress" label on Feb 26, 2025
Comment on lines 51 to 68
# The typemap of the function includes calls, so we can traverse it to find
# the references we need.
for name, v in cres.fndesc.typemap.items():

    # CUDADispatchers represent a call to a device function, so we need to
    # look up the linkable code for those recursively.
    if isinstance(v, cuda_types.CUDADispatcher):
        # We need to locate the signature of the call so we can find the
        # correct overload.
        for call, sig in cres.fndesc.calltypes.items():
            if isinstance(call, ir.Expr) and call.op == 'call':
                # There will likely be multiple calls in the typemap; we
                # can uniquely identify the relevant one using its SSA
                # name.
                if call.func.name == name:
                    called_cres = v.dispatcher.overloads[sig.args]
                    called_link_objects = get_cres_link_objects(called_cres)
                    link_objects.update(called_link_objects)
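
For context, this loop appears to live inside the recursive helper get_cres_link_objects; a minimal sketch of how it might be wrapped, assuming the set initialisation and return shape, neither of which is shown in the quoted diff:

def get_cres_link_objects(cres):
    # Accumulate the linkable code objects required by this compile
    # result and, via the recursion below, by everything it calls.
    link_objects = set()

    # (Any linkable code attached directly to `cres` itself would be
    # gathered here; that part is not shown in the quoted diff.)

    # ... the loop over cres.fndesc.typemap.items() quoted above ...

    return link_objects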
@isVoid (Contributor) commented on Mar 2, 2025
This is cool. I learnt a few things by reading through this section. Do you think the version below simplifies the code and reduces its complexity a little?

I made a PR here:
gmarkall#4

I think this reduces the size of the list for both of the nested for-loops, making the work proportional to O(num_calls^2) rather than O(num_typings^2).
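
A minimal sketch of the idea, starting from the call sites rather than the whole typemap (an illustration only, not the actual diff in gmarkall#4):

for call, sig in cres.fndesc.calltypes.items():
    # Only call expressions can refer to device functions.
    if isinstance(call, ir.Expr) and call.op == 'call':
        # Look up the callee's type directly via its SSA name.
        v = cres.fndesc.typemap[call.func.name]
        if isinstance(v, cuda_types.CUDADispatcher):
            called_cres = v.dispatcher.overloads[sig.args]
            link_objects.update(get_cres_link_objects(called_cres))

This flattened form removes the nested loop entirely; the O(num_calls^2) estimate above suggests the actual change instead kept both loops but restricted them to call entries.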

@gmarkall (Contributor, Author)

Thanks - I've incorporated your changes. I don't think there's much of a performance impact (the number of calls and typings won't be that large), but I think your modifications improved the readability of the code.

@gmarkall added the "4 - Waiting on reviewer" and "2 - In Progress" labels and removed the "2 - In Progress" and "4 - Waiting on reviewer" labels on Mar 5, 2025
@isVoid (Contributor) left a comment

lgtm

@gmarkall merged commit 9f0d154 into NVIDIA:main on Mar 6, 2025
31 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Mar 6, 2025
- Fix linking of external code from callees (NVIDIA#137)
- Try using a newer branch workflow (NVIDIA#148)
- Move publish step out of `wheels-build.yaml` (NVIDIA#147)
- Upload wheels to PyPI from GitHub-hosted runner (NVIDIA#142)
- Add paddle to interoperability chapter (NVIDIA#144)
- Fix the debug info of GridGroup type (NVIDIA#131)
- Remove dead `prepare_cuda_kernel()` (NVIDIA#130)
- Add a CUDA DI Builder (NVIDIA#104)
- dont launch extra kernels when stats counting is disabled (NVIDIA#127)
- Fixup debug metadata in kernel fixup (NVIDIA#97)
- Implement debuginfo bool name fix (numba/numba#9888) in numba-cuda (NVIDIA#106)
@gmarkall mentioned this pull request on Mar 6, 2025
@gmarkall added a commit that referenced this pull request on Mar 6, 2025, with the same changelog as above.