
Add support for cache-hinted load and store operations#51

Closed
gmarkall wants to merge 5 commits into NVIDIA:main from gmarkall:cache-hints

Conversation

@gmarkall gmarkall commented Oct 3, 2024

To-do:

  • Add documentation.
  • Test cases for erroneous arguments. It would be good to check for accidental use on shared or local arrays, but this may not be easy to do.
  • Add additional validations as described in comments in ld_cache_operator and st_cache_operator.
  • Refactor the implementation - the load and store implementations contain a lot of common code.
  • Decide on whether to support complex, and work out why it presently doesn't work.
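
For context, the sort of usage these functions aim at looks roughly like the sketch below. This is hypothetical: it assumes the load function is exposed as cuda.ldca (only the ldca name itself appears in the review below; the final API surface was never settled here).

from numba import cuda

@cuda.jit
def copy_with_hint(src, dst):
    i = cuda.grid(1)
    if i < src.size:
        # Hypothetical cache-hinted load: the PTX .ca operator caches the
        # loaded value at all levels, for data likely to be accessed again.
        dst[i] = cuda.ldca(src, i)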

@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Oct 3, 2024
@gmarkall gmarkall added this to the v0.0.19 milestone Oct 21, 2024
gmarkall (Contributor Author) commented:

A note: these also need to work for CPointer() types as well as arrays.

@gmarkall gmarkall modified the milestones: v0.0.20, v0.0.21, v0.0.22 Dec 4, 2024
@gmarkall gmarkall modified the milestones: v0.3.0, v0.4.0 Jan 2, 2025
@gmarkall gmarkall modified the milestones: v0.4.0, v0.5.0 Jan 27, 2025
@rparolin rparolin removed this from the v0.21.0 milestone Oct 9, 2025
gmarkall commented Nov 7, 2025

@kaeun97 I started this a while back but never managed to address all the to-do items. Essentially, the PR was aimed at adding new functions for cache-hinted load and store operations, but I didn't get the implementation and error handling to be cleaner than their current "prototype"-like form. It might be interesting as another example of adding new code generation to Numba-CUDA.

What do you think of the PR? Would it be of any interest to you to take it on and complete the to-do list items?

kaeun97 commented Nov 7, 2025

@gmarkall Happy to continue the work :) Thank you!

gmarkall commented Nov 7, 2025

@kaeun97 Thanks! Unfortunately I don't think I have a way to give you permissions to push to this branch / PR, but please feel free to open a new PR for the continuation; I'll close this one at that point.

@cpcloud cpcloud left a comment

Looks good. TIL about these instructions.

msg = f"Expected {array.ndim} indices, got {index.count}"
raise NumbaTypeError(msg)

if all([isinstance(t, types.Integer) for t in index.dtype]):
cpcloud (Contributor):

Suggested change
- if all([isinstance(t, types.Integer) for t in index.dtype]):
+ if all(isinstance(t, types.Integer) for t in index.dtype):

No reason to create a list if you don't need to.
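
(A side note: the generator form also lets all() short-circuit at the first non-Integer element, whereas the list comprehension evaluates every element before all() runs.)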

kaeun97 (Contributor):

Will apply this change in the follow-up PR :)

Comment on lines +183 to +185
def impl(array, i):
    return ldca_intrinsic(array, i)
return impl
cpcloud (Contributor):

Can't this just be:

Suggested change
- def impl(array, i):
-     return ldca_intrinsic(array, i)
- return impl
+ return lcda_intrinsic

for each of these?

gmarkall (Contributor Author):

An overload should return a Python function that gets jit-compiled to become the implementation of the function it's overloading. ldca_intrinsic is an intrinsic, called from within the compiled function.

I suspect it will not work to return an intrinsic as an overload implementation, but even if it did, it would feel jarring to me to collapse a level of abstraction in the implementation here.

cpcloud (Contributor):

It does seem really weird to my eye that a function foo(*args) whose only line is to call bar(*args) cannot simply be replaced with the call to bar(*args).

This feels like a Numba-ism that perhaps violates basic substitution rules, but this is of course not a blocking comment.

gmarkall (Contributor Author):

I think the confusion here comes from reading the code as if it's going to be interpreted by the Python interpreter, rather than seeing it as a form of metaprogramming, which is what functions decorated with @overload implement.

An attempt to summarise the pertinent points:

  • An @overload function returns a Python function that Numba-CUDA compiles.
  • An @intrinsic function (like ldca_intrinsic, generated by ld_cache_operator() above) is a function that returns a tuple of (signature, codegen), where:
    • signature is the typing signature that Numba-CUDA uses during type inference to determine and validate the function's argument and return types, and
    • codegen is a function that Numba-CUDA calls to generate the LLVM IR for the implementation.
  • During the compilation process (when impl() is being compiled), the typing and lowering for intrinsics are resolved, and the implementation of the intrinsic generated by the codegen() function is inserted into the generated code.
  • In the compilation process for impl(), type inference and code generation treat ldca_intrinsic() as a function that accepts an array and an index as arguments and returns a scalar value. The typing is defined by the signature on line 96, and the code generation function follows below it.
  • Therefore, if you replace return impl with return ldca_intrinsic in ol_ldca(), you have replaced a function that accepts an array and an index then returns a scalar (impl(array, i) -> array.dtype) with one that accepts a typing context and the argument types, and returns a signature and a code generation function (ldca_intrinsic(typingctx, array, index) -> (Signature, codegen)). A minimal sketch of the two layers follows below.
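
To make this concrete, here is a minimal self-contained sketch of the same two-layer pattern using a toy intrinsic. The names double_it and double_intrinsic are illustrative only, not the PR's code; the PR's intrinsics have the same shape but emit the cache-hinted loads in their codegen (shown here for the CPU target for brevity).

from numba import njit, types
from numba.extending import intrinsic, overload

def double_it(x):
    # Pure-Python placeholder that the @overload below targets
    # (a hypothetical toy function, not part of the PR).
    return x * 2

@intrinsic
def double_intrinsic(typingctx, x):
    # Typing phase: this body is called with Numba *types* and returns a
    # (signature, codegen) pair.
    if not isinstance(x, types.Integer):
        return None  # no match: typing fails for non-integer arguments
    sig = x(x)  # same integer type in and out

    def codegen(context, builder, sig, args):
        # Lowering phase: called with LLVM IR values; emits IR for x + x.
        (val,) = args
        return builder.add(val, val)

    return sig, codegen

@overload(double_it)
def ol_double_it(x):
    # Must return a Python function for Numba to compile; the intrinsic
    # call inside it is resolved to the codegen above during compilation.
    # Returning double_intrinsic here instead raises the AssertionError
    # quoted further down.
    def impl(x):
        return double_intrinsic(x)
    return impl

@njit
def use_it(x):
    return double_it(x)

print(use_it(21))  # prints 42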

I hope this clarifies things a bit, but for a more complete understanding I can't see a shortcut that avoids working through the low-level and high-level extension API documentation for Numba.

That said, I do have a couple of worked examples that show the whole flow in one notebook for each of these APIs for the CUDA target, which may also help.

They may be a little out of date and need updating in a couple of places, but the general flow of them is still relevant.

We're using the High-level API here (that @overload and @intrinsic are part of), but it's probably hard to understand the High-level API without first understanding the Low-level API. The High-level API is intended to make it quicker and easier to write Numba extensions, but in my view the main thing it provides is some shorthand for a lot of Low-level API work.

Finally: what happens if you implement the suggested change (modulo the typo in the name of the function above)? You will get:

AssertionError: Implementation function returned by `@overload` has an unexpected type.  Got <intrinsic impl>

cc @kaeun97 as the explanation might be helpful in understanding the PR as a whole.

gmarkall (Contributor Author):

Closing this as #587 supersedes it.

@gmarkall gmarkall closed this Nov 11, 2025