Add support for cache-hinted load and store operations #51
gmarkall wants to merge 5 commits into NVIDIA:main from
Conversation
A note: these also need to work for
@kaeun97 I started this a while back but never managed to address all the to-do items. Essentially the PR was aimed at adding new functions for cache-hinted load and store operations, but I didn't get the implementation and error handling to be cleaner than the current prototype-like form. It might be interesting as it is another example of adding new code generation to Numba-CUDA. What do you think of the PR? Would you be interested in taking it on and completing the to-do list items?
@gmarkall Happy to continue the work :) Thank you!
@kaeun97 Thanks! Unfortunately I don't think I have a way to give you permissions to push to this branch / PR, but please feel free to open a new PR for the continuation; I'll close this one at that point.
cpcloud left a comment
Looks good. TIL about these instructions.
    msg = f"Expected {array.ndim} indices, got {index.count}"
    raise NumbaTypeError(msg)
    if all([isinstance(t, types.Integer) for t in index.dtype]):
Suggested change:

    - if all([isinstance(t, types.Integer) for t in index.dtype]):
    + if all(isinstance(t, types.Integer) for t in index.dtype):
No reason to create a list if you don't need to.
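As an aside, the generator form also short-circuits: `all()` stops pulling values at the first falsey result, whereas the list comprehension always runs to completion. A minimal, Numba-free sketch (the logging helper is invented for illustration):

```python
def make_check(log):
    # Record which elements actually get inspected.
    def check(x):
        log.append(x)
        return isinstance(x, int)
    return check

items = [1, "two", 3]

# List form: the comprehension is fully evaluated before all() sees anything,
# so every element is checked.
log_list = []
all([make_check(log_list)(x) for x in items])

# Generator form: all() stops at the first falsey result, so the final
# element is never inspected.
log_gen = []
all(make_check(log_gen)(x) for x in items)

print(log_list)  # [1, 'two', 3]
print(log_gen)   # [1, 'two']
```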
Will apply this change in the follow-up PR :)
    def impl(array, i):
        return ldca_intrinsic(array, i)
    return impl
Can't this just be:
Suggested change:

    - def impl(array, i):
    -     return ldca_intrinsic(array, i)
    - return impl
    + return lcda_intrinsic
for each of these?
An overload should return a Python function that gets jit-compiled to become the implementation of the function it's overloading. `ldca_intrinsic` is an intrinsic, called from within the compiled function.
I suspect returning an intrinsic as an overload implementation will not work, but even if it did, it would feel jarring to me to collapse a level of abstraction in the implementation here.
It does seem really weird to my eye that a function `foo(*args)` whose only line is a call to `bar(*args)` cannot simply be replaced with `bar` itself.
This feels like a Numba-ism that perhaps violates basic substitution rules, but this is of course not a blocking comment.
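For comparison, in ordinary interpreted Python the substitution does hold; the names below are invented for illustration:

```python
def bar(x, y):
    return x + y

# A wrapper whose only job is forwarding its arguments...
def foo(x, y):
    return bar(x, y)

# ...can be replaced by a direct reference to the wrapped function.
also_foo = bar

assert foo(1, 2) == also_foo(1, 2) == 3
```

The catch with `@overload` is that the returned function is never called directly at run time: it is handed to the compiler as a template, and an intrinsic is a different kind of object entirely.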
I think the confusion here comes from reading the code as if it's going to be interpreted by the Python interpreter, rather than seeing it as a form of metaprogramming, which is what functions decorated with @overload implement.
An attempt to summarise the pertinent points:
- An `@overload` function returns a Python function that Numba-CUDA compiles.
- An `@intrinsic` function (like `ldca_intrinsic`, generated by `ld_cache_operator()` above) is a function that returns a tuple of `(signature, codegen)`, where: `signature` is the typing signature that Numba-CUDA uses during type inference to determine and validate the function's argument and return types, and `codegen` is a function that Numba-CUDA calls to generate the LLVM IR for the implementation.
- During the compilation process (when `impl()` is being compiled), the typing and lowering for intrinsics are resolved, and the implementation of the intrinsic generated by the `codegen()` function is inserted into the generated code.
- In the compilation process for `impl()`, type inference and code generation implement `ldca_intrinsic()` as a function that accepts an array and an index as arguments and returns a scalar value. The typing is defined by the signature on line 96, and the code generation function follows below it.
- Therefore, if you replace `return impl` with `return ldca_intrinsic` in `ol_ldca()`, you have replaced a function that accepts an array and an index and returns a scalar (`impl(array, i) -> array.dtype`) with one that accepts a typing context and a number of LLVM IR values and returns a signature and code generation function (`impl(typingctx, array, index) -> (Signature, Function)`).
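The shape mismatch in the last point can be modelled without Numba at all. The following sketch is purely illustrative: the `_model` names and the interpreter-style `codegen` are invented, and real Numba emits LLVM IR and uses real type and signature objects.

```python
# Stand-in for an @intrinsic: calling it yields a (signature, codegen) pair,
# not a computed value.
def ldca_intrinsic_model(typingctx, array_ty, index_ty):
    signature = (array_ty, index_ty)  # placeholder for a real Signature

    def codegen(context, args):
        # Real Numba would emit LLVM IR here; we just "interpret" instead.
        array, i = args
        return array[i]

    return signature, codegen

# Stand-in for an @overload: it returns a plain function of (array, i) that
# the compiler would then compile, resolving the intrinsic call inside it.
def ol_ldca_model(array_ty, index_ty):
    def impl(array, i):
        _sig, codegen = ldca_intrinsic_model(None, array_ty, index_ty)
        return codegen(None, (array, i))
    return impl

impl = ol_ldca_model("float32[:]", "int64")
assert impl([10, 20, 30], 1) == 20

# Returning the intrinsic itself from the "overload" hands back a function of
# the wrong shape: calling it gives a (signature, codegen) pair, not a scalar.
wrong = ldca_intrinsic_model(None, "float32[:]", "int64")
assert isinstance(wrong, tuple) and callable(wrong[1])
```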
I hope this clarifies things a bit, but for a more complete understanding I can't see a shortcut that avoids working through the low- and high-level extension API documentation for Numba:
That said, I do have a couple of worked examples that show the whole flow in one notebook for each of these APIs for the CUDA target, which may also help:
They may be a little out-of-date and need a couple of bits updating, but the general flow of them is still relevant.
We're using the High-level API here (that @overload and @intrinsic are part of), but it's probably hard to understand the High-level API without first understanding the Low-level API. The High-level API is intended to make it quicker and easier to write Numba extensions, but in my view the main thing it provides is some shorthand for a lot of Low-level API work.
Finally - what happens if you implement the suggested change (modulo the typo in the name of the function above)? You will get:
    AssertionError: Implementation function returned by `@overload` has an unexpected type. Got <intrinsic impl>
cc @kaeun97 as the explanation might be helpful in understanding the PR as a whole.
Closing this as #587 supersedes it.
To-do:
`ld_cache_operator` and `st_cache_operator`.