Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test 2d3f647
```diff
-@cache_with_key(make_cache_key)
+@cache_with_key(_make_cache_key)
 def _make_merge_sort_cached(
```
Note to reviewers: this approach does introduce a layer of indirection here in the caching.
I do have some ideas for unifying caching across all algorithms which should simplify this, but I'll do that in a subsequent PR.
NaderAlAwar left a comment
Looks good, left a few comments. We should also be careful not to introduce too much overhead to the single-phase API.
python/cuda_cccl/cuda/compute/op.py
Outdated
```python
        self._kind = kind

    def get_cache_key(self) -> Hashable:
        return (self.__class__.__name__, self._kind.name, self._kind.value)
```
Question: prior to this change, we only returned `(op.name, op.value)`. Why do we need to include `self.__class__.__name__`?
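For what it's worth, one reason to include the class name is to keep keys from different wrapper types from colliding when their remaining components happen to match. A minimal standalone sketch (the `PlusOp`/`MulOp` names are hypothetical, not the actual `cuda.compute` classes):

```python
from typing import Hashable

# Hypothetical sketch: two different op wrapper types whose kinds happen
# to share the same (name, value) pair.
class PlusOp:
    def __init__(self, name: str, value: int) -> None:
        self._name, self._value = name, value

    def get_cache_key(self) -> Hashable:
        # Without the class name, this key could collide with MulOp's key.
        return (self.__class__.__name__, self._name, self._value)

class MulOp(PlusOp):
    pass

cache: dict[Hashable, str] = {}
cache[PlusOp("PLUS", 0).get_cache_key()] = "plus kernel"
cache[MulOp("PLUS", 0).get_cache_key()] = "mul kernel"

# With __class__.__name__ in the key, the two entries stay distinct.
print(len(cache))  # 2
```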
python/cuda_cccl/cuda/compute/op.py
Outdated
```python
        self._cachable = CachableFunction(func)

    def get_cache_key(self) -> Hashable:
        return (self.__class__.__name__, self._cachable)
```
Same question as above: why do we need `self.__class__.__name__`?
```python
        self.d_in_items_cccl = cccl.to_cccl_input_iter(d_in_items)
        self.d_out_keys_cccl = cccl.to_cccl_output_iter(d_out_keys)
        self.d_out_items_cccl = cccl.to_cccl_output_iter(d_out_items)
        self.op_adapter = op
```
Question: why do we store `op_adapter` as a member variable? This also applies to other algorithms.
When we do introduce stateful operators, the `op_adapter` is what will hold the state arrays. That being said, let me remove this change from this PR and introduce it (or something else) in the subsequent PR.
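To illustrate the point about state ownership, here is a rough, self-contained sketch (hypothetical names, plain Python lists standing in for device state arrays; not the actual `cuda.compute` design):

```python
# Hypothetical sketch of a stateful operator adapter; names and structure
# are illustrative only.
class OpAdapter:
    def __init__(self, func, state=None):
        self._func = func
        # State "arrays" (here just a dict of Python lists) travel with
        # the adapter rather than with the algorithm object.
        self.state = state or {}

    def __call__(self, *args):
        return self._func(self.state, *args)

# A counting predicate: the adapter, not the algorithm, owns the counter.
def count_and_pass(state, x):
    state["count"][0] += 1
    return x > 0

op = OpAdapter(count_and_pass, state={"count": [0]})
selected = [x for x in [-1, 2, 3] if op(x)]
```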
```python
        self.d_in_cccl = cccl.to_cccl_input_iter(d_in)
        self.d_out_cccl = cccl.to_cccl_output_iter(d_out)
        self.h_init_cccl = cccl.to_cccl_value(h_init)
        self.op = op
```
Important: this is named `op_adapter` in `merge_sort`. We should use consistent names.
I'll use `op` everywhere.
```diff
     d_out: DeviceArrayLike | IteratorBase,
     d_num_selected_out: DeviceArrayLike,
-    cond: Callable,
+    cond: Callable | OpAdapter,  # Raw callable or Operator
```
Question: it is not clear to me why this is annotated differently from the other algorithms. Also, the comment seems unnecessary.
/ok to test 557b6a0
🥳 CI Workflow Results: 🟩 Finished in 1h 28m. Pass: 100%/48 | Total: 11h 31m | Max: 44m 25s
Description
This PR is a refactor in preparation for supporting stateful ops in `cuda.compute`.

The problem(s)

Duplicated logic for handling `OpKind` vs. callables

Currently, user-defined operations (e.g., predicates, transformations) can be provided either as "built-in" ops (`OpKind`) or custom functions (callables). These differ in the way they are (1) cached and (2) "compiled" into LTOIR. To handle these differences, we currently have a bunch of `if-else` statements scattered across the algorithms.
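To make the duplication concrete, the branching looks roughly like the following sketch (the `OpKind`, `compile_builtin`, and `compile_callable` names here are stand-ins, not the real API):

```python
# Illustrative only: the kind of per-algorithm branching this PR removes.
from enum import Enum

class OpKind(Enum):
    PLUS = 0

def compile_builtin(op):
    return f"ltoir:builtin:{op.name}"

def compile_callable(fn):
    return f"ltoir:callable:{fn.__name__}"

def make_cache_key(op):
    # Branch 1: duplicated in every algorithm for caching...
    if isinstance(op, OpKind):
        return (op.name, op.value)
    return (op.__module__, op.__qualname__)

def compile_op(op):
    # Branch 2: ...and duplicated again for compilation.
    if isinstance(op, OpKind):
        return compile_builtin(op)
    return compile_callable(op)
```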
Determining signatures from annotations

If provided, type annotations offer a faster way to determine the return type of a user-defined callable than numba type inference. We take advantage of this in, e.g., `transform`, but it would be nice to do this for all ops. Ideally, we don't want to repeat the logic everywhere.

Solution
This PR solves the above by introducing an (internal) `OpAdapter` type that encapsulates the logic for caching, signature determination, and compiling. Furthermore, it will make adding support for stateful ops much easier.

Checklist
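As a rough illustration of the direction described above, an `OpAdapter`-style wrapper could unify cache keys and the annotation fast path behind one interface. All names and details here are hypothetical, not the actual `cuda.compute` implementation:

```python
# Hypothetical sketch of an OpAdapter-style wrapper; illustrative only.
import inspect
from enum import Enum
from typing import Callable, Hashable, Union

class OpKind(Enum):
    PLUS = 0

class OpAdapter:
    """Wraps either a built-in OpKind or a callable behind one interface."""

    def __init__(self, op: Union[OpKind, Callable]):
        self._op = op

    def get_cache_key(self) -> Hashable:
        # One place for the caching branch, instead of per-algorithm if-else.
        if isinstance(self._op, OpKind):
            return (self.__class__.__name__, self._op.name, self._op.value)
        return (self.__class__.__name__, self._op.__module__, self._op.__qualname__)

    def return_annotation(self):
        # Fast path for signature determination when annotations exist.
        if callable(self._op):
            ann = inspect.signature(self._op).return_annotation
            if ann is not inspect.Signature.empty:
                return ann
        return None  # fall back to (numba) type inference elsewhere

def doubled(x: int) -> int:
    return 2 * x
```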