Feat 568 triple chevron #592
base: main
Thank you for your contribution! There are a few minor comments below. Apart from those, not all of the requests in #568 are addressed. First, the Thrust implementation is still present instead of being replaced by an alias to the CUB one. Second, there is no test of triple_chevron.
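The alias the reviewer asks for can be sketched as a host-only model. It assumes the Thrust type lives in thrust::cuda_cub::launcher (the namespace used by thrust/system/cuda/detail/core/triple_chevron_launch.h), and it replaces real kernel launches with plain function calls. It only illustrates the idea: one implementation in cub::detail, with the Thrust name reduced to a type alias.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Host-only model: no CUDA runtime is involved. "Kernels" are plain
// functions, and doit() calls them once per simulated thread.
namespace cub {
namespace detail {
struct triple_chevron {
  unsigned grid;
  unsigned block;
  std::size_t shared_mem;

  triple_chevron(unsigned grid_, unsigned block_, std::size_t shared_mem_ = 0)
      : grid(grid_), block(block_), shared_mem(shared_mem_) {}

  // Stand-in for k<<<grid, block, shared_mem>>>(args...); returns 0 in place
  // of cudaSuccess.
  template <class Kernel, class... Args>
  int doit(Kernel k, const Args&... args) const {
    assert(grid > 0 && block > 0);
    for (unsigned i = 0; i < grid * block; ++i) {
      k(args...);
    }
    return 0;
  }
};
}  // namespace detail
}  // namespace cub

// Assumed Thrust location of the old copy. Instead of a second
// implementation, Thrust only re-exports the CUB type.
namespace thrust {
namespace cuda_cub {
namespace launcher {
using triple_chevron = cub::detail::triple_chevron;
}  // namespace launcher
}  // namespace cuda_cub
}  // namespace thrust

// Both names must refer to one and the same type.
static_assert(std::is_same<thrust::cuda_cub::launcher::triple_chevron,
                           cub::detail::triple_chevron>::value,
              "the Thrust name must refer to the CUB type");

// Host stand-in for the add_kernel used in the PR's test.
void add_kernel(int a, float b, double* out) { *out = a + b; }
```

With the alias in place, thrust::cuda_cub::launcher::triple_chevron(1, 1).doit(add_kernel, 5, 3.5f, &result) goes through the single CUB implementation and leaves 8.5 in result.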
cub/cub/device/device_copy.cuh
Outdated
@@ -42,8 +42,7 @@ _CCCL_IMPLICIT_SYSTEM_HEADER

#include <cub/device/dispatch/dispatch_batch_memcpy.cuh>

#include <thrust/system/cuda/detail/core/triple_chevron_launch.h>

#include <cub/detail/triple_chevron_launch.cuh>
suggestion: I don't think this header needs this include. Please remove it.
Co-authored-by: Georgy Evtushenko <[email protected]>
… feat-568-triple-chevron
cudaMemcpy(&result, d_result, sizeof(double), cudaMemcpyDeviceToHost);

REQUIRE(result == 8.5f);
}
suggestion: please add a newline at the end of the file.
*out = a + b;
}

CUB_TEST("CDP wrapper works with custom invocables and cdp_launch, on both host and device", "[test][utils]")
suggestion: I don't think this test is related to testing utilities. I'd write something along the following lines:
- CUB_TEST("CDP wrapper works with custom invocables and cdp_launch, on both host and device", "[test][utils]")
+ CUB_TEST("CDP wrapper works with custom invocables and cdp_launch, on both host and device", "[device][triple_chevron]")
} | ||
|
||
|
||
struct cdp_chevron_invoker { |
suggestion: I don't think this struct is required. You should be able to put everything from the invoke function inside cdp_invocable::operator().
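The shape of this refactor can be sketched as a host-only model, not CUB code: mock_triple_chevron, launch_via_helper, and the operator() signature are hypothetical, the kernel runs as a plain host function, and the device-side path that the real CDP helper also exercises is not modeled. It shows only that the launch lives directly in cdp_invocable::operator() and that the test hands the invocable to a helper, with no separate cdp_chevron_invoker.

```cpp
#include <cassert>

// Hypothetical host-only stand-in for cub::detail::triple_chevron: runs the
// "kernel" on the host once per simulated thread; 0 stands in for cudaSuccess.
struct mock_triple_chevron {
  unsigned grid;
  unsigned block;

  mock_triple_chevron(unsigned grid_, unsigned block_) : grid(grid_), block(block_) {}

  template <class Kernel, class... Args>
  int doit(Kernel k, const Args&... args) const {
    assert(grid > 0 && block > 0);
    for (unsigned i = 0; i < grid * block; ++i) {
      k(args...);
    }
    return 0;
  }
};

// Host stand-in for the test's add_kernel.
void add_kernel(int a, float b, double* out) { *out = a + b; }

// The launch sits directly in operator(); there is no separate invoker
// struct with an invoke() function. The signature here is assumed.
struct cdp_invocable {
  int operator()(int a, float b, double* out) const {
    return mock_triple_chevron(1, 1).doit(add_kernel, a, b, out);
  }
};

// Hypothetical stand-in for the CDP helper (cdp_launch in the real tests);
// here it simply forwards to the invocable on the host.
template <class Invocable, class... Args>
int launch_via_helper(const Invocable& f, Args... args) {
  return f(args...);
}
```

With this shape, launch_via_helper(cdp_invocable{}, 5, 3.5f, &result) leaves 8.5 in result, the value the test's REQUIRE checks.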
cudaMalloc(&d_result, sizeof(double));
auto chev = cub::detail::triple_chevron(1, 1);
chev.doit(add_kernel, 5, 3.5f, d_result);
suggestion: we have different logic for host and device launches. I'd suggest using the CDP helper here as well.
Co-authored-by: Georgii Evtushenko <[email protected]>
This ports triple_chevron_launch from Thrust to CUB. As things stand, it makes more sense to decouple the two. However, this PR does not touch the Thrust repo. I can do that in this PR or in a follow-up PR if desired; I thought it would be better to keep this PR small.

Currently all tests pass on my 3080 (sm86) except for one: test_allocator.cu, which tests util_allocator.cuh. This test also fails on main, so it is likely unrelated to my changes. If it matters, only the assertion on line 130 of test_allocator.cu fails; with that one commented out, all tests pass.

Please pay attention to the differences in macros in triple_chevron_launch.cuh: I tried to follow what seemed idiomatic for CUB.