-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL][Reduction] Optimize "FastReduce" implementation for the discrete GPU case #6425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| auto getNoInitWriteAcc(size_t Size, handler &CGH) { | ||
| MOutBufPtr = std::make_shared<buffer<T, buffer_dim>>(range<1>(Size)); | ||
| CGH.addReduction(MOutBufPtr); | ||
| return MOutBufPtr->template get_access<access::mode::discard_write>(CGH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"discard_write" is deprecated in SYCL 2020, please use "write"
v-klochkov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, but
-
requires answer to question about -fno-unnamed-lambda.
-
The dynamic check of UseExtraKernelForInit and 2 calls of 'Rest' causes the code-duplication (2 versions of device code). Is that right? Does it worth the benefit coming from a bit faster initialization of Groups-Counter?
-
The code and PR description need more comments explaining why one way is better than the other.
Otherwise, sometime later someone may want to unify the code deleting the device code-duplication
|
|
||
| bool UseExtraKernelForInit = | ||
| NWorkGroups > 1 && !sycl::detail::getDeviceFromHandler(CGH) | ||
| .get_info<info::device::host_unified_memory>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"info::device::host_unified_memory" is deprecated in SYCL 2020.
| std::shared_ptr<detail::queue_impl> QueueCopy = CGH.MQueue; | ||
| auto Event = CGH.withAuxHandler(QueueCopy, [&](handler &InitHandler) { | ||
| auto Acc = Buf->get_access<access::mode::discard_write>(InitHandler); | ||
| InitHandler.single_task([=]() { Acc[0] = 0; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would work because unnamed-lambda functions mode is ON by default now (because the default value of it depends on sycl-std, which is set to 2020 by default now).
It will not work with -fno-sycl-unnamed-lambda, IMO.
| return {*CounterBuf, CGH}; | ||
| } | ||
|
|
||
| auto getGroupsCounterAccDiscrete(handler &CGH) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment section describing the func, params and return.
|
Alternative implementation: aelovikov-intel#1 |
No description provided.