-
Notifications
You must be signed in to change notification settings - Fork 810
[SYCL] Implement sycl_ext_oneapi_group_load_store #9074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@aelizaro , for some reason I can't add you to the reviewers, so tagging in a comment. |
b191350 to
4bb8910
Compare
4bb8910 to
004b019
Compare
sycl/include/sycl/detail/helpers.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsider naming after #9108 lands.
The extension itself is being implemented in intel#7593
a0bec8f to
3a06c40
Compare
|
|
||
| auto generic = [&]() { | ||
| group_barrier(g); | ||
| detail::loop_unroll_up_to<ElementsPerWorkItem, 16>([&](size_t i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 16 just a magic optimization number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of, although not totally arbitrary. SPIR-V/HW block-read/write functions support uchar/uchar2/.../uchar16 and int/int2/int4/int8 for int and other non-char types.
|
|
||
| #ifdef __SYCL_DEVICE_ONLY__ | ||
| template <typename Properties> | ||
| constexpr bool is_blocked(Properties properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are not public interfaces, I would prefer we stick to the LLVM style guide when naming them.
That also goes for some of the other related implementations in here. There are some places where the naming schemes are inconsistent.
|
|
||
| // See std::enable_if_t above restricting this implementation. | ||
| using GroupTy = GroupHelper; | ||
| auto g = gh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the copy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I'm answering below, GroupHelper follows the spec naming. In this particular overload/specialization I know that it is in fact a real group/sub_group and not a helper with a scratchpad, so I'm clarifying this here so that the code below could be less ambiguous.
| typename std::iterator_traits<InputIteratorT>::value_type>; | ||
|
|
||
| // See std::enable_if_t above restricting this implementation. | ||
| using GroupTy = GroupHelper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we can't just name it Group (or GroupTy or GroupTy) in the template. I would argue GroupHelper is even more confusing as it could also suggest that the type is some intermediate helper type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm simply following the spec here. I think it makes sense to do that.
| size /= 2; | ||
| return size; | ||
| }(); | ||
| using HWBlockTy = detail::cl_unsigned<hw_block_size>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use OpenCL types here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that's what SPIR-V intrinsics need.
| auto impl_sg = [&](sub_group sg, auto *in_ptr) { | ||
| value_type v[ElementsPerWorkItem]; | ||
|
|
||
| auto priv_ptr = reinterpret_cast<char *>(&v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters for this case, but to my knowledge the & isn't needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's still fine, from https://en.cppreference.com/w/cpp/language/array:
int a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
int (*p)[3] = &a; // okay: address of a can be taken
int a[2]; // array of 2 int
int* p1 = a; // a decays to a pointer to the first element of a
And I think "address of the array" is what we really need here, hence extra "&".
I won't insist on keeping it though, if you have some motivation other than it simply can be dropped.
| // once it is implemented instead of this free function. | ||
| auto ndi = | ||
| sycl::ext::oneapi::experimental::this_nd_item<GroupTy::dimensions>(); | ||
| auto sg = ndi.get_sub_group(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a this_sub_group free function. Could that be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I didn't know about that one, thanks!
|
|
||
| constexpr bool assume_full_sg = | ||
| properties.template has_property<property::full_sg_key>(); | ||
| // We'd need too much private memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this comment is trying to tell me something, but I'm not sure I understand what.
|
|
||
| size_t block_idx = idx % hw_blocks_per_block; | ||
| size_t wi = (idx / hw_blocks_per_block) % sg_size; | ||
| size_t BlockIdx = i / hw_blocks_per_block; // uniform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; I'm not a big fan of block_idx and BlockIdx being different things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some incomplete renaming from the early development version, I'll look into it, thanks.
| template <typename Group, typename InputIteratorT, typename OutputT, int N, | ||
| typename Properties = decltype(properties())> | ||
| std::enable_if_t< | ||
| std::is_convertible_v<remove_decoration_t<typename std::iterator_traits< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove_decoration_t remove cv-qualifiers? I noticed compilation issues when sycl::buffer with read_only access is used as an input
| InputIteratorT>::value_type>, | ||
| OutputT> && | ||
| is_group_helper_v<GroupHelper>> | ||
| group_load(GroupHelper gh, InputIteratorT in_ptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should properties be default here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging @Pennycook and @aelizaro .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what does this mean? Is this to identify cases with remainder sub-groups, where the behavior of loading multiple elements might be surprising?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly that.
|
|
||
| auto generic = [&]() { | ||
| group_barrier(g); | ||
| detail::loop_unroll_up_to<ElementsPerWorkItem, 16>([&](size_t i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of, although not totally arbitrary. SPIR-V/HW block-read/write functions support uchar/uchar2/.../uchar16 and int/int2/int4/int8 for int and other non-char types.
| typename std::iterator_traits<InputIteratorT>::value_type>; | ||
|
|
||
| // See std::enable_if_t above restricting this implementation. | ||
| using GroupTy = GroupHelper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm simply following the spec here. I think it makes sense to do that.
|
|
||
| // See std::enable_if_t above restricting this implementation. | ||
| using GroupTy = GroupHelper; | ||
| auto g = gh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I'm answering below, GroupHelper follows the spec naming. In this particular overload/specialization I know that it is in fact a real group/sub_group and not a helper with a scratchpad, so I'm clarifying this here so that the code below could be less ambiguous.
| size /= 2; | ||
| return size; | ||
| }(); | ||
| using HWBlockTy = detail::cl_unsigned<hw_block_size>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that's what SPIR-V intrinsics need.
| auto impl_sg = [&](sub_group sg, auto *in_ptr) { | ||
| value_type v[ElementsPerWorkItem]; | ||
|
|
||
| auto priv_ptr = reinterpret_cast<char *>(&v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's still fine, from https://en.cppreference.com/w/cpp/language/array:
int a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
int (*p)[3] = &a; // okay: address of a can be taken
int a[2]; // array of 2 int
int* p1 = a; // a decays to a pointer to the first element of a
And I think "address of the array" is what we really need here, hence extra "&".
I won't insist on keeping it though, if you have some motivation other than it simply can be dropped.
| // once it is implemented instead of this free function. | ||
| auto ndi = | ||
| sycl::ext::oneapi::experimental::this_nd_item<GroupTy::dimensions>(); | ||
| auto sg = ndi.get_sub_group(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I didn't know about that one, thanks!
|
|
||
| size_t block_idx = idx % hw_blocks_per_block; | ||
| size_t wi = (idx / hw_blocks_per_block) % sg_size; | ||
| size_t BlockIdx = i / hw_blocks_per_block; // uniform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some incomplete renaming from the early development version, I'll look into it, thanks.
The extension itself is being implemented in
#7593