Skip to content

Conversation

@aelovikov-intel
Copy link
Contributor

The extension itself is being implemented in
#7593

@aelovikov-intel aelovikov-intel requested a review from a team as a code owner April 14, 2023 17:15
@aelovikov-intel
Copy link
Contributor Author

@aelizaro , for some reason I can't add you to the reviewers, so tagging in a comment.

@aelovikov-intel aelovikov-intel temporarily deployed to aws April 14, 2023 17:39 — with GitHub Actions Inactive
@aelovikov-intel aelovikov-intel temporarily deployed to aws April 14, 2023 18:55 — with GitHub Actions Inactive
@aelovikov-intel aelovikov-intel temporarily deployed to aws April 14, 2023 20:23 — with GitHub Actions Inactive
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconsider naming after #9108 lands.

@aelovikov-intel aelovikov-intel temporarily deployed to aws April 18, 2023 19:14 — with GitHub Actions Inactive
@aelovikov-intel aelovikov-intel temporarily deployed to aws April 18, 2023 21:55 — with GitHub Actions Inactive
The extension itself is being implemented in
intel#7593
@aelovikov-intel aelovikov-intel temporarily deployed to aws April 21, 2023 00:03 — with GitHub Actions Inactive
@aelovikov-intel aelovikov-intel temporarily deployed to aws April 21, 2023 03:20 — with GitHub Actions Inactive

auto generic = [&]() {
group_barrier(g);
detail::loop_unroll_up_to<ElementsPerWorkItem, 16>([&](size_t i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 16 just a magic optimization number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, although not totally arbitrary. SPIR-V/HW block-read/write functions support uchar/uchar2/.../uchar16 and int/int2/int4/int8 for int and other non-char types.


#ifdef __SYCL_DEVICE_ONLY__
template <typename Properties>
constexpr bool is_blocked(Properties properties) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are not public interfaces, I would prefer we stick to the LLVM style guide when naming them.

That also goes for some of the other related implementations in here. There are some places where the naming schemes are inconsistent.


// See std::enable_if_t above restricting this implementation.
using GroupTy = GroupHelper;
auto g = gh;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the copy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm answering below, GroupHelper follows the spec naming. In this particular overload/specialization I know that it is in fact a real group/sub_group and not a helper with a scratchpad, so I'm clarifying this here so that the code below could be less ambiguous.

typename std::iterator_traits<InputIteratorT>::value_type>;

// See std::enable_if_t above restricting this implementation.
using GroupTy = GroupHelper;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we can't just name it Group (or GroupTy or GroupTy) in the template. I would argue GroupHelper is even more confusing as it could also suggest that the type is some intermediate helper type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm simply following the spec here. I think it makes sense to do that.

size /= 2;
return size;
}();
using HWBlockTy = detail::cl_unsigned<hw_block_size>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use OpenCL types here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's what SPIR-V intrinsics need.

auto impl_sg = [&](sub_group sg, auto *in_ptr) {
value_type v[ElementsPerWorkItem];

auto priv_ptr = reinterpret_cast<char *>(&v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters for this case, but to my knowledge the & isn't needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's still fine, from https://en.cppreference.com/w/cpp/language/array:

int a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
int (*p)[3] = &a; // okay: address of a can be taken
int a[2];            // array of 2 int
int* p1 = a;         // a decays to a pointer to the first element of a

And I think "address of the array" is what we really need here, hence extra "&".

I won't insist on keeping it though, if you have some motivation other than it simply can be dropped.

// once it is implemented instead of this free function.
auto ndi =
sycl::ext::oneapi::experimental::this_nd_item<GroupTy::dimensions>();
auto sg = ndi.get_sub_group();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a this_sub_group free function. Could that be used here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I didn't know about that one, thanks!


constexpr bool assume_full_sg =
properties.template has_property<property::full_sg_key>();
// We'd need too much private memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this comment is trying to tell me something, but I'm not sure I understand what.


size_t block_idx = idx % hw_blocks_per_block;
size_t wi = (idx / hw_blocks_per_block) % sg_size;
size_t BlockIdx = i / hw_blocks_per_block; // uniform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; I'm not a big fan of block_idx and BlockIdx being different things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some incomplete renaming from the early development version, I'll look into it, thanks.

template <typename Group, typename InputIteratorT, typename OutputT, int N,
typename Properties = decltype(properties())>
std::enable_if_t<
std::is_convertible_v<remove_decoration_t<typename std::iterator_traits<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove_decoration_t remove cv-qualifiers? I noticed compilation issues when sycl::buffer with read_only access is used as an input

InputIteratorT>::value_type>,
OutputT> &&
is_group_helper_v<GroupHelper>>
group_load(GroupHelper gh, InputIteratorT in_ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should properties be default here as well?

Comment on lines +33 to +38
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @Pennycook and @aelizaro .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what does this mean? Is this to identify cases with remainder sub-groups, where the behavior of loading multiple elements might be surprising?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly that.


auto generic = [&]() {
group_barrier(g);
detail::loop_unroll_up_to<ElementsPerWorkItem, 16>([&](size_t i) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, although not totally arbitrary. SPIR-V/HW block-read/write functions support uchar/uchar2/.../uchar16 and int/int2/int4/int8 for int and other non-char types.

typename std::iterator_traits<InputIteratorT>::value_type>;

// See std::enable_if_t above restricting this implementation.
using GroupTy = GroupHelper;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm simply following the spec here. I think it makes sense to do that.


// See std::enable_if_t above restricting this implementation.
using GroupTy = GroupHelper;
auto g = gh;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm answering below, GroupHelper follows the spec naming. In this particular overload/specialization I know that it is in fact a real group/sub_group and not a helper with a scratchpad, so I'm clarifying this here so that the code below could be less ambiguous.

size /= 2;
return size;
}();
using HWBlockTy = detail::cl_unsigned<hw_block_size>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's what SPIR-V intrinsics need.

auto impl_sg = [&](sub_group sg, auto *in_ptr) {
value_type v[ElementsPerWorkItem];

auto priv_ptr = reinterpret_cast<char *>(&v);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's still fine, from https://en.cppreference.com/w/cpp/language/array:

int a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
int (*p)[3] = &a; // okay: address of a can be taken
int a[2];            // array of 2 int
int* p1 = a;         // a decays to a pointer to the first element of a

And I think "address of the array" is what we really need here, hence extra "&".

I won't insist on keeping it though, if you have some motivation other than it simply can be dropped.

// once it is implemented instead of this free function.
auto ndi =
sycl::ext::oneapi::experimental::this_nd_item<GroupTy::dimensions>();
auto sg = ndi.get_sub_group();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I didn't know about that one, thanks!


size_t block_idx = idx % hw_blocks_per_block;
size_t wi = (idx / hw_blocks_per_block) % sg_size;
size_t BlockIdx = i / hw_blocks_per_block; // uniform
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some incomplete renaming from the early development version, I'll look into it, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants