[SYCL][Doc] Simplify non-uniform group design #14604
This commit simplifies the design of the non-uniform group extension as follows:
- The _group suffix is removed from user-constructed groups to reduce verbosity.
- get_*_group() functions are replaced with partitioning functions, to clarify that users are not getting an existing group but rather creating a new one.
- Fixed-size groups are renamed to "chunks". The previous name did not convey that the group is contiguous as well as having a fixed size.
- Ballot-groups and opportunistic-groups have been combined into a single group type called a "fragment", focusing on what the groups represent rather than how they were constructed.
- Synchronization behavior is added to all partitioning functions, to reduce the cognitive burden (of remembering which partitioning functions synchronize).
- An implicit cast from chunks to fragments is introduced. A fragment is a more general representation than a chunk, and so this conversion is always valid.

Signed-off-by: John Pennycook <[email protected]>
The implementation requirements for tangle are different from those of other non-uniform groups, which may limit its availability to certain device types. Splitting it out into a separate extension will make it easier to set user expectations and to track implementation status across backends.

Signed-off-by: John Pennycook <[email protected]>
AlexeySachkov
left a comment
Haven't looked at the tangle doc yet
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
template <typename ParentGroup>
class tangle_group {
template <std::size_t ChunkSize, typename ParentGroup>
We should probably update the linear_id_type definition here; see the discussion of the recorded inconsistency in #13702. The linear ID type is smaller than the type used for the chunk size, meaning that in certain cases we won't be able to enumerate all work-items in a chunk.
I see a few possible fixes here and I'm not sure what is best:
1. `linear_id_type = std::size_t`
   Always matches, no problems.
2. `linear_id_type = ParentGroup::linear_id_type`
   Sets `uint32_t` for `sub_group` and `size_t` for `group`. The situation where somebody provides a `ChunkSize` larger than the `ParentGroup` would prevent enumeration of all work-items in the chunk, but that situation is already UB.
3. `linear_id_type = /* smallest unsigned integer with log2(ChunkSize) bits */`
   Could use `std::uint8_t` for `ChunkSize < 256`. Might lead to future ABI issues if we get something like a `std::uint4_t` in future (but that seems unlikely, given the number of bits in a byte).
I lean towards (2) here, because it makes a chunk consistent with its parent group. That is, if someone had code for sub-groups and then introduced a specialization for chunks, the types of all IDs they use would still match what the APIs return. This will also remain true when we extend chunks to support group.
Makes sense to me. I made the change in 0f2a471.
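For readers following the thread, the three candidate definitions can be sketched in plain C++. This is only an illustration: `mock_sub_group`, `mock_group`, `option1`, `option2`, and `smallest_id_t` are hypothetical names invented here, not part of SYCL or the extension, and the mock parent groups only mirror the ID types mentioned in the comments above (`uint32_t` for `sub_group`, `size_t` for `group`).

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the parent group types; only the member
// type discussed in the review is modelled.
struct mock_sub_group { using linear_id_type = std::uint32_t; };
struct mock_group     { using linear_id_type = std::size_t; };

// Option 1: always std::size_t, regardless of the parent group.
template <std::size_t ChunkSize, typename ParentGroup>
struct option1 { using linear_id_type = std::size_t; };

// Option 2: inherit the ID type from the parent group.
template <std::size_t ChunkSize, typename ParentGroup>
struct option2 {
  using linear_id_type = typename ParentGroup::linear_id_type;
};

// Option 3: smallest standard unsigned type that can hold the IDs
// 0 .. ChunkSize - 1. Note the bound used here is ChunkSize <= 256,
// because the largest ID is ChunkSize - 1.
template <std::size_t ChunkSize>
using smallest_id_t = std::conditional_t<
    (ChunkSize <= 0x100ull), std::uint8_t,
    std::conditional_t<
        (ChunkSize <= 0x10000ull), std::uint16_t,
        std::conditional_t<(ChunkSize <= 0x100000000ull), std::uint32_t,
                           std::uint64_t>>>;

static_assert(
    std::is_same<option2<8, mock_sub_group>::linear_id_type,
                 std::uint32_t>::value,
    "option 2 follows the parent sub-group");
static_assert(std::is_same<smallest_id_t<16>, std::uint8_t>::value,
              "option 3 picks the narrowest type");
```

With option 2, code written against a sub-group's ID type keeps compiling unchanged when a chunk is substituted, which is the consistency argument made in the reply above.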
AlexeySachkov
left a comment
tangle part review
AlexeySachkov
left a comment
All my feedback has been addressed, and I don't have any further comments. The issue with the linear_id_type of chunk is not a high-priority one that must be resolved immediately: we don't have a device that supports sub-group sizes exceeding the range of uint32_t, so the potential issue I've highlighted can't really be encountered right now.
@intel/dpcpp-specification-reviewers, any further comments before I merge this PR?
I treat lack of response as a no, so merging the PR
This PR implements the non-uniform groups API redesign proposed in [PR 14604](#14604):

1. Removed redundant `_group` suffix from class names
2. Used more descriptive function names that clearly indicate the partitioning behaviour
3. Unified similar functionality to reduce cognitive overhead

* fixed_size_group → chunk
* tangle_group → tangle
* ballot_group → fragment
* opportunistic_group → merged into fragment
* get_ballot_group() → binary_partition()
* get_fixed_size_group<N>() → chunked_partition<N>()
* get_tangle_group() → entangle()

---------

Signed-off-by: Larsen, Steffen <[email protected]>
Co-authored-by: Andrei Zibrov <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: Udit Kumar Agarwal <[email protected]>
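To make the two partitioning styles named above concrete, here is a small host-side model in plain C++. It is a sketch, not the SYCL API: `model_chunked_partition`, `model_binary_partition`, and the two result structs are hypothetical names, and the model assumes that work-items in a fragment are ordered by their sub-group local ID.

```cpp
#include <cstddef>
#include <vector>

// Position of one work-item after a (modelled) chunked partition.
struct chunk_position {
  std::size_t chunk_id;  // which contiguous chunk the work-item lands in
  std::size_t local_id;  // its index inside that chunk
};

// Models chunked_partition<ChunkSize>: contiguous, fixed-size groups.
template <std::size_t ChunkSize>
constexpr chunk_position model_chunked_partition(
    std::size_t sub_group_local_id) {
  static_assert(ChunkSize > 0, "ChunkSize must be positive");
  return chunk_position{sub_group_local_id / ChunkSize,
                        sub_group_local_id % ChunkSize};
}

// Position of one work-item after a (modelled) binary partition.
struct fragment_position {
  std::size_t local_id;  // rank among work-items with the same predicate
  std::size_t size;      // number of work-items in the same fragment
};

// Models binary_partition: work-items with equal predicate values end up
// in the same fragment, which need not be contiguous.
inline fragment_position model_binary_partition(
    const std::vector<bool>& predicates, std::size_t sub_group_local_id) {
  const bool mine = predicates[sub_group_local_id];
  fragment_position pos{0, 0};
  for (std::size_t i = 0; i < predicates.size(); ++i) {
    if (predicates[i] != mine) continue;
    if (i < sub_group_local_id) ++pos.local_id;  // members ahead of us
    ++pos.size;
  }
  return pos;
}
```

In this model, work-item 6 of a sub-group split by `model_chunked_partition<4>` lands in chunk 1 with local ID 2, while a binary partition over the predicates `{true, false, true, true, false}` places work-item 3 at rank 2 of a three-member fragment.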
This commit simplifies the design of the non-uniform group extension as follows:
- The _group suffix is removed from user-constructed groups to reduce verbosity.
- get_*_group() functions are replaced with partitioning functions, to clarify that users are not getting an existing group but rather creating a new one.
- Fixed-size groups are renamed to "chunks". The previous name did not convey that the group is contiguous as well as having a fixed size.
- Ballot-groups and opportunistic-groups have been combined into a single group type called a "fragment", focusing on what the groups represent rather than how they were constructed.
- Synchronization behavior is added to all partitioning functions, to reduce the cognitive burden (of remembering which partitioning functions synchronize).
- An implicit cast from chunks to fragments is introduced. A fragment is a more general representation than a chunk, and so this conversion is always valid.
- Split tangle into its own extension. The implementation requirements for tangle are different to other non-uniform groups, which may limit their availability to certain device types. Splitting them out into a separate extension will make it easier to set user expectations and to track implementation status across backends.
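
The chunk-to-fragment conversion described in the list above is always valid because a chunk is a special case of a fragment. One way to picture this is the following sketch, which assumes a fragment can be represented as a bitmask of sub-group members. `mock_fragment`, `mock_chunk`, and `count_members` are hypothetical types and helpers for illustration only, and the model is limited to sub-groups of at most 64 work-items.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: a fragment is an arbitrary subset of the sub-group,
// stored as one bit per work-item (bit i set => work-item i is a member).
struct mock_fragment {
  std::uint64_t member_mask;
};

// Hypothetical model: a chunk is a contiguous run of ChunkSize work-items.
template <std::size_t ChunkSize>
struct mock_chunk {
  static_assert(ChunkSize > 0 && ChunkSize <= 64,
                "model supports sub-groups of up to 64 work-items");

  std::size_t chunk_index;  // assumes (chunk_index + 1) * ChunkSize <= 64

  // Implicit conversion: every chunk can be expressed as a fragment whose
  // members happen to be contiguous. The reverse does not hold in general.
  constexpr operator mock_fragment() const {
    return mock_fragment{
        (ChunkSize == 64 ? ~std::uint64_t{0}
                         : ((std::uint64_t{1} << ChunkSize) - 1))
        << (chunk_index * ChunkSize)};
  }
};

// Accepts any fragment; chunks convert implicitly at the call site.
constexpr int count_members(mock_fragment f) {
  int n = 0;
  for (std::uint64_t m = f.member_mask; m != 0; m >>= 1) {
    n += static_cast<int>(m & 1u);
  }
  return n;
}
```

Code written against the more general fragment type, such as `count_members` here, can accept a chunk without an explicit cast, whereas no corresponding implicit conversion from fragment to chunk is offered because an arbitrary fragment is not necessarily contiguous.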