-
Notifications
You must be signed in to change notification settings - Fork 803
[SYCL] Added ext_oneapi_non_uniform_groups aspect
#10902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 3 commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
75fbe0f
Added asserts to non-supporting platforms
fe5ab18
Lint.
32961b5
Lint part 2.
d885fdd
Impl ext_oneapi_non_uniform_groups aspect.
06206f8
Lint.
a59229b
Adjust mention of root_groups in doc.
ee5e1c2
Fix comma.
a8e85aa
Added ext_oneapi_non_uniform_groups aspect to __TestAspectList.
9ea80ca
Place new aspect before "]"
fd2f62f
Marked hip unsupported in fixed_size_group.cpp
f75ab44
Added `Backend support status` section to doc.
79df2ab
Moved __uses_aspects__attribute before type.
558c8d2
Update sycl/doc/extensions/proposed/sycl_ext_oneapi_non_uniform_group…
JackAKirk b34be0e
Update sycl/doc/extensions/proposed/sycl_ext_oneapi_non_uniform_group…
JackAKirk 9c1f3a7
Format tangle_group attr.
9546f8d
Format.
be657c9
Moved aspect definition to appropriate place.
03ca377
Merge branch 'sycl' into non-uniform-hip-errors
JackAKirk 2be3514
Fix last merge commit.
07fda3d
Small aesthetic non functional change.
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a static assertion our best option here? I worry it may be a little to harsh on users that don't intend to run such kernels on other platforms.
@Pennycook - What are your thoughts here? My thoughts are that ideally if we want to restrict the platforms of these, we would want to have an aspect for it and mark related features as requiring it so we can report failure at runtime instead if the user tried to launch a kernel with these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @steffenlarsen. Furthermore, won't these asserts trigger any time the functions are compiled, regardless of whether they're actually used? I don't think we want to be in a position where we can't compile for a specific backend just because there's a usage of one of these non-uniform groups somewhere in a header.
I think something like an aspect, optional feature, or other device queries makes sense here. We didn't include anything like this in the initial proposal because it wasn't clear what should and shouldn't be allowed... For example, should a device be allowed to support only certain
fixed_size_groupsizes, or certain scopes? I think we need to get more implementation experience -- across more than just SPIR-V and NVPTX -- before we can properly answer those questions.I'm not opposed to adding some broad aspects/queries for whether the groups are supported at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we can add the aspect then tell programmers it is their responsibility to use the aspect as a check in code that is using non uniform groups?
RE implementations on other backends:
For HIP AMD I'm not sure that that any of these groups can be fully implemented on existing hardware. The issue is that amd cards don't have independent forward progress (as it is defined here https://developer.nvidia.com/blog/inside-volta/), and they don't support
__syncwarp(mask);HIP does support some subsets of these group features, (ballot,any, or taking a mask), such that I think it would be possible to implement basically all of
ballot_groupandfixed_size_group, minusbarrier. Although due to the lack of independent forward progress guarantees I guess could mean that even then certain code might not be portable in the sense it doesn't hang etc.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly.
Intel GPUs don't support "independent forward progress" either, and the extension doesn't require it. Working around the lack of
__syncwarp(mask)probably requires some assumptions and/or compiler smarts... since we only need to be able to synchronize the work-items in the active branch, it might be sufficient to run the equivalent of an unmasked__syncwarp, or to do nothing at all. For SPIR-V targets we currently just issue a memory barrier, which seems to work (see here).I'm still not comfortable with this and I think we need more test-cases to prove definitively whether things are working as expected. But that's why the extension is experimental. 😄
I agree, but this is always going to be true. I don't think the presence of non-uniform groups makes this any worse than it is already. Any code that makes assumptions about forward progress will be non-portable, and that's why we're defining the sycl_ext_oneapi_forward_progress extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know for sure how amd devices would behave - we'd just have to experiment and see what happens. I guess that we don't want to support e.g. ballot_group partially in a backend if it can be helped. There is also the point that amd could start supporting these features in the not too distant future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking me whether all the APIs in "sycl_ext_oneapi_non_uniform_groups" should be "optional kernel features" that are tied to some aspect? Obviously, the extension is easier to use if we can guarantee that it is available for all devices. Do we think that it can be implemented for all devices, and we just haven't completed the implementation on certain backends? Or, do we think these APIs can never be implemented on certain devices?
If it's the former, it would be better not to burden application developers by requiring them to check an aspect. Triggering a static-assert might be reasonable as a short-term solution if we can add the missing support soon.
If it's the later, then we should change the specification to add the aspect and document the APIs as optional device features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, and I was hoping we could get more implementation experience before having to answer this question. But my guess is that there will be certain devices that cannot implement all of the functionality in the specification as it's written today (or that there might be certain compilers and/or library-only implementations that choose not to do so).
How hard is it for us to add new optional device features? I ask because it's still not clear to me if we would want one aspect (i.e. for non-uniform groups) or multiple aspects (e.g. for each type of non-uniform group, or each combination of non-uniform group and scope). I suppose what I'm asking is: If we started with one aspect today, how angry would people be if we need to change that later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this extension is still experimental, we can make API breaking changes in the future. For example, the extension could require all devices to support the APIs now, and we can still add an aspect later if necessary. If we decide that DPC++ can support these APIs on all devices, the extension can expose the APIs as "required device features". When it comes time to adopt them into SYCL-Next, we can still adopt them as "optional features" if other vendors think they will be hard to support.
That said, it is not too hard to add support for an optional feature / aspect. There are two main parts to the implementation:
The header file needs to add
[[__sycl_detail__::__uses_aspects__(aspect::foo)]]to either the declaration of a function or to the definition of a type as described in the design.We need some backend-specific code that can query a device and decide whether the device supports the aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JackAKirk Given Greg's response above, would something like an
aspect::non_uniform_groupswork for you?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it sounds sensible. I can try adding
[[__sycl_detail__::__uses_aspects__(aspect::foo)]]to the header and add the aspect and then check the behavior.