-
Notifications
You must be signed in to change notification settings - Fork 16.8k
[IR] Remove pointer arguments from loop.dependence.{war|raw}.mask #188248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
MacDue
wants to merge
8
commits into
llvm:main
Choose a base branch
from
MacDue:remove_pointer_args
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
d4fd16d
[IR] Remove pointer arguments from loop.dependance.{war|raw}.mask
MacDue 35d63a7
Fixups
MacDue 8794e84
Fixups
MacDue 321eccc
Fixups
MacDue 2165376
Rewrite docs
MacDue 856ae46
Fix typo
MacDue 2eaf933
More doc tweaks
MacDue 6ae5485
Fixups
MacDue File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been a while since I looked into these intrinsics but it sounds like this would remove the reason for them existing. The instructions on AArch64 do not follow the basic math that you would expect from simple integers, otherwise we would recognise them from icmp+activelanemask directly. As far as I remember the reason the intrinsics exist is to properly handle the fact that they do not wrap as expected from i64 math.
(Also I think it should be possible to fold a
loop.mask(A,B)withnoalias A, Bto all-true, which could be useful in the current pass pipeline but maybe isn't something that anyone would ever implement).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It comes from #114028 (review)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify this in a way we don't need to be overly cautious about the extreme cases? I don't think they introduce a correctness issue (in that a lane is active where a dependence exists).
I think for there to be a correctness issue:
trueDist)trueDist.But as 1 to VL - 1 should fit within an i64, I don't think any correctness issue can occur.
Note: If this is not possible it also makes this intrinsic hard to expand without whilewr/whilerw (and I think the current expansions would be incorrect).
This comment was marked as outdated.
Sorry, something went wrong.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, maybe if we specify
%pointerDiffas implicitly frozen (to avoidfreeze(sub nsw) -> sub), and always emit the subs withnswwithin the loop vectorizer. We can fold towhilewr/rwand still give a safe result in case the difference does wrap. If we can't fold thesubinto the mask, we can always expand to awhilelo(but I don't think that should occur to masks emitted by the loop vectorizer). Any thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when it wraps and is poison, we want to always generate an all-true mask? It think if we implicitly freeze, we could end up with a pointerDiff value that makes a mask with some trailing lanes being false, which would still correct if I understand the above correctly.
Would the new variant keep the bit about
pointerDiffneeding to be a multiple of element size? If so, then the implicitly frozen value may not be a multiple and we would have poison mask. If that's an issue, maybe the intrinsic could return a all-true mask for poison pointerDiff?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was do to do something "safe" but not necessarily optimal for the extreme case.
I think I'm probably overthinking this though and it'd be simpler to do
@llvm.loop.dependence.war.mask.v4i1.i64(i64 %a, i64 %b, i64 immarg %elementSize), and more precisely specify the semantics. We'd still have the CSE issues, but not having to deal with pointer types is still a little nicer.I think the semantics should be (and I think this could be done in plain IR):
.war mask: true if any of:
icmp ule i64 %b, %aicmp ult (%elementSize * lane), (%b - %a).raw mask: true if any of:
icmp eq i64 %a, %bicmp ugt i64 %b, %a:icmp ult (%elementSize * lane), (%b - %a)icmp ult (%elementSize * lane), (%a - %b)Not sure if the current expansions exactly match this or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that also sounds reasonable, with the main benefit being to avoid the conversion back to pointers.
With that we could also simplify the wording quite a bit I think, not mention pointers at all, as it is defined clearly as IR operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can just use unsigned conditions and they are equivalent to the instructions that would be good. That would help make them simpler.
What makes ptr arguments difficult to deal with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using these instructions as an alternative way to handle diff checks. The SCEV expressions use for diff checks have already been converted to integers at the places we want to emit dependence masks. If we don't rework things to preserve the exact pointer types then naively casting to a pointer type will drop information such as the
addresspace. I don't think this is a problem for correctness, but the IR does look broken. Really, these instructions don't care about all the details about pointers other than the address, so I think it makes sense to drop that information.