[ArgPromotion] Remove incorrect TranspBlocks set for loads. #84835

Merged: 2 commits into llvm:main from fix-argpromotion, Mar 12, 2024

Conversation

@fhahn (Contributor) commented Mar 11, 2024

The TranspBlocks set was used to cache aliasing decisions for all processed loads in the parent loop. This is incorrect, because each load can access a different location, so the fact that one load is not modified in a block does not imply that another load is not modified in the same block.

All loads access the same underlying object, so we could perhaps use a location without size for all loads and retain the cache, but that would mean we lose precision.

For now, just drop the cache.

Fixes #84807
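
To make the failure mode concrete, here is a minimal self-contained analogue (plain C++, not the LLVM code; all names are invented for illustration). A backwards walk that shares its visited set across queries for different locations stops re-checking blocks after the first query, so a clobber of the second location is never seen:

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Block {
  std::vector<int> ClobberedLocs;    // locations this block may modify
  std::vector<const Block *> Preds;  // predecessor blocks
};

// Returns true if no block reachable backwards from Start modifies Loc.
// When Visited is shared across calls, blocks visited while checking one
// location are silently skipped for every later location -- the same
// mistake the shared TranspBlocks set made in findArgParts.
static bool notModifiedFrom(const Block *Start, int Loc,
                            std::set<const Block *> &Visited) {
  if (!Visited.insert(Start).second)
    return true; // already visited: treated as transparent, even for a new Loc
  for (int L : Start->ClobberedLocs)
    if (L == Loc)
      return false;
  for (const Block *P : Start->Preds)
    if (!notModifiedFrom(P, Loc, Visited))
      return false;
  return true;
}

int main() {
  Block Entry, Then;
  Then.ClobberedLocs = {1}; // "then" clobbers location 1 only
  Then.Preds = {&Entry};

  std::set<const Block *> Shared;
  // Query for location 0 visits Then and correctly finds no clobber...
  assert(notModifiedFrom(&Then, /*Loc=*/0, Shared));
  // ...but the shared set now hides Then's clobber of location 1:
  assert(notModifiedFrom(&Then, /*Loc=*/1, Shared)); // wrong answer

  // A fresh set per query (what this patch switches to) is correct.
  std::set<const Block *> Fresh;
  assert(!notModifiedFrom(&Then, /*Loc=*/1, Fresh));
  return 0;
}
```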

@fhahn requested review from nikic, alinas and aeubanks March 11, 2024 21:22
@fhahn changed the title from "[ArgPromotion] Drop incorrect TranspBlocks set for loads." to "[ArgPromotion] Remove incorrect TranspBlocks set for loads." Mar 11, 2024
@llvmbot (Collaborator) commented Mar 11, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Full diff: https://github.com/llvm/llvm-project/pull/84835.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/ArgumentPromotion.cpp (+1-5)
  • (modified) llvm/test/Transforms/ArgumentPromotion/aliasing-and-non-aliasing-loads-with-clobber.ll (+5-5)
diff --git a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
index e89ec353487eef..3aa8ea3f514713 100644
--- a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+++ b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -653,10 +653,6 @@ static bool findArgParts(Argument *Arg, const DataLayout &DL, AAResults &AAR,
   // check to see if the pointer is guaranteed to not be modified from entry of
   // the function to each of the load instructions.
 
-  // Because there could be several/many load instructions, remember which
-  // blocks we know to be transparent to the load.
-  df_iterator_default_set<BasicBlock *, 16> TranspBlocks;
-
   for (LoadInst *Load : Loads) {
     // Check to see if the load is invalidated from the start of the block to
     // the load itself.
@@ -670,7 +666,7 @@ static bool findArgParts(Argument *Arg, const DataLayout &DL, AAResults &AAR,
     // To do this, we perform a depth first search on the inverse CFG from the
     // loading block.
     for (BasicBlock *P : predecessors(BB)) {
-      for (BasicBlock *TranspBB : inverse_depth_first_ext(P, TranspBlocks))
+      for (BasicBlock *TranspBB : inverse_depth_first(P))
         if (AAR.canBasicBlockModify(*TranspBB, Loc))
           return false;
     }
diff --git a/llvm/test/Transforms/ArgumentPromotion/aliasing-and-non-aliasing-loads-with-clobber.ll b/llvm/test/Transforms/ArgumentPromotion/aliasing-and-non-aliasing-loads-with-clobber.ll
index 69385a7ea51a74..1c771d550b2192 100644
--- a/llvm/test/Transforms/ArgumentPromotion/aliasing-and-non-aliasing-loads-with-clobber.ll
+++ b/llvm/test/Transforms/ArgumentPromotion/aliasing-and-non-aliasing-loads-with-clobber.ll
@@ -14,10 +14,7 @@ define i32 @caller1(i1 %c) {
 ; CHECK-LABEL: define i32 @caller1(
 ; CHECK-SAME: i1 [[C:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[F_VAL:%.*]] = load i16, ptr @f, align 8
-; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr @f, i64 8
-; CHECK-NEXT:    [[F_VAL1:%.*]] = load i64, ptr [[TMP0]], align 8
-; CHECK-NEXT:    call void @callee1(i16 [[F_VAL]], i64 [[F_VAL1]], i1 [[C]])
+; CHECK-NEXT:    call void @callee1(ptr noundef nonnull @f, i1 [[C]])
 ; CHECK-NEXT:    ret i32 0
 ;
 entry:
@@ -27,13 +24,16 @@ entry:
 
 define internal void @callee1(ptr nocapture noundef readonly %q, i1 %c) {
 ; CHECK-LABEL: define internal void @callee1(
-; CHECK-SAME: i16 [[Q_0_VAL:%.*]], i64 [[Q_8_VAL:%.*]], i1 [[C:%.*]]) {
+; CHECK-SAME: ptr nocapture noundef readonly [[Q:%.*]], i1 [[C:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C]], label [[THEN:%.*]], label [[EXIT:%.*]]
 ; CHECK:       then:
 ; CHECK-NEXT:    store i16 123, ptr @f, align 8
 ; CHECK-NEXT:    br label [[EXIT]]
 ; CHECK:       exit:
+; CHECK-NEXT:    [[Q_0_VAL:%.*]] = load i16, ptr [[Q]], align 8
+; CHECK-NEXT:    [[GEP_8:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 8
+; CHECK-NEXT:    [[Q_8_VAL:%.*]] = load i64, ptr [[GEP_8]], align 8
 ; CHECK-NEXT:    call void @use(i16 [[Q_0_VAL]], i64 [[Q_8_VAL]])
 ; CHECK-NEXT:    ret void
 ;

@nikic (Contributor) left a comment

LGTM

I don't think this code is particularly hot, but it would probably make sense to move it to use BatchAA at least.

@efriedma-quic (Collaborator) commented
Maybe it's worth looking into fixing the API of inverse_depth_first_ext/etc. to be less error-prone? For example, maybe we can make it automatically clear the storage when it's constructed, or something like that.
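
One possible shape for that suggestion, sketched purely as an illustration (this helper does not exist in LLVM and the name is made up): keep the caller-provided storage, but clear it on entry so stale visit state cannot leak between unrelated traversals.

```cpp
// Hypothetical helper, not an existing LLVM API: behaves like
// inverse_depth_first_ext but resets the caller's set first.
#include "llvm/ADT/DepthFirstIterator.h"

template <class GraphT, class SetTy>
auto inverse_depth_first_fresh(const GraphT &G, SetTy &S) {
  S.clear();                                 // drop state from any earlier traversal
  return llvm::inverse_depth_first_ext(G, S); // existing LLVM iterator range
}
```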

@fhahn merged commit bba4a1d into llvm:main Mar 12, 2024
3 of 4 checks passed
@fhahn deleted the fix-argpromotion branch March 12, 2024 09:47
@fhahn (Contributor, Author) commented Mar 12, 2024

> I don't think this code is particularly hot, but it would probably make sense to move it to use BatchAA at least.

Sounds good, we would need to add the functions used by ArgumentPromotion to BatchAA.
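
Roughly what that could look like, as a sketch only: it assumes a canBasicBlockModify overload (or an equivalent) were added to BatchAAResults, which does not exist there today, and it only covers the cross-block part of the check in findArgParts.

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch: one BatchAAResults shared across all per-load queries, with a
// fresh inverse DFS (and therefore a per-load visited set) for each load.
static bool noLoadIsClobberedFromEntry(ArrayRef<LoadInst *> Loads,
                                       AAResults &AAR) {
  BatchAAResults BatchAA(AAR); // caches alias queries across the loads
  for (LoadInst *Load : Loads) {
    MemoryLocation Loc = MemoryLocation::get(Load);
    for (BasicBlock *P : predecessors(Load->getParent()))
      for (BasicBlock *TranspBB : inverse_depth_first(P))
        if (BatchAA.canBasicBlockModify(*TranspBB, Loc)) // assumed API
          return false;
  }
  return true;
}
```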

> Maybe it's worth looking into fixing the API of inverse_depth_first_ext/etc. to be less error-prone? For example, maybe we can make it automatically clear the storage when it's constructed, or something like that.

I am not sure; I had a look at the other uses of inverse_depth_first_ext and they all seem to intentionally use it to extend the visited set across multiple invocations of inverse_depth_first_ext. If a fresh set should be used on each invocation, inverse_depth_first can be used instead. Put up #84920 to update the single instance I think is applicable.
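
For reference, a sketch of the two patterns (both iterators exist in llvm/ADT/DepthFirstIterator.h; the surrounding function names and the `visit` callback are placeholders for illustration):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
using namespace llvm;

void visit(BasicBlock *BB); // placeholder for whatever the walk does

// Fresh visited set per traversal -- what this PR switches the load check to.
void walkPredecessorsOnce(BasicBlock *Start) {
  for (BasicBlock *BB : inverse_depth_first(Start))
    visit(BB);
}

// Caller-owned set deliberately carried across traversals, so every block is
// visited at most once over the whole loop -- the behaviour the remaining
// inverse_depth_first_ext callers rely on intentionally.
void walkPredecessorsOnceOverall(ArrayRef<BasicBlock *> Starts) {
  df_iterator_default_set<BasicBlock *, 16> Visited;
  for (BasicBlock *Start : Starts)
    for (BasicBlock *BB : inverse_depth_first_ext(Start, Visited))
      visit(BB);
}
```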

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 12, 2024
PR: llvm#84835
(cherry picked from commit bba4a1d)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 13, 2024
PR: llvm#84835
(cherry picked from commit bba4a1d)
@pointhex mentioned this pull request May 7, 2024
Successfully merging this pull request may close these issues:

  • [ArgPromotion] Miscompile at -O3