Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Replace disjoint or with add instead of dropping disjoint. #83821

Merged
merged 4 commits into from
Mar 19, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Mar 4, 2024

Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those likely need similar treatment.

Fixes #81872

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used
for dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes
where the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs
that SCEV treated as ADDs, but those are not tracked.
@llvmbot
Copy link
Member

llvmbot commented Mar 4, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those likely need similar treatment.


Full diff: https://github.com/llvm/llvm-project/pull/83821.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+8)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+23-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+18)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index a7ebf78e54ceb6..b94859864fff3c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -68,6 +68,9 @@ class VPBuilder {
 public:
   VPBuilder() = default;
   VPBuilder(VPBasicBlock *InsertBB) { setInsertPoint(InsertBB); }
+  VPBuilder(VPRecipeBase *InsertPt) {
+    setInsertPoint(InsertPt->getParent(), InsertPt->getIterator());
+  }
 
   /// Clear the insertion point: created instructions will not be inserted into
   /// a block.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 16c09a83e777dd..b565b4351e16d4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1127,6 +1127,12 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
     return WrapFlags.HasNSW;
   }
 
+  bool isDisjoint() const {
+    assert(OpType == OperationType::DisjointOp &&
+           "recipe cannot have a disjoing flag");
+    return DisjointFlags.IsDisjoint;
+  }
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   void printFlags(raw_ostream &O) const;
 #endif
@@ -2136,6 +2142,8 @@ class VPReplicateRecipe : public VPRecipeWithIRFlags {
     assert(isPredicated() && "Trying to get the mask of a unpredicated recipe");
     return getOperand(getNumOperands() - 1);
   }
+
+  unsigned getOpcode() const { return getUnderlyingInstr()->getOpcode(); }
 };
 
 /// A recipe for generating conditional branches on the bits of a mask.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index b90c588b607564..4b5b6b8cc3dbcf 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -73,12 +73,12 @@ template <typename Op0_t, unsigned Opcode> struct UnaryVPInstruction_match {
   }
 };
 
-template <typename Op0_t, typename Op1_t, unsigned Opcode>
-struct BinaryVPInstruction_match {
+template <typename RecipeTy, typename Op0_t, typename Op1_t, unsigned Opcode>
+struct BinaryRecipe_match {
   Op0_t Op0;
   Op1_t Op1;
 
-  BinaryVPInstruction_match(Op0_t Op0, Op1_t Op1) : Op0(Op0), Op1(Op1) {}
+  BinaryRecipe_match(Op0_t Op0, Op1_t Op1) : Op0(Op0), Op1(Op1) {}
 
   bool match(const VPValue *V) {
     auto *DefR = V->getDefiningRecipe();
@@ -86,15 +86,27 @@ struct BinaryVPInstruction_match {
   }
 
   bool match(const VPRecipeBase *R) {
-    auto *DefR = dyn_cast<VPInstruction>(R);
+    auto *DefR = dyn_cast<RecipeTy>(R);
     if (!DefR || DefR->getOpcode() != Opcode)
       return false;
     assert(DefR->getNumOperands() == 2 &&
            "recipe with matched opcode does not have 2 operands");
     return Op0.match(DefR->getOperand(0)) && Op1.match(DefR->getOperand(1));
   }
+
+  bool match(const VPSingleDefRecipe *R) {
+    return match(static_cast<const VPRecipeBase *>(R));
+  }
 };
 
+template <typename Op0_t, typename Op1_t, unsigned Opcode>
+using BinaryVPInstruction_match =
+    BinaryRecipe_match<VPInstruction, Op0_t, Op1_t, Opcode>;
+
+template <typename Op0_t, typename Op1_t, unsigned Opcode>
+using BinaryVPReplicate_match =
+    BinaryRecipe_match<VPReplicateRecipe, Op0_t, Op1_t, Opcode>;
+
 template <unsigned Opcode, typename Op0_t>
 inline UnaryVPInstruction_match<Op0_t, Opcode>
 m_VPInstruction(const Op0_t &Op0) {
@@ -130,6 +142,13 @@ inline BinaryVPInstruction_match<Op0_t, Op1_t, VPInstruction::BranchOnCount>
 m_BranchOnCount(const Op0_t &Op0, const Op1_t &Op1) {
   return m_VPInstruction<VPInstruction::BranchOnCount>(Op0, Op1);
 }
+
+template <unsigned Opcode, typename Op0_t, typename Op1_t>
+inline BinaryVPReplicate_match<Op0_t, Op1_t, Opcode>
+m_VPReplicate(const Op0_t &Op0, const Op1_t &Op1) {
+  return BinaryVPReplicate_match<Op0_t, Op1_t, Opcode>(Op0, Op1);
+}
+
 } // namespace VPlanPatternMatch
 } // namespace llvm
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9d6deb802e2090..818647d5ea1bac 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1249,6 +1249,24 @@ void VPlanTransforms::dropPoisonGeneratingRecipes(
       // load/store. If the underlying instruction has poison-generating flags,
       // drop them directly.
       if (auto *RecWithFlags = dyn_cast<VPRecipeWithIRFlags>(CurRec)) {
+        VPValue *A, *B;
+        using namespace llvm::VPlanPatternMatch;
+        // Dropping disjoint from an OR may yield incorrect results, as some
+        // analysis may have converted it to an Add implicitly (e.g. SCEV used
+        // for dependence analysis). Instead, replace it with an equivalent Add.
+        // This is possible as all users of the disjoint OR only access lanes
+        // where the operands are disjoint or poison otherwise.
+        if (match(RecWithFlags,
+                  m_VPReplicate<Instruction::Or>(m_VPValue(A), m_VPValue(B))) &&
+            RecWithFlags->isDisjoint()) {
+          VPBuilder Builder(RecWithFlags);
+          VPInstruction *New = Builder.createOverflowingOp(
+              Instruction::Add, {A, B}, {false, false},
+              RecWithFlags->getDebugLoc());
+          RecWithFlags->replaceAllUsesWith(New);
+          RecWithFlags->eraseFromParent();
+          CurRec = New;
+        }
         RecWithFlags->dropPoisonGeneratingFlags();
       } else {
         Instruction *Instr = dyn_cast_or_null<Instruction>(

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable conceptually, but I'm not the right person to review VPlan changes.

Test case seems to be missing?

@annamthomas
Copy link
Contributor

Thank you for the fix! I'll land my test case from the other MR and close it off in favour of this one.

@annamthomas
Copy link
Contributor

I've landed the test and the post commit comments. Pls rebase. Thanks.

@fhahn
Copy link
Contributor Author

fhahn commented Mar 18, 2024

@annamthomas thanks for landing the fix! Rebased, also the majority of the pattern-match changes have been split up and submitted & used separately.

Copy link

github-actions bot commented Mar 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@annamthomas
Copy link
Contributor

Change LGTM.

Could you pls link the commit to #81872 ? Thanks!

@fhahn fhahn merged commit c2c1e6e into llvm:main Mar 19, 2024
4 checks passed
@fhahn fhahn deleted the vplan-replace-disjoint-or-with-add branch March 19, 2024 19:16
d0k added a commit that referenced this pull request Mar 20, 2024
…oint. (#83821)"

This reverts commit c2c1e6e. It creates
a use after free.

==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
…lvm#83821)

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes llvm#81872


PR: llvm#83821
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
…oint. (llvm#83821)"

This reverts commit c2c1e6e. It creates
a use after free.

==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
RecWithFlags->replaceAllUsesWith(New);
RecWithFlags->eraseFromParent();
CurRec = New;
}
RecWithFlags->dropPoisonGeneratingFlags();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in the else branch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be one of the two:

  • in else branch (to get rid of the use-after-free error since RecWithFlags is removed within the if branch above), OR
  • CurRec->dropPoisonGeneratingFlags

Makes sense to just have it in the else branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhahn the change was reverted because of the use-after-free above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that this was reverted, recommitted with a fix.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some post-commit (and post-revert) thoughts.

// analysis may have converted it to an Add implicitly (e.g. SCEV used
// for dependence analysis). Instead, replace it with an equivalent Add.
// This is possible as all users of the disjoint OR only access lanes
// where the operands are disjoint or poison otherwise.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be taken care of by VPRecipeWithIRFlags::dropPoisonGeneratingFlags(), i.e., have it replace the opcode of a disjoint Or with Add?

nit (independent): collectPoisonGeneratingInstrsInBackwardSlice() does more than collect Instrs, it also updates them.

Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked.

Should code based on such SCEV expressions be generated (expanded(?)) directly?

(This case is reminiscent of dropping assumptions based on conditions subject to predication.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be taken care of by VPRecipeWithIRFlags::dropPoisonGeneratingFlags(), i.e., have it replace the opcode of a disjoint Or with Add?

Hmmm, it may work if there's no need to directly delete the old recipe, i.e. it will get cleaned up. Let me check.

Should code based on such SCEV expressions be generated (expanded(?)) directly?

(This case is reminiscent of dropping assumptions based on conditions subject to predication.)

At the moment, SCEV is only used for analysis of existing IR instructions in this case; switching use SCEV expansion for pointer expressions would probably introduce some other potential issues.

fhahn added a commit that referenced this pull request Mar 27, 2024
…sjoint. (#83821)"

Recommit with a fix for the use-after-free causing the revert.
This reverts the revert commit f872043.

Original commit message:

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes #81872

PR: #83821
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Miscompile with LoopVectorizer: Incorrect code with disjoint or
6 participants