[AArch64] Tweak fixed-length loop.dependence.mask costs #175538

Merged
MacDue merged 3 commits into llvm:main from MacDue:tweak_loop_dep_cost on Feb 9, 2026

@MacDue
Member

@MacDue MacDue commented Jan 12, 2026

For fixed-length masks, the result of whilewr/whilerw must be ANDed with `ptrue vl*`, which costs at least one more instruction.

@llvmbot llvmbot added backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding labels Jan 12, 2026
@llvmbot
Member

llvmbot commented Jan 12, 2026

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

Converting the predicate result of whilewr/whilerw into a fixed-length mask is not free (it takes at least a MOV and an XTN), so the cost should be slightly higher.


Full diff: https://github.com/llvm/llvm-project/pull/175538.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+9-3)
  • (modified) llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll (+8-8)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 85be8db9d3ae2..59f782b986a4f 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -1071,9 +1071,15 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
       EVT VecVT = getTLI()->getValueType(DL, RetTy);
       unsigned EltSizeInBytes =
           cast<ConstantInt>(ICA.getArgs()[2])->getZExtValue();
-      if (is_contained({1u, 2u, 4u, 8u}, EltSizeInBytes) &&
-          VecVT.getVectorMinNumElements() == (16 / EltSizeInBytes))
-        return 1;
+      if (!is_contained({1u, 2u, 4u, 8u}, EltSizeInBytes) ||
+          VecVT.getVectorMinNumElements() != (16 / EltSizeInBytes))
+        break;
+      InstructionCost Cost = 1;
+      // For fixed-vector types at least a MOV and XTN are needed to convert
+      // from the predicate to a fixed-length mask.
+      if (isa<FixedVectorType>(RetTy))
+        Cost += 2;
+      return Cost;
     }
     break;
   }
diff --git a/llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll b/llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll
index 5b3070fcf347e..7acd776a91b3c 100644
--- a/llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll
@@ -17,10 +17,10 @@ define void @loop_dependence_war_mask(ptr %a, ptr %b) {
 ; CHECK-EXPANDED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; CHECK-LABEL: 'loop_dependence_war_mask'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res1 = call <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %a, ptr %b, i64 1)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res2 = call <8 x i1> @llvm.loop.dependence.war.mask.v8i1(ptr %a, ptr %b, i64 2)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res3 = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %a, ptr %b, i64 4)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res4 = call <2 x i1> @llvm.loop.dependence.war.mask.v2i1(ptr %a, ptr %b, i64 8)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res1 = call <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %a, ptr %b, i64 1)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res2 = call <8 x i1> @llvm.loop.dependence.war.mask.v8i1(ptr %a, ptr %b, i64 2)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res3 = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %a, ptr %b, i64 4)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res4 = call <2 x i1> @llvm.loop.dependence.war.mask.v2i1(ptr %a, ptr %b, i64 8)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res5 = call <vscale x 16 x i1> @llvm.loop.dependence.war.mask.nxv16i1(ptr %a, ptr %b, i64 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res6 = call <vscale x 8 x i1> @llvm.loop.dependence.war.mask.nxv8i1(ptr %a, ptr %b, i64 2)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res7 = call <vscale x 4 x i1> @llvm.loop.dependence.war.mask.nxv4i1(ptr %a, ptr %b, i64 4)
@@ -54,10 +54,10 @@ define void @loop_dependence_raw_mask(ptr %a, ptr %b) {
 ; CHECK-EXPANDED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; CHECK-LABEL: 'loop_dependence_raw_mask'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res1 = call <16 x i1> @llvm.loop.dependence.raw.mask.v16i1(ptr %a, ptr %b, i64 1)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res2 = call <8 x i1> @llvm.loop.dependence.raw.mask.v8i1(ptr %a, ptr %b, i64 2)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res3 = call <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %a, ptr %b, i64 4)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res4 = call <2 x i1> @llvm.loop.dependence.raw.mask.v2i1(ptr %a, ptr %b, i64 8)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res1 = call <16 x i1> @llvm.loop.dependence.raw.mask.v16i1(ptr %a, ptr %b, i64 1)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res2 = call <8 x i1> @llvm.loop.dependence.raw.mask.v8i1(ptr %a, ptr %b, i64 2)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res3 = call <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %a, ptr %b, i64 4)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res4 = call <2 x i1> @llvm.loop.dependence.raw.mask.v2i1(ptr %a, ptr %b, i64 8)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res5 = call <vscale x 16 x i1> @llvm.loop.dependence.raw.mask.nxv16i1(ptr %a, ptr %b, i64 1)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res6 = call <vscale x 8 x i1> @llvm.loop.dependence.raw.mask.nxv8i1(ptr %a, ptr %b, i64 2)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %res7 = call <vscale x 4 x i1> @llvm.loop.dependence.raw.mask.nxv4i1(ptr %a, ptr %b, i64 4)

@llvmbot
Member

llvmbot commented Jan 12, 2026

@llvm/pr-subscribers-llvm-analysis


@MacDue MacDue force-pushed the tweak_loop_dep_cost branch from 81a15a4 to 4827e6e on February 6, 2026 11:17
Collaborator

@huntergr-arm huntergr-arm left a comment


LGTM

@MacDue MacDue enabled auto-merge (squash) February 9, 2026 09:49
@MacDue MacDue disabled auto-merge February 9, 2026 09:58
@MacDue MacDue enabled auto-merge (squash) February 9, 2026 09:58
@MacDue MacDue disabled auto-merge February 9, 2026 09:58
@MacDue MacDue merged commit 233a991 into llvm:main Feb 9, 2026
10 checks passed
@MacDue MacDue deleted the tweak_loop_dep_cost branch February 9, 2026 09:59
rishabhmadan19 pushed a commit to rishabhmadan19/llvm-project that referenced this pull request Feb 9, 2026
For fixed-length masks we need to AND the result of the whilewr/rw with
`ptrue vl*` (which is at least one more instruction).
Xinlong-Chen pushed a commit to Xinlong-Chen/llvm-project that referenced this pull request Feb 12, 2026
For fixed-length masks we need to AND the result of the whilewr/rw with
`ptrue vl*` (which is at least one more instruction).

Labels

backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding

4 participants