
[Analysis] Only fold trunc X to (X & Mask) if Mask == getLowBitsSet(bit width of original operands)#176589

Closed
wermos wants to merge 1 commit into llvm:main from wermos:trunc-2

Conversation

@wermos
Contributor

@wermos wermos commented Jan 17, 2026

Implements the refinement mentioned in #171195 (review).

We only perform the fold if the resultant mask is equal to getLowBitsSet(bit width of original operands).

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Jan 17, 2026
@llvmbot
Member

llvmbot commented Jan 17, 2026

@llvm/pr-subscribers-llvm-analysis

Author: Tirthankar Mazumder (wermos)

Changes

Implements the refinement mentioned in #171195 (review).

We only perform the fold if the resultant mask is equal to getLowBitsSet(bit width of original operands).


Full diff: https://github.com/llvm/llvm-project/pull/176589.diff

1 file affected:

  • (modified) llvm/lib/Analysis/CmpInstAnalysis.cpp (+11-3)
diff --git a/llvm/lib/Analysis/CmpInstAnalysis.cpp b/llvm/lib/Analysis/CmpInstAnalysis.cpp
index 880006c0fcfac..5e434f3e711fe 100644
--- a/llvm/lib/Analysis/CmpInstAnalysis.cpp
+++ b/llvm/lib/Analysis/CmpInstAnalysis.cpp
@@ -12,6 +12,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Analysis/CmpInstAnalysis.h"
+#include "llvm/ADT/APInt.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/PatternMatch.h"
@@ -164,9 +165,16 @@ llvm::decomposeBitTestICmp(Value *LHS, Value *RHS, CmpInst::Predicate Pred,
 
     // Try to convert (trunc X) eq/ne C into (X & Mask) eq/ne C
     if (LookThroughTrunc && isa<TruncInst>(LHS)) {
-      Result.Pred = Pred;
-      Result.Mask = APInt::getAllOnes(C.getBitWidth());
-      Result.C = C;
+      auto *TI = dyn_cast<TruncInst>(LHS);
+      unsigned SrcBW = TI->getSrcTy()->getScalarSizeInBits(),
+               DstBW = TI->getDestTy()->getScalarSizeInBits();
+      APInt DesiredMask = APInt::getLowBitsSet(SrcBW, DstBW);
+      APInt Mask = APInt::getAllOnes(C.getBitWidth()).zext(SrcBW);
+      if (Mask == DesiredMask) {
+        Result.Pred = Pred;
+        Result.Mask = Mask;
+        Result.C = C;
+      }
       break;
     }
 

@wermos wermos changed the title Only fold trunc X to (X & Mask) if Mask == getLowBitsSet(bit width of original operands) [Analysis] Only fold trunc X to (X & Mask) if Mask == getLowBitsSet(bit width of original operands) Jan 17, 2026
@wermos
Contributor Author

wermos commented Jan 17, 2026

@dtcxzyw I'm not exactly sure how #171195 (comment) is supposed to be handled. Do you have any suggestions?

Also, I think we can run llvm-opt-benchmark on this and see the net effect like you had suggested in #171195.

@wermos
Contributor Author

wermos commented Jan 17, 2026

@andjo403 In #171195 you suggested writing the Mask as follows:

    // Try to convert (trunc X) eq/ne C into (X & Mask) eq/ne C
    if (LookThroughTrunc && isa<TruncInst>(LHS)) {
      Result.Pred = Pred;
      Result.Mask = APInt::getAllOnes(C.getBitWidth());
      Result.C = C;
      break;
    }

but I'm not sure if this is correct. Imagine we have a snippet like

%v1 = trunc i64 %x to i32
%v2 = icmp eq i32 %v1, 0

Result.X will have the larger bit width (64 in the example), while Result.Mask has the smaller, truncated bit width (32 in the example).

I think the way it was written before is more correct, because the mask needs to be zext'd to X's bit width. Similarly, the constant C needs to be zext'd to X's bit width.

@andjo403
Contributor

Sorry, I do not understand this change, as it compares two numbers that are always equal.
The zext is already done in the existing code at

    if (LookThroughTrunc && match(LHS, m_Trunc(m_Value(X)))) {
      Result.X = X;
      Result.Mask = Result.Mask.zext(X->getType()->getScalarSizeInBits());
      Result.C = Result.C.zext(X->getType()->getScalarSizeInBits());
    } else {
      Result.X = LHS;
    }
That was why I suggested the change: to avoid doing a zext from and to the same type.

For your example, APInt::getAllOnes(C.getBitWidth()) will be an i32 with all bits set to one, and that is then zext'd to an i64 with the low 32 bits set by the code linked above.

@wermos
Contributor Author

wermos commented Jan 18, 2026

Ah, I see. Your suggestion makes sense to me now.

> Sorry, I do not understand this change, as it compares two numbers that are always equal.

What you're saying makes sense. Did I misinterpret the comment in #171195 (review)?

> LGTM. However, as I commented in #171195 (comment), the current implementation converts (trunc X) ==/!= C to (and X, mask) ==/!= 0 unconditionally and causes some irreversible regressions. As a follow-up, we can try to preserve the previous behavior by checking whether the mask is getLowBitsSet(bit width of original operands). Then we can check the net effect.

@wermos
Contributor Author

wermos commented Jan 19, 2026

@dtcxzyw could you shed some light on this matter?

@dtcxzyw
Member

dtcxzyw commented Jan 19, 2026

I mean changing the code here:

    // This matches patterns corresponding to tests of the signbit as well as:
    //    (trunc X) pred C2 --> (X & Mask) == C
    if (auto Res = decomposeBitTestICmp(Op0, Op1, Pred, /*LookThroughTrunc=*/true,
                                        /*AllowNonZeroC=*/true)) {
      Value *And = Builder.CreateAnd(Res->X, Res->Mask);
      Constant *C = ConstantInt::get(Res->X->getType(), Res->C);
      return new ICmpInst(Res->Pred, And, C);
    }

This should allow patterns like (trunc i32 %x to i16) u> 15 -> (and %x, 65520) != 0 but not (trunc i32 %x to i16) == 15 -> (and i32 %x, 65535) == 15.

However, I double checked and found that it didn't happen, since it was already guarded by the following check (DecomposeAnd is false by default):

    if ((ICmpInst::isEquality(Pred) && !DecomposeAnd) ||
        !match(RHS, m_APIntAllowPoison(OrigC)))
      return std::nullopt;

We need more time to investigate the root cause of regressions, as demonstrated in https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3316/files (See bench/llvm/optimized/SemaType.ll, bench/libigl/optimized/bijective_composite_harmonic_mapping.ll, and bench/cpython/optimized/_ssl.ll). With #171195, new masking instructions are introduced in multiple IR files. So I guess there are some optimizations converting truncs into ands unconditionally.

@wermos
Contributor Author

wermos commented Jan 20, 2026

Thanks for the reply. I'm going to mark this PR as a draft for now, while I try to investigate the root cause of those regressions.

@wermos wermos marked this pull request as draft January 20, 2026 06:30
@wermos
Contributor Author

wermos commented Feb 15, 2026

I'm closing this PR for now. I'll open a new one when I get around to investigating the regressions.

@wermos wermos closed this Feb 15, 2026
@wermos wermos deleted the trunc-2 branch February 17, 2026 17:55
