[AMDGPU] Enable i8 GEP promotion for vector allocas #166132

harrisonGPU · 2025-11-03T08:09:53Z

This patch adds support for the pattern:

  %index = select i1 %idx_sel, i32 0, i32 4
  %elt = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %index

by scaling the byte offset to an element index (index >> log2(ElemSize)),
allowing the vector element to be updated with insertelement instead of using
scratch memory.
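For illustration, here is a minimal C++ sketch of the scaling arithmetic. It assumes a power-of-2 element size and a byte offset that is already a multiple of that size. `log2Exact` and `byteOffsetToElementIndex` are hypothetical helpers, not LLVM APIs; in the pass itself the shift is emitted as an `lshr` instruction through the IRBuilder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: exact log2 of a power-of-2 value (asserts otherwise).
inline unsigned log2Exact(uint64_t v) {
  assert(v != 0 && (v & (v - 1)) == 0 && "element size must be a power of 2");
  unsigned shift = 0;
  while ((v >> shift) != 1)
    ++shift;
  return shift;
}

// Hypothetical helper mirroring the IR rewrite:
//   element index = byte offset >> log2(ElemSize)
// Only meaningful when byteOffset is a multiple of elemSize.
inline uint32_t byteOffsetToElementIndex(uint32_t byteOffset, uint64_t elemSize) {
  return byteOffset >> log2Exact(elemSize);
}
```

With 4-byte `float` elements, the two arms of the `select` above (byte offsets 0 and 4) map to element indices 0 and 1.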

llvmbot · 2025-11-03T08:10:25Z

@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Changes

This patch adds support for the pattern:

  %elt = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %index

by scaling the byte offset to an element index (index >> log2(ElemSize)),
allowing the vector element to be updated with insertelement instead of using
scratch memory.

Full diff: https://github.com/llvm/llvm-project/pull/166132.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+13-2)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll (+20)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index ddabd25894414..793c0237cdf38 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -456,10 +456,21 @@ static Value *GEPToVectorIndex(GetElementPtrInst *GEP, AllocaInst *Alloca,
   const auto &VarOffset = VarOffsets.front();
   APInt OffsetQuot;
   APInt::sdivrem(VarOffset.second, VecElemSize, OffsetQuot, Rem);
-  if (Rem != 0 || OffsetQuot.isZero())
-    return nullptr;
+
+  Value *Scaled = nullptr;
+  if (Rem != 0 || OffsetQuot.isZero()) {
+    unsigned ElemSizeShift = Log2_64(VecElemSize);
+    Scaled = Builder.CreateLShr(VarOffset.first, ElemSizeShift);
+    if (Instruction *NewInst = dyn_cast<Instruction>(Scaled))
+      NewInsts.push_back(NewInst);
+    OffsetQuot = APInt(BW, 1);
+    Rem = 0;
+  }
 
   Value *Offset = VarOffset.first;
+  if (Scaled)
+    Offset = Scaled;
+
   auto *OffsetType = dyn_cast<IntegerType>(Offset->getType());
   if (!OffsetType)
     return nullptr;
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
index 76e1868b3c4b9..65bddaba8dd14 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
@@ -250,6 +250,26 @@ bb2:
   store i32 0, ptr addrspace(5) %extractelement
   ret void
 }
+
+define amdgpu_kernel void @scalar_alloca_vector_gep_i8(ptr %buffer, float %data, i32 %index) {
+; CHECK-LABEL: define amdgpu_kernel void @scalar_alloca_vector_gep_i8(
+; CHECK-SAME: ptr [[BUFFER:%.*]], float [[DATA:%.*]], i32 [[INDEX:%.*]]) {
+; CHECK-NEXT:    [[ALLOCA:%.*]] = freeze <3 x float> poison
+; CHECK-NEXT:    [[VEC:%.*]] = load <3 x float>, ptr [[BUFFER]], align 16
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <3 x float> [[VEC]], float [[DATA]], i32 [[TMP1]]
+; CHECK-NEXT:    store <3 x float> [[TMP2]], ptr [[BUFFER]], align 16
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca <3 x float>, align 16, addrspace(5)
+  %vec = load <3 x float>, ptr %buffer
+  store <3 x float> %vec, ptr addrspace(5) %alloca
+  %elt = getelementptr inbounds nuw i8, ptr addrspace(5) %alloca, i32 %index
+  store float %data, ptr addrspace(5) %elt, align 4
+  %updated = load <3 x float>, ptr addrspace(5) %alloca, align 16
+  store <3 x float> %updated, ptr %buffer, align 16
+  ret void
+}
 ;.
 ; CHECK: [[META0]] = !{}
 ; CHECK: [[RNG1]] = !{i32 0, i32 1025}

shiltian

What if the index here is somewhere in the middle? It doesn't look like we have any constraint on the index.

arsenm · 2025-11-04T04:25:51Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

+
+  Value *Scaled = nullptr;
+  if (Rem != 0 || OffsetQuot.isZero()) {
+    unsigned ElemSizeShift = Log2_64(VecElemSize);


Need to validate VecElemSize is a power of 2?

Thanks, but I think it is not necessary to explicitly check whether the element size is a power of two, because it is already covered by the existing check here:

llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

Lines 871 to 877 in 52fdcf9

  Type *VecEltTy = VectorTy->getElementType();
  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
  if (ElementSizeInBits != DL->getTypeAllocSizeInBits(VecEltTy)) {
    LLVM_DEBUG(dbgs() << " Cannot convert to vector if the allocation size "
                         "does not match the type's size\n");
    return false;
  }

If the element type's size does not match its allocation size (i.e. the type is padded), this check returns false, which also rejects non-power-of-2 element sizes such as i24.
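A small sketch of why the power-of-2 question matters: the patch derives the shift with `Log2_64`, which rounds down, so for a non-power-of-2 element size a shift no longer equals a division. The 12-byte element size below is hypothetical. `sizeMatchesAllocSize` is a simplified, hypothetical model of the quoted guard that treats allocation size as the bit size rounded up to the ABI alignment; the 32-bit alignment used for i24 is an assumption about the datalayout.

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(v)) for v > 0, analogous to llvm::Log2_64 (which rounds down).
inline unsigned floorLog2(uint64_t v) {
  unsigned r = 0;
  while (v >>= 1)
    ++r;
  return r;
}

// Element index computed with a shift, as the patch does.
inline uint64_t indexByShift(uint64_t byteOffset, uint64_t elemSize) {
  return byteOffset >> floorLog2(elemSize);
}

// Element index computed with a true division (the intended semantics).
inline uint64_t indexByDivision(uint64_t byteOffset, uint64_t elemSize) {
  return byteOffset / elemSize;
}

// Hypothetical model of the guard: reject element types whose bit size differs
// from the allocation size (bit size rounded up to the ABI alignment).
inline bool sizeMatchesAllocSize(uint64_t sizeInBits, uint64_t abiAlignInBits) {
  uint64_t allocBits =
      (sizeInBits + abiAlignInBits - 1) / abiAlignInBits * abiAlignInBits;
  return sizeInBits == allocBits;
}
```

For 4-byte elements the shift and the division agree; for a 12-byte element, byte offset 24 shifts to index 3 instead of 2. Under the assumed alignment, i24 (24 bits, allocated as 32) is rejected by the size check before the shift is ever computed.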

arsenm · 2025-11-04T04:26:03Z

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll

+  %updated = load <3 x float>, ptr addrspace(5) %alloca, align 16
+  store <3 x float> %updated, ptr %buffer, align 16
+  ret void
+}


Test with non-power-of-2 element size

harrisonGPU · 2025-11-04T09:40:36Z

> What if the index here is somewhere in the middle? It doesn't look like we have any constraint on the index.

Could you please give me an example?

shiltian · 2025-11-04T16:00:24Z

> Could you please give me an example?

In the test case, %index indexes into %alloca, a 12-byte <3 x float> made of 4-byte elements. Since the GEP is over i8, what if %index is 5? That byte offset lands in the middle of the 2nd element of %alloca.
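A worked version of this example, as a hedged sketch with hypothetical helper names: with 4-byte elements (a shift of 2), a plain shift maps byte offset 5 to element 1, yet a 4-byte `float` store at offset 5 covers bytes 5 through 8, straddling element 1 (bytes 4 to 7) and element 2 (bytes 8 to 11). Only an alignment check distinguishes offset 5 from offset 4.

```cpp
#include <cassert>
#include <cstdint>

// Element index that a plain shift assigns to a byte offset
// (elemSizeShift = log2(ElemSize), e.g. 2 for 4-byte floats).
inline uint32_t shiftedIndex(uint32_t byteOffset, unsigned elemSizeShift) {
  return byteOffset >> elemSizeShift;
}

// True only if the byte offset starts exactly on an element boundary.
inline bool isElementAligned(uint32_t byteOffset, uint32_t elemSize) {
  return byteOffset % elemSize == 0;
}

// Index of the last element touched by an access of accessSize bytes.
inline uint32_t lastTouchedElement(uint32_t byteOffset, uint32_t accessSize,
                                   uint32_t elemSize) {
  return (byteOffset + accessSize - 1) / elemSize;
}
```

Offsets 4 and 5 both shift to element 1, but only offset 4 is aligned and stays within a single element.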

harrisonGPU · 2025-11-10T08:28:04Z

> Could you please give me an example?
>
> In the test case, %index indexes into %alloca, a 12-byte <3 x float> made of 4-byte elements. Since the GEP is over i8, what if %index is 5? That byte offset lands in the middle of the 2nd element of %alloca.

Hi @shiltian, I had already been thinking about this issue; thank you very much for the suggestion and for pointing it out.
I now think we should only promote when the variable index is guaranteed to be aligned to the element size.
We can use computeKnownBits and countMinTrailingZeros to check that the low bits of the index are known to be zero, which proves the alignment before promoting.
I've updated the lit test and commit message accordingly. What do you think?
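A minimal model of the proposed check, assuming 32-bit offsets. `KnownBits32` is a simplified, hypothetical stand-in for `llvm::KnownBits` (it only tracks which bits are known 0 or known 1), and `knownBitsOfSelect` approximates how known bits for a `select` keep only the facts common to both arms. The offset is treated as element-aligned when at least log2(ElemSize) low bits are known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for llvm::KnownBits on 32-bit values:
// Zero holds the bits known to be 0, One holds the bits known to be 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;

  // A constant has every bit known.
  static KnownBits32 constant(uint32_t v) { return {~v, v}; }

  // Number of low bits known to be zero (mirrors countMinTrailingZeros).
  unsigned countMinTrailingZeros() const {
    unsigned n = 0;
    while (n < 32 && ((Zero >> n) & 1u))
      ++n;
    return n;
  }
};

// Known bits of `select c, a, b`: only facts that hold for both arms survive.
inline KnownBits32 knownBitsOfSelect(KnownBits32 a, KnownBits32 b) {
  return {a.Zero & b.Zero, a.One & b.One};
}

// Promote only if the low log2(ElemSize) bits of the byte offset are known zero.
inline bool isKnownElementAligned(const KnownBits32 &offset,
                                  unsigned elemSizeShift) {
  return offset.countMinTrailingZeros() >= elemSizeShift;
}
```

Under this model, `select i1 %c, i32 0, i32 4` has two known trailing zeros and passes for 4-byte elements, while `select i1 %c, i32 4, i32 5` has none and would not be promoted.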

shiltian · 2025-11-18T18:22:51Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

-    return nullptr;
-
  Value *Offset = VarOffset.first;
+  if (Rem != 0 || OffsetQuot.isZero()) {


Does it have to be this complicated? I thought checking whether offset % size is zero would be sufficient?

Thanks, I agree with your point; I have removed it.

github-actions · 2025-11-20T03:31:12Z

🐧 Linux x64 Test Results

187096 tests passed
4929 tests skipped

✅ The build succeeded and all tests passed.

shiltian · 2025-11-21T20:39:10Z

The logic seems reasonable for a plain vector, but I'm not sure about nested vectors. Can you add tests for them as well?

harrisonGPU · 2025-11-24T06:55:30Z

> The logic seems reasonable for a plain vector, but I'm not sure about nested vectors. Can you add tests for them as well?

Thanks, I have added two tests for nested vectors.


llvm-ci · 2025-12-08T04:24:05Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 10 "Add check check-libc-amdgcn-amd-amdhsa".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/18635

Here is the relevant piece of the build log for the reference

Step 10 (Add check check-libc-amdgcn-amd-amdhsa) failure: test (failure)
...
[2611/3298] Running hermetic test libc.test.src.math.smoke.lroundf_test.__hermetic__
[==========] Running 3 tests from 1 test suite.
[ RUN      ] LlvmLibcRoundToIntegerTest.InfinityAndNaN
[       OK ] LlvmLibcRoundToIntegerTest.InfinityAndNaN (10 us)
[ RUN      ] LlvmLibcRoundToIntegerTest.RoundNumbers
[       OK ] LlvmLibcRoundToIntegerTest.RoundNumbers (6 us)
[ RUN      ] LlvmLibcRoundToIntegerTest.SubnormalRange
[       OK ] LlvmLibcRoundToIntegerTest.SubnormalRange (789 us)
Ran 3 tests.  PASS: 3  FAIL: 0
[2612/3298] Linking CXX executable libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__
FAILED: libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__ 
: && /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -Wl,--color-diagnostics    --target=amdgcn-amd-amdhsa -Wno-multi-gpu -mcpu=native -flto -Wl,-mllvm,-amdgpu-lower-global-ctor-dtor=0 -nostdlib -static -Wl,-mllvm,-amdhsa-code-object-version=6 libc/startup/gpu/amdgpu/CMakeFiles/libc.startup.gpu.amdgpu.crt1.dir/start.cpp.o libc/test/src/stdlib/CMakeFiles/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__.dir/heap_sort_test.cpp.o -o libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__  libc/test/UnitTest/libLibcTest.hermetic.a  libc/test/UnitTest/libLibcHermeticTestSupport.hermetic.a  libc/test/src/stdlib/liblibc.test.src.stdlib.heap_sort_test.__hermetic__.libc.a && :
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.	Program arguments: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/ld.lld --no-undefined -shared -plugin-opt=mcpu=gfx90a -plugin-opt=O3 --lto-CGO3 -plugin-opt=-function-sections=1 -plugin-opt=-data-sections=1 -L/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/amdgcn-amd-amdhsa --color-diagnostics -mllvm -amdgpu-lower-global-ctor-dtor=0 -mllvm -amdhsa-code-object-version=6 libc/startup/gpu/amdgpu/CMakeFiles/libc.startup.gpu.amdgpu.crt1.dir/start.cpp.o libc/test/src/stdlib/CMakeFiles/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__.dir/heap_sort_test.cpp.o libc/test/UnitTest/libLibcTest.hermetic.a libc/test/UnitTest/libLibcHermeticTestSupport.hermetic.a libc/test/src/stdlib/liblibc.test.src.stdlib.heap_sort_test.__hermetic__.libc.a -o libc/test/src/stdlib/libc.test.src.stdlib.heap_sort_test.__hermetic__.__build__
1.	Running pass 'Function Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'AMDGPU Promote Alloca' on function '@_ZN49LlvmLibcHeapSortTest_SameElementThreeElementArray3RunEv'
 #0 0x00007bd51120e100 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.22.0git+0x20e100)
 #1 0x00007bd51120adbf llvm::sys::RunSignalHandlers() (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/libLLVMSupport.so.22.0git+0x20adbf)
 #2 0x00007bd51120af12 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007bd510842520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007bd50c854df1 llvm::Instruction::eraseFromParent() (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x254df1)
 #5 0x00007bd50ef80f7d (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&)::'lambda'(llvm::Instruction*, llvm::Twine)::operator()(llvm::Instruction*, llvm::Twine) const AMDGPUPromoteAlloca.cpp:0:0
 #6 0x00007bd50ef89288 (anonymous namespace)::AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(llvm::AllocaInst&) AMDGPUPromoteAlloca.cpp:0:0
 #7 0x00007bd50ef8bb21 (anonymous namespace)::AMDGPUPromoteAllocaImpl::run(llvm::Function&, bool) (.part.0) AMDGPUPromoteAlloca.cpp:0:0
 #8 0x00007bd50ef8dc33 (anonymous namespace)::AMDGPUPromoteAlloca::runOnFunction(llvm::Function&) AMDGPUPromoteAlloca.cpp:0:0
 #9 0x00007bd50c8a4e93 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x2a4e93)
#10 0x00007bd50c8a52e9 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x2a52e9)
#11 0x00007bd50c8a5c37 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMCore.so.22.0git+0x2a5c37)
#12 0x00007bd510ebeb29 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#13 0x00007bd510ec09ad llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMLTO.so.22.0git+0x569ad)
#14 0x00007bd510ead619 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMLTO.so.22.0git+0x43619)
#15 0x00007bd510eb2df2 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/../lib/libLLVMLTO.so.22.0git+0x48df2)
#16 0x00007bd5117ad8fc lld::elf::BitcodeCompiler::compile() (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldELF.so.22.0git+0x1ad8fc)
#17 0x00007bd5116fca0d void lld::elf::LinkerDriver::compileBitcodeFiles<llvm::object::ELFType<(llvm::endianness)1, true>>(bool) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldELF.so.22.0git+0xfca0d)
#18 0x00007bd511720e24 void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::endianness)1, true>>(llvm::opt::InputArgList&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldELF.so.22.0git+0x120e24)
#19 0x00007bd511728ca9 lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldELF.so.22.0git+0x128ca9)
#20 0x00007bd5117292cc lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldELF.so.22.0git+0x1292cc)
#21 0x00007bd5119e35e1 lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/../lib/liblldCommon.so.22.0git+0xe5e1)
#22 0x00005e1ec1509d16 lld_main(int, char**, llvm::ToolContext const&) (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/ld.lld+0x2d16)
#23 0x00005e1ec150955b main (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/ld.lld+0x255b)
#24 0x00007bd510829d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#25 0x00007bd510829e40 call_init ./csu/../csu/libc-start.c:128:20
#26 0x00007bd510829e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#27 0x00005e1ec15095b5 _start (/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/bin/ld.lld+0x25b5)
clang++: error: unable to execute command: Segmentation fault (core dumped)
clang++: error: ld.lld command failed due to signal (use -v to see invocation)
[2613/3298] Linking CXX executable libc/test/src/math/smoke/libc.test.src.math.smoke.sin_test.__hermetic__.__build__
[2614/3298] Running hermetic test libc.test.src.math.smoke.fminimumf_test.__hermetic__

llvm-ci · 2025-12-08T04:51:13Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot1 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/23325

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[229/235] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64-with-call.o
[230/235] Generating MSAN_INST_GTEST.gtest-all.cc.x86_64.o
[231/235] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64-with-call.o
[232/235] Generating Msan-x86_64-with-call-Test
[233/235] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.x86_64.o
[234/235] Generating Msan-x86_64-Test
[234/235] Running compiler_rt regression tests
llvm-lit: /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 6148 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: XRay-x86_64-linux :: TestCases/Posix/basic-filtering.cpp (5807 of 6148)
******************** TEST 'XRay-x86_64-linux :: TestCases/Posix/basic-filtering.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 4
/home/b/sanitizer-x86_64-linux/build/build_default/bin/clang  --driver-mode=g++ -fxray-instrument  -m64 -nobuiltininc -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/include -idirafter /home/b/sanitizer-x86_64-linux/build/build_default/lib/clang/22/include -resource-dir=/home/b/sanitizer-x86_64-linux/build/compiler_rt_build -Wl,-rpath,/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/lib/linux   -std=c++11 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp -o /home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp -g
# executed command: /home/b/sanitizer-x86_64-linux/build/build_default/bin/clang --driver-mode=g++ -fxray-instrument -m64 -nobuiltininc -I/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/include -idirafter /home/b/sanitizer-x86_64-linux/build/build_default/lib/clang/22/include -resource-dir=/home/b/sanitizer-x86_64-linux/build/compiler_rt_build -Wl,-rpath,/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/lib/linux -std=c++11 /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp -o /home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp -g
# note: command had no output on stdout or stderr
# RUN: at line 5
rm -f basic-filtering-*
# executed command: rm -f 'basic-filtering-*'
# note: command had no output on stdout or stderr
# RUN: at line 6
env XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1      xray_logfile_base=basic-filtering-      xray_naive_log_func_duration_threshold_us=1000      xray_naive_log_max_stack_depth=2"  /home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp 2>&1 |      FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
# executed command: env 'XRAY_OPTIONS=patch_premain=true xray_mode=xray-basic verbosity=1      xray_logfile_base=basic-filtering-      xray_naive_log_func_duration_threshold_us=1000      xray_naive_log_max_stack_depth=2' /home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
# note: command had no output on stdout or stderr
# RUN: at line 11
ls basic-filtering-* | head -1 | tr -d '\n' > /home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp.log
# executed command: ls 'basic-filtering-*'
# note: command had no output on stdout or stderr
# executed command: head -1
# note: command had no output on stdout or stderr
# executed command: tr -d '\n'
# note: command had no output on stdout or stderr
# RUN: at line 12
/home/b/sanitizer-x86_64-linux/build/build_default/./bin/llvm-xray convert --symbolize --output-format=yaml -instr_map=/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp      "/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/basic-filtering-basic-filtering.cpp.tmp.RqkfRk" |      FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp --check-prefix TRACE
# executed command: /home/b/sanitizer-x86_64-linux/build/build_default/./bin/llvm-xray convert --symbolize --output-format=yaml -instr_map=/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp '%{readfile:/home/b/sanitizer-x86_64-linux/build/compiler_rt_build/test/xray/X86_64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp.log}'
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp --check-prefix TRACE
# note: command had no output on stdout or stderr
# RUN: at line 15
rm -f basic-filtering-*
# executed command: rm -f 'basic-filtering-*'
# note: command had no output on stdout or stderr
# RUN: at line 18
Step 16 (test standalone compiler-rt) failure: test standalone compiler-rt (failure)


ruiling · 2025-12-10T12:48:40Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

+      NewInsts.push_back(NewInst);
+
+    Offset = Scaled;
+    OffsetQuot = APInt(BW, 1);


Sorry for looking into this so late. This is wrong. The case we want to optimize is when (VarOffset.first * VarOffset.second) % VecElemSize == 0 but VarOffset.second % VecElemSize != 0. To calculate the vector index, you need (VarOffset.first * VarOffset.second / VecElemSize). Here you reset OffsetQuot to one, so you were actually dropping VarOffset.second.

I can also see that this change will conflict with #170512, since the code below was mostly moved elsewhere and the NewInsts argument was removed. I would like us to do some further refactoring based on #170512: rename GEPToVectorIndex to isPtrOffsetAlignedToElementSize(), have it just return the three components if the offset is properly aligned, and do the vector index calculation later.
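To make the arithmetic concrete, a hedged sketch with hypothetical names (this is not the code from #170512): model one variable GEP term as `Index * Scale` bytes, where `Scale` plays the role of `VarOffset.second`. The intended element index is `(Index * Scale) / ElemSize` whenever the product is a multiple of `ElemSize`; as the review describes, forcing `OffsetQuot` to 1 and shifting `Index` alone drops `Scale`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one variable GEP term: byte offset = Index * Scale.
struct VarOffsetTerm {
  uint64_t Index; // runtime value, like VarOffset.first
  uint64_t Scale; // constant multiplier in bytes, like VarOffset.second
};

// Intended semantics: succeed only if the byte offset is a multiple of
// ElemSize, and then produce (Index * Scale) / ElemSize.
inline bool computeElementIndex(VarOffsetTerm t, uint64_t elemSize,
                                uint64_t &outIndex) {
  uint64_t bytes = t.Index * t.Scale;
  if (bytes % elemSize != 0)
    return false;
  outIndex = bytes / elemSize;
  return true;
}

// What the reverted patch effectively computed once OffsetQuot was reset
// to 1: the raw index shifted by log2(ElemSize), ignoring Scale.
inline uint64_t revertedPatchIndex(VarOffsetTerm t, unsigned elemSizeShift) {
  return t.Index >> elemSizeShift;
}
```

For an i8 GEP (`Scale` = 1) both agree. For an i16 GEP (`Scale` = 2) into 4-byte elements with index 2, the byte offset is 4, so the correct element is 1, while shifting the raw index by 2 yields 0. This is why a GEP whose multiplier is not 1 is worth testing.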

ruiling · 2025-12-10T12:51:13Z

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll

+  %vec = load <3 x float>, ptr %buffer
+  store <3 x float> %vec, ptr addrspace(5) %alloca
+  %index = select i1 %idx_sel, i32 4, i32 8
+  %elt = getelementptr inbounds nuw i8, ptr addrspace(5) %alloca, i32 %index


It would be better to change this to a GEP of i16, so that we have a case where the multiplier of the variable offset part is not 1.

ruiling · 2025-12-10T13:13:54Z

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll

+  ret void
+}
+
+define amdgpu_kernel void @scalar_alloca_vector_gep_i8_4_or_5_no_promote(ptr %buffer, float %data, i1 %idx_sel) {


Maybe switch to another calling convention so that the alloca will be kept unchanged (not being promoted to LDS)?

harrisonGPU requested review from arsenm, jayfoad, perlfu, ritter-x2a, ruiling and shiltian November 3, 2025 08:09

harrisonGPU self-assigned this Nov 3, 2025

llvmbot added the backend:AMDGPU label Nov 3, 2025

shiltian reviewed Nov 3, 2025

arsenm reviewed Nov 4, 2025

harrisonGPU force-pushed the amdgpu/promote-vector branch from c1439a3 to 6a28740 (November 10, 2025 08:22)

shiltian reviewed Nov 18, 2025

shiltian approved these changes Dec 8, 2025

harrisonGPU added 5 commits December 8, 2025 11:21

[AMDGPU] Enable i8 GEP promotion for vector allocas

478545f

[AMDGPU] Use computeKnownBits to check if it points the middle of an element

b2d58fa

Remove unnecessary check.

5721cd6

Add nested vector test.

94f1dc6

Add nested vector test again.

0e8c3fe

harrisonGPU force-pushed the amdgpu/promote-vector branch from 8c3f2e3 to 0e8c3fe (December 8, 2025 03:26)

harrisonGPU merged commit 6ec8c43 into llvm:main Dec 8, 2025
10 checks passed

harrisonGPU deleted the amdgpu/promote-vector branch December 8, 2025 04:13

jplehr added a commit that referenced this pull request Dec 8, 2025

Revert "[AMDGPU] Enable i8 GEP promotion for vector allocas (#166132)"

75437ec

This reverts commit 6ec8c43.

jplehr mentioned this pull request Dec 8, 2025

Revert "[AMDGPU] Enable i8 GEP promotion for vector allocas" #171087

Merged

jplehr added a commit that referenced this pull request Dec 8, 2025

Revert "[AMDGPU] Enable i8 GEP promotion for vector allocas" (#171087)

ec78750

Reverts #166132 Broke libc on GPU tests. https://lab.llvm.org/buildbot/#/builders/10/builds/18635

harrisonGPU restored the amdgpu/promote-vector branch December 9, 2025 02:43

honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025

Revert "[AMDGPU] Enable i8 GEP promotion for vector allocas" (llvm#171087)

0e33e6b

Reverts llvm#166132 Broke libc on GPU tests. https://lab.llvm.org/buildbot/#/builders/10/builds/18635

ruiling reviewed Dec 10, 2025


6 participants
