Automerge conflict: [SROA] Refactor rewritePartition alloca type selection process (#167771) by llvm-upstream-sync[bot] · Pull Request #3 · sriyalamar/cpullvm-toolchain

llvm-upstream-sync · 2026-04-08T18:02:31Z

This PR does two things:

Refactor the rewritePartition type selection process to move the
logic inside of a lambda. Previously the selection process would make
use of a mutable SliceTy. Each phase of the type selection would do
something like if (!SliceTy) { // try to set sliceTy }. But you also
have if (!SliceTy && <condition>) and if (!SliceTy || <condition>).
I think this style makes the priority mechanism confusing. The new way I
wrote the selection process is equivalent (except for the second
contribution of this PR), and works by checking a condition, followed by
returning the selected type right away. I think it makes the priority
clearer.
What motivated the rewrite is that there are some cases with small
aggregate allocas that have mixed type loads and stores that SROA fails
to promote.

For example, given:

%alloca = alloca [2 x half]
store i32 42, ptr %alloca
%val = load float, ptr %alloca

Previously, SROA would:

Find no common type between i32 and float
Use getTypePartition which returns [2 x half]
Create a new alloca [2 x half]

This PR adds an additional check:

if (LargestIntTy &&
    DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
    isIntegerWideningViable(P, LargestIntTy, DL))
  return {LargestIntTy, true, nullptr};

which allows the alloca above to get promoted as i32.

The larger rewrite helps with this because this check is super specific.
I only want it to apply when there is no common type and when all the
loads and stores can be widened for promotion. I also want to apply it
on the subtype returned from getTypePartition. The larger rewrite makes
it very clear when this optimization occurs. I also don't want it to
apply when vector promotion is possible.

After this change, the example is optimized to:

%0 = bitcast i32 42 to float
ret void

The alloca is completely eliminated.

This PR does two things: 1. Refactor the rewritePartition type selection process to move the logic inside of a lambda. Previously the selection process would make use of a mutable `SliceTy`. Each phase of the type selection would do something like `if (!SliceTy) { // try to set sliceTy` }. But you also have `if (!SliceTy && <condition>)` and `if (!SliceTy || <condition>)`. I think this style makes the priority mechanism confusing. The new way I wrote the selection process is equivalent (except for the second contribution of this PR), and works by checking a condition, followed by returning the selected type right away. I think it makes the priority clearer. 2. What motivated the rewrite is that there are some cases with small aggregate allocas that have mixed type loads and stores that SROA fails to promote. For example, given: ``` %alloca = alloca [2 x half] store i32 42, ptr %alloca %val = load float, ptr %alloca ``` Previously, SROA would: - Find no common type between `i32` and `float` - Use `getTypePartition` which returns `[2 x half]` - Create a new `alloca [2 x half]` This PR adds an additional check: ``` if (LargestIntTy && DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() && isIntegerWideningViable(P, LargestIntTy, DL)) return {LargestIntTy, true, nullptr}; ``` which allows the alloca above to get promoted as `i32`. The larger rewrite helps with this because this check is super specific. I only want it to apply when there is no common type and when all the loads and stores can be widened for promotion. I also want to apply it on the subtype returned from getTypePartition. The larger rewrite makes it very clear when this optimization occurs. I also don't want it to apply when vector promotion is possible. After this change, the example is optimized to: ``` %0 = bitcast i32 42 to float ret void ``` The alloca is completely eliminated.

Running gcc test c-c++-common/tsan/tls_race.c on s390 we get: ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:618 "((thr_beg)) >= ((tls_addr))" (0x3ffaa35e140, 0x3ffaa35e250) (tid=2419930) #0 __tsan::CheckUnwind() /devel/src/libsanitizer/tsan/tsan_rtl.cpp:696 (libtsan.so.2+0x91b57) #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /devel/src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86 (libtsan.so.2+0xd211b) #2 __tsan::ImitateTlsWrite(__tsan::ThreadState*, unsigned long, unsigned long) /devel/src/libsanitizer/tsan/tsan_platform_linux.cpp:618 (libtsan.so.2+0x8faa3) #3 __tsan::ThreadStart(__tsan::ThreadState*, unsigned int, unsigned long long, __sanitizer::ThreadType) /devel/src/libsanitizer/tsan/tsan_rtl_thread.cpp:225 (libtsan.so.2+0xaadb5) #4 __tsan_thread_start_func /devel/src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1065 (libtsan.so.2+0x3d34d) #5 start_thread <null> (libc.so.6+0xae70d) (BuildId: d3b08de1b543c2d15d419bf861b3c2e4c01ac75b) #6 thread_start <null> (libc.so.6+0x12d2ff) (BuildId: d3b08de1b543c2d15d419bf861b3c2e4c01ac75b) In order to determine the static TLS blocks in GetStaticTlsBoundary we iterate over the modules and try to find the largest range without a gap. Here we might have that modules are spaced exactly by the alignment. For example, for the failing test we have: (gdb) p/x ranges.data_[0] $1 = {begin = 0x3fff7f9e6b8, end = 0x3fff7f9e740, align = 0x8, tls_modid = 0x3} (gdb) p/x ranges.data_[1] $2 = {begin = 0x3fff7f9e740, end = 0x3fff7f9eed0, align = 0x40, tls_modid = 0x2} (gdb) p/x ranges.data_[2] $3 = {begin = 0x3fff7f9eed8, end = 0x3fff7f9eef8, align = 0x8, tls_modid = 0x4} (gdb) p/x ranges.data_[3] $4 = {begin = 0x3fff7f9eefc, end = 0x3fff7f9ef00, align = 0x4, tls_modid = 0x1} where ranges[3].begin == ranges[2].end + ranges[3].align holds. Since in the loop a strict inequality test is used we compute the wrong address (gdb) p/x *addr $5 = 0x3fff7f9eefc whereas 0x3fff7f9e6b8 is expected which is why we bail out in the subsequent.

Signed-off-by: sriyalamar <syalamar@qti.qualcomm.com>

…e edge case (#188590) llvm/llvm-project#186966 was reverted because the test case triggered a use-of-uninitialized-memory (https://lab.llvm.org/buildbot/#/builders/94/builds/16379), due to the include directive omitting a trailing newline. This patch adds a minor fix to avoid the use-of-uninitialized-memory, and deliberately re-adds the test case sans trailing newline for regression testing. MSan report prior to this fix: ``` @@@BUILD_STEP sanitizer logs: stage2/msan_track_origins check@@@ ==clang-scan-deps==616960==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x5555599c3300 in isAnnotation /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Token.h:131:38 #1 0x5555599c3300 in setLength /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Token.h:152:13 #2 0x5555599c3300 in clang::Lexer::FormTokenWithChars(clang::Token&, char const*, clang::tok::TokenKind) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Lexer.h:644:12 #3 0x5555599cf895 in clang::Lexer::LexEndOfFile(clang::Token&, char const*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/Lexer.cpp:3166:5 #4 0x555559bb229b in clang::Preprocessor::Lex(clang::Token&) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/Preprocessor.cpp:916:11 #5 0x555559aa5365 in __invoke<void (clang::Preprocessor::*&)(clang::Token &), clang::Preprocessor *, clang::Token &> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_install_msan_track_origins/include/c++/v1/__type_traits/invoke.h:90:27 #6 0x555559aa5365 in invoke<void (clang::Preprocessor::*&)(clang::Token &), clang::Preprocessor *, clang::Token &> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_install_msan_track_origins/include/c++/v1/__functional/invoke.h:29:10 #7 0x555559aa5365 in operator()<void (clang::Preprocessor::*)(clang::Token &)> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:470:5 #8 0x555559aa5365 in clang::Preprocessor::CheckEndOfDirective(llvm::StringRef, bool, llvm::SmallVectorImpl<clang::Token>*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:478:5 qualcomm#9 0x555559ab96b5 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:2205:7 ... ``` (cherry picked from commit f1889a7)

llvm-upstream-sync bot added the automerge_conflict A merge conflict found by the automerge script label Apr 8, 2026

Merge branch 'qualcomm-software' into automerge

edee19f

Signed-off-by: sriyalamar <syalamar@qti.qualcomm.com>

sriyalamar merged commit e1b528c into qualcomm-software Apr 9, 2026
2 checks passed

sriyalamar deleted the automerge branch April 9, 2026 04:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automerge conflict: [SROA] Refactor rewritePartition alloca type selection process (#167771)#3

Automerge conflict: [SROA] Refactor rewritePartition alloca type selection process (#167771)#3
sriyalamar merged 2 commits intoqualcomm-softwarefrom
automerge

llvm-upstream-sync bot commented Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

llvm-upstream-sync bot commented Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants