Skip to content

Automerge conflict: [SROA] Refactor rewritePartition alloca type selection process (#167771)#3

Merged
sriyalamar merged 2 commits intoqualcomm-softwarefrom
automerge
Apr 9, 2026
Merged

Automerge conflict: [SROA] Refactor rewritePartition alloca type selection process (#167771)#3
sriyalamar merged 2 commits intoqualcomm-softwarefrom
automerge

Conversation

@llvm-upstream-sync
Copy link
Copy Markdown

This PR does two things:

  1. Refactor the rewritePartition type selection process to move the
    logic inside of a lambda. Previously the selection process would make
    use of a mutable SliceTy. Each phase of the type selection would do
    something like if (!SliceTy) { // try to set sliceTy }. But you also
    have if (!SliceTy && <condition>) and if (!SliceTy || <condition>).
    I think this style makes the priority mechanism confusing. The new way I
    wrote the selection process is equivalent (except for the second
    contribution of this PR), and works by checking a condition, followed by
    returning the selected type right away. I think it makes the priority
    clearer.

  2. What motivated the rewrite is that there are some cases with small
    aggregate allocas that have mixed type loads and stores that SROA fails
    to promote.

For example, given:

%alloca = alloca [2 x half]
store i32 42, ptr %alloca
%val = load float, ptr %alloca

Previously, SROA would:

  • Find no common type between i32 and float
  • Use getTypePartition which returns [2 x half]
  • Create a new alloca [2 x half]

This PR adds an additional check:

if (LargestIntTy &&
    DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
    isIntegerWideningViable(P, LargestIntTy, DL))
  return {LargestIntTy, true, nullptr};

which allows the alloca above to get promoted as i32.

The larger rewrite helps with this because this check is super specific.
I only want it to apply when there is no common type and when all the
loads and stores can be widened for promotion. I also want to apply it
on the subtype returned from getTypePartition. The larger rewrite makes
it very clear when this optimization occurs. I also don't want it to
apply when vector promotion is possible.

After this change, the example is optimized to:

%0 = bitcast i32 42 to float
ret void

The alloca is completely eliminated.

This PR does two things:
1. Refactor the rewritePartition type selection process to move the
logic inside of a lambda. Previously the selection process would make
use of a mutable `SliceTy`. Each phase of the type selection would do
something like `if (!SliceTy) { // try to set sliceTy` }. But you also
have `if (!SliceTy && <condition>)` and `if (!SliceTy || <condition>)`.
I think this style makes the priority mechanism confusing. The new way I
wrote the selection process is equivalent (except for the second
contribution of this PR), and works by checking a condition, followed by
returning the selected type right away. I think it makes the priority
clearer.

2. What motivated the rewrite is that there are some cases with small
aggregate allocas that have mixed type loads and stores that SROA fails
to promote.

For example, given:
```
%alloca = alloca [2 x half]
store i32 42, ptr %alloca
%val = load float, ptr %alloca
```
Previously, SROA would:
- Find no common type between `i32` and `float`
- Use `getTypePartition` which returns `[2 x half]`
- Create a new `alloca [2 x half]`

This PR adds an additional check:

```
if (LargestIntTy &&
    DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
    isIntegerWideningViable(P, LargestIntTy, DL))
  return {LargestIntTy, true, nullptr};
```

which allows the alloca above to get promoted as `i32`.

The larger rewrite helps with this because this check is super specific.
I only want it to apply when there is no common type and when all the
loads and stores can be widened for promotion. I also want to apply it
on the subtype returned from getTypePartition. The larger rewrite makes
it very clear when this optimization occurs. I also don't want it to
apply when vector promotion is possible.

After this change, the example is optimized to:
```
%0 = bitcast i32 42 to float
ret void
```
The alloca is completely eliminated.
@llvm-upstream-sync llvm-upstream-sync bot added the automerge_conflict A merge conflict found by the automerge script label Apr 8, 2026
llvm-upstream-sync bot pushed a commit that referenced this pull request Apr 9, 2026
Running gcc test c-c++-common/tsan/tls_race.c on s390 we get:

ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:618 "((thr_beg))
>= ((tls_addr))" (0x3ffaa35e140, 0x3ffaa35e250) (tid=2419930)
#0 __tsan::CheckUnwind() /devel/src/libsanitizer/tsan/tsan_rtl.cpp:696
(libtsan.so.2+0x91b57)
#1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long
long, unsigned long long)
/devel/src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86
(libtsan.so.2+0xd211b)
#2 __tsan::ImitateTlsWrite(__tsan::ThreadState*, unsigned long, unsigned
long) /devel/src/libsanitizer/tsan/tsan_platform_linux.cpp:618
(libtsan.so.2+0x8faa3)
#3 __tsan::ThreadStart(__tsan::ThreadState*, unsigned int, unsigned long
long, __sanitizer::ThreadType)
/devel/src/libsanitizer/tsan/tsan_rtl_thread.cpp:225
(libtsan.so.2+0xaadb5)
#4 __tsan_thread_start_func
/devel/src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1065
(libtsan.so.2+0x3d34d)
#5 start_thread <null> (libc.so.6+0xae70d) (BuildId:
d3b08de1b543c2d15d419bf861b3c2e4c01ac75b)
#6 thread_start <null> (libc.so.6+0x12d2ff) (BuildId:
d3b08de1b543c2d15d419bf861b3c2e4c01ac75b)

In order to determine the static TLS blocks in GetStaticTlsBoundary we
iterate over the modules and try to find the largest range without a
gap. Here we might have that modules are spaced exactly by the
alignment. For example, for the failing test we have:

(gdb) p/x ranges.data_[0]
$1 = {begin = 0x3fff7f9e6b8, end = 0x3fff7f9e740, align = 0x8, tls_modid
= 0x3} (gdb) p/x ranges.data_[1]
$2 = {begin = 0x3fff7f9e740, end = 0x3fff7f9eed0, align = 0x40,
tls_modid = 0x2} (gdb) p/x ranges.data_[2]
$3 = {begin = 0x3fff7f9eed8, end = 0x3fff7f9eef8, align = 0x8, tls_modid
= 0x4} (gdb) p/x ranges.data_[3]
$4 = {begin = 0x3fff7f9eefc, end = 0x3fff7f9ef00, align = 0x4, tls_modid
= 0x1}

where ranges[3].begin == ranges[2].end + ranges[3].align holds. Since in
the loop a strict inequality test is used we compute the wrong address

(gdb) p/x *addr
$5 = 0x3fff7f9eefc

whereas 0x3fff7f9e6b8 is expected which is why we bail out in the
subsequent.
Signed-off-by: sriyalamar <syalamar@qti.qualcomm.com>
@sriyalamar sriyalamar merged commit e1b528c into qualcomm-software Apr 9, 2026
2 checks passed
sriyalamar pushed a commit that referenced this pull request Apr 9, 2026
…e edge case (#188590)

llvm/llvm-project#186966 was reverted because
the test case triggered a use-of-uninitialized-memory
(https://lab.llvm.org/buildbot/#/builders/94/builds/16379), due to the
include directive omitting a trailing newline. This patch adds a minor
fix to avoid the use-of-uninitialized-memory, and deliberately re-adds
the test case sans trailing newline for regression testing.

MSan report prior to this fix:
```
@@@BUILD_STEP sanitizer logs: stage2/msan_track_origins check@@@
==clang-scan-deps==616960==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5555599c3300 in isAnnotation /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Token.h:131:38
    #1 0x5555599c3300 in setLength /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Token.h:152:13
    #2 0x5555599c3300 in clang::Lexer::FormTokenWithChars(clang::Token&, char const*, clang::tok::TokenKind) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/include/clang/Lex/Lexer.h:644:12
    #3 0x5555599cf895 in clang::Lexer::LexEndOfFile(clang::Token&, char const*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/Lexer.cpp:3166:5
    #4 0x555559bb229b in clang::Preprocessor::Lex(clang::Token&) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/Preprocessor.cpp:916:11
    #5 0x555559aa5365 in __invoke<void (clang::Preprocessor::*&)(clang::Token &), clang::Preprocessor *, clang::Token &> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_install_msan_track_origins/include/c++/v1/__type_traits/invoke.h:90:27
    #6 0x555559aa5365 in invoke<void (clang::Preprocessor::*&)(clang::Token &), clang::Preprocessor *, clang::Token &> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_install_msan_track_origins/include/c++/v1/__functional/invoke.h:29:10
    #7 0x555559aa5365 in operator()<void (clang::Preprocessor::*)(clang::Token &)> /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:470:5
    #8 0x555559aa5365 in clang::Preprocessor::CheckEndOfDirective(llvm::StringRef, bool, llvm::SmallVectorImpl<clang::Token>*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:478:5
    qualcomm#9 0x555559ab96b5 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/clang/lib/Lex/PPDirectives.cpp:2205:7
    ...
```

(cherry picked from commit f1889a7)
@sriyalamar sriyalamar deleted the automerge branch April 9, 2026 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge_conflict A merge conflict found by the automerge script

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants