Bulk Promotion from 2025.11.24 by david-salinas · Pull Request #929 · ROCm/llvm-project

david-salinas · 2025-12-23T17:00:02Z

Bulk Promotion from amd-staging commit: 669e22f

…lvm#168937) This patch extracts the common logic for computing array element counts from shape operands into a reusable helper function in CUFCommon.

These horizontal add/sub instructions are currently handled by adding/subtracting tuples of the first operand, followed by tuples of the second operand. This is not the correct semantics for the 256-bit instructions: they process the first half of the first operand, then the first half of the second operand, then the second half of the first operand, and finally the second half of the second operand (trust me bro [*]). This patch fixes the issue by applying the "shards" functionality that was added in llvm#167954, to handle the top and bottom 128-bit "shards" in turn.

[*] clang/test/CodeGen/X86/avx2-builtins.c:

```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
    (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
    (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
    30,70,20,60,110,150,100,140));
```
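To make the per-shard ordering concrete, here is a minimal C++ reference model of the 256-bit `_mm256_hadd_epi32` behaviour described above. It is a sketch, not the constant-evaluator code from the patch: the function names `hadd_epi32_256` and `wrapping_add` and the `V8` alias are hypothetical, and lanes are summed with two's-complement wrapping (the instruction does not saturate). Each 128-bit shard emits two pair-sums from the first operand, then two from the second.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<std::int32_t, 8>;  // eight 32-bit lanes of a __m256i

// Hypothetical helper: two's-complement wrapping add (non-saturating).
std::int32_t wrapping_add(std::int32_t x, std::int32_t y) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) +
                                   static_cast<std::uint32_t>(y));
}

// Hypothetical reference model of _mm256_hadd_epi32: the two 128-bit shards
// (lanes 0-3 and 4-7) are processed independently. Within each shard the
// result holds two pair-sums from `a`, followed by two pair-sums from `b`.
V8 hadd_epi32_256(const V8 &a, const V8 &b) {
  V8 r{};
  for (int shard = 0; shard < 2; ++shard) {
    const int base = shard * 4;  // first lane of this 128-bit shard
    r[base + 0] = wrapping_add(a[base + 0], a[base + 1]);
    r[base + 1] = wrapping_add(a[base + 2], a[base + 3]);
    r[base + 2] = wrapping_add(b[base + 0], b[base + 1]);
    r[base + 3] = wrapping_add(b[base + 2], b[base + 3]);
  }
  return r;
}
```

With the inputs from the avx2-builtins.c test this model yields {30, 70, 20, 60, 110, 150, 100, 140}, matching the TEST_CONSTEXPR; consuming all of the first operand before the second would instead give {30, 70, 110, 150, 20, 60, 100, 140}.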

Add a low trip count test that is currently vectorized but unprofitable, for llvm#167858.

…lvm#168609) which reassigns the scale operand in a vgpr_32 register to agpr_32, which is not permitted by the instruction format. Reduced from CK. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: theRonShark <ron.lieberman@amd.com>

Don't specifically target windows-msvc - the same goes for any windows target; mingw doesn't have dlfcn.h either.

…#168900) When comparing additions with the same base where one has `nsw`, the following simplification can be performed:

```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2) => icmp slt/sgt/sle/sge C1, C2
```

Previously this was only done for `slt`. This patch extends it to the `sgt`, `sle`, and `sge` predicates when either of the following conditions holds:

- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`

This patch also handles the `C1 == C2` case, which was previously excluded. Proof: https://alive2.llvm.org/ce/z/LtmY4f
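One way to sanity-check the side conditions is an exhaustive search over `i8`, in the spirit of the linked Alive2 proof but far less general. The sketch below is hypothetical (all helper names are invented for illustration): a plain `add` is modelled as two's-complement wrapping, while `x +nsw C2` is treated as poison whenever it would overflow, so those values of `x` impose no constraint on the fold.

```cpp
#include <cassert>

// Hypothetical helper: wrap an int into the signed 8-bit range,
// modelling a plain (wrapping) `add i8`.
int wrap_i8(int v) {
  const int m = ((v % 256) + 256) % 256;  // 0..255
  return m >= 128 ? m - 256 : m;
}

enum class Pred { SLT, SGT, SLE, SGE };

bool evalPred(Pred p, int lhs, int rhs) {
  switch (p) {
  case Pred::SLT: return lhs < rhs;
  case Pred::SGT: return lhs > rhs;
  case Pred::SLE: return lhs <= rhs;
  case Pred::SGE: return lhs >= rhs;
  }
  return false;
}

// True if `icmp p (x + c1), (x +nsw c2)` agrees with `icmp p c1, c2` for every
// i8 x on which `x +nsw c2` does not overflow (overflow would make it poison).
bool foldIsValid(Pred p, int c1, int c2) {
  assert(c1 >= -128 && c1 <= 127 && c2 >= -128 && c2 <= 127);
  for (int x = -128; x <= 127; ++x) {
    const int sum2 = x + c2;
    if (sum2 < -128 || sum2 > 127) continue;  // nsw violated: poison, skip
    const int sum1 = wrap_i8(x + c1);         // plain add may wrap
    if (evalPred(p, sum1, sum2) != evalPred(p, c1, c2)) return false;
  }
  return true;
}

// The side condition quoted in the commit message.
bool sideConditionHolds(int c1, int c2) {
  return (c1 <= c2 && c1 >= 0) || (c2 <= c1 && c1 <= 0);
}
```

Outside the conditions the fold can fail: with `C1 = 1`, `C2 = 0` and `x = 127`, the wrapping `x + 1` becomes -128, so `sgt` evaluates to false while `icmp sgt 1, 0` is true.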

Remove all constraint propagation functions in Dependence Analysis.

Add dependency on headers with `in_addr` and `in_addr_t` type definitions to ensure that these headers will be properly installed by "install-libc" CMake target.

… a given tiled loop nest. (llvm#167634) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and the loops they are part of), tries to find the consumer of these slices (all slices are expected to have the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to start from the consumer and look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). Using the consumer, you can find the slices of the operands that are accessed within the loop, which you can then use to tile and fuse the consumer (using `TilingInterface`). This more naturally handles the case where multiple operands of the consumer come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for its source, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means that using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. Using the consumer directly avoids all those pain points.

In time, `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. A lot of tech debt has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleaned up. So while that gets cleaned up, and the required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The tests for `scf::tileAndFuseConsumerUsingSlices` are copied from `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. All the tests that were in the original file now use the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>

Add declaration of command line options to BugDriver.h and remove extern declarations in individual .cpp files.

Reverts llvm#168921 Causes build failures.

llvm#156577) Add detailed comments explaining each function's memory access patterns and why they should/shouldn't be unroll-and-jammed:

- `fore_aft_*`: Dependencies between fore block and aft block
- `fore_sub_*`: Dependencies between fore block and sub block
- `sub_aft_*`: Dependencies between sub block and aft block
- `sub_sub_*`: Dependencies within sub block
- `*_less`: Backward dependency (i-1) - safe for fore/aft, fore/sub, sub/aft; unsafe for sub/sub due to jamming conflicts
- `*_eq`: Same iteration dependency (i+0) - safe due to preserved execution order
- `*_more`: Forward dependency (i+1) - unsafe due to write-after-write races between unrolled iterations, except sub/sub case creates conflicts

…168962) The only thing the docs should depend on is the SWIG wrapper (lldb.py), which only requires parsing the API headers. It should not depend on building libLLDB. The dependency was (I believe accidentally) introduced by 59f4267. Fixes llvm#123316

…vm#168930) Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc

… Wg To Sg (llvm#168118)

…llvm#168932) This removes an unnecessary isel pattern for the RV32 HwMode.

…llvm#168957) On startup, bazel prints: `WARNING: Option 'experimental_guard_against_concurrent_changes' is deprecated: Use --guard_against_concurrent_changes instead`

This set of patches removes the early tagging of Generic-SPMD target regions from MLIR to instead only tell apart Generic from SPMD. This matches the behavior of Clang, which then relies on the OpenMPOpt pass to detect situations where Generic kernels can be executed in SPMD mode, potentially after certain transformations. Merging this PR results in split distribute + parallel do kernels running in Generic mode, which might cause performance regressions in these cases. This is because the OpenMPOpt pass is currently not prepared to properly SPMDize Generic kernels containing new DeviceRTL loop functions that only Flang currently generates. Generic mode before these changes is broken when parallel regions are reached. With this, it should be possible to properly execute them.

…lvm#168973) Reverts llvm#168643

…vm#168426) We previously got a duplicate implicit $exec operand. It didn't really hurt anything (other than being a slight drag on compile-time performance). Still, let's keep things clean.

This patch fixes and eliminates the possibility of SupportFileSP ever being nullptr. The support file was originally treated like a value type, but became a polymorphic type and therefore has to be stored and passed around as a pointer. To avoid having all the callers check the validity of the pointer, I introduced the invariant that SupportFileSP is never null and always default constructed. However, without enforcement at the type level, that's fragile and indeed, we already identified two crashes where someone accidentally broke that invariant. This PR introduces a NonNullSharedPtr to prevent that. NonNullSharedPtr is a smart pointer wrapper around std::shared_ptr that guarantees the pointer is never null. If default-constructed, it creates a default-constructed instance of the contained type. Note that I'm using private inheritance because you shouldn't inherit from standard library classes due to the lack of a virtual destructor. So while the new abstraction looks like a `std::shared_ptr`, it is in fact **not** a shared pointer. Given that our destructor is trivial, we could use public inheritance, but currently there's no need for it. rdar://164989579
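The invariant can be illustrated with a minimal C++ sketch of such a wrapper. This is not the LLDB class: only the name comes from the description above, and both the set of re-exported `std::shared_ptr` members and the choice to substitute a default-constructed object when handed a null pointer are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal sketch of a never-null shared pointer; NOT the LLDB implementation.
// Private inheritance hides std::shared_ptr's API (reset(), assignment from
// nullptr, ...) except for the members re-exported below.
template <typename T>
class NonNullSharedPtr : private std::shared_ptr<T> {
  using Base = std::shared_ptr<T>;

public:
  // Default construction allocates a default-constructed T instead of null.
  NonNullSharedPtr() : Base(std::make_shared<T>()) {}

  // Assumption for this sketch: a null input is replaced by a
  // default-constructed T, so the invariant holds from construction on.
  explicit NonNullSharedPtr(Base sp)
      : Base(sp ? std::move(sp) : std::make_shared<T>()) {}

  // Re-export only the read-side API.
  using Base::get;
  using Base::operator*;
  using Base::operator->;
  using Base::use_count;
};
```

Because the inheritance is private, callers cannot reach `reset()` or assign `nullptr` through the wrapper; only the re-exported members are visible, while copies still share ownership as a `std::shared_ptr` would.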

…llvm#168975)

… non-static. NFC. (llvm#168839) So that we can reuse these functions in a few places, such as in clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is currently copied from getOptimizationLevel.

…ert header-only macros (llvm#168016) Adds the remaining optional feature macros from the OpenCL C 3.0 spec (section 6.2.1 table). Targets can now enable these via OpenCLFeaturesMap returned by getSupportedOpenCLOpts(). Revert a84599f (header‑only feature macros). Header‑only macros are difficult to disable on SPIR-V targets, and the prior undef approach (a60b8f4) does not scale. After this PR, they can be disabled via `-cl-ext=-<feature>`. KhronosGroup/OpenCL-Docs#1328 also notes that unconditional definition of the header‑only macros in opencl-c-base.h should be removed.

…8925)

…67744) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.

llvm#167060)" (llvm#169238) This reverts commit a52e1af. That commit reverted a change (making isExpandedFromMacro take a std::string) that was explicitly added to avoid lifetime issues. We ran into issues with some internal matchers due to this, and it probably is not an uncommon downstream use case. This patch restores the original functionality and adds a test to ensure that the functionality is preserved. https://reviews.llvm.org/D90303 contains more discussion.

…lvm#169255)

…vm#164768) Background: The X86 APX feature adds 16 registers within the same 64-bit mode. PR llvm#164638 is trying to extend such registers for FASTCC. However, a blocker is that the calling convention cannot change depending on whether a feature is enabled. The solution is to disable FASTCC if APX is not ready. This is an NFC change to the final code generation, because X86 doesn't define an alternative ABI for FASTCC in 64-bit mode. With this patch, we can solve the potential compatibility issue of llvm#164638.

…Reg (llvm#168661)" (llvm#169219) Reland d5f3ab8, fix testcases.

…169260) This is supposed to fix LLVM Buildbot failures after llvm#164768. I don't have the environment to verify, though.

…lvm#169262) Interfaces can be optional: whether an op implements an interface or not can depend on the state of the operation.

```
// An optional code block for adding additional "classof" logic. This can
// be used to better enable "optional" interfaces, where an entity only
// implements the interface if some dynamic characteristic holds.
// `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
// interface instance being checked.
code extraClassOf = "";
```

The current `Pass::canScheduleOn(RegisteredOperationName)` is insufficient. This commit adds an additional overload to inspect `Operation *`. This commit fixes a crash when scheduling an `InterfacePass` for an optional interface on an operation that does not actually implement the interface. This is a re-upload of llvm#168499, which was reverted.
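The gap between a name-based check and an instance-based check can be shown with a toy C++ model. Everything in it is hypothetical: `Operation`, `mayImplementByName` and `implementsInterface` are invented for illustration and are not MLIR APIs. Ops named `test.op` implement the interface only when a dynamic `hasShape` flag is set, mirroring an `extraClassOf`-style condition.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy stand-in for an operation instance; not mlir::Operation.
struct Operation {
  std::string name;
  bool hasShape;  // dynamic state consulted by the "extraClassOf"-style check
};

// Hypothetical name-only query, analogous in spirit to
// canScheduleOn(RegisteredOperationName): it can only say that ops of this
// kind *might* implement the interface.
bool mayImplementByName(const std::string &name) { return name == "test.op"; }

// Hypothetical instance query, analogous in spirit to the new
// canScheduleOn(Operation *) overload: it also evaluates the dynamic
// condition on this particular op.
bool implementsInterface(const Operation &op) {
  return mayImplementByName(op.name) && op.hasShape;
}
```

In this model, relying on the name check alone would schedule an interface pass on a `test.op` whose `hasShape` flag is false, even though that op does not implement the interface, which is the kind of mismatch the commit message says led to a crash.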

SPIR/SPIR-V are generic targets. Assume they support __bf16.

…llvm#169227) Merge the SkipBits!=0 handling into the first iteration of the word loop. This is the same code structure used by BitVector::find_first_in.
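The loop structure can be sketched in C++ on a plain array of 64-bit words. This is a hypothetical stand-alone function, not the TableGen code: `findFromPos` and `ctz64` are invented names. The point is that the partial first word is handled by a mask inside the first loop iteration, after which the mask degenerates to a no-op, so no separate pre-loop step is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical portable count-trailing-zeros; precondition: w != 0.
unsigned ctz64(std::uint64_t w) {
  assert(w != 0);
  unsigned n = 0;
  while ((w & 1u) == 0) {
    w >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical: index of the first set bit at or after `pos`, or
// words.size() * 64 if there is none.
std::size_t findFromPos(const std::vector<std::uint64_t> &words, std::size_t pos) {
  const std::size_t numBits = words.size() * 64;
  if (pos >= numBits) return numBits;
  unsigned skipBits = static_cast<unsigned>(pos % 64);  // nonzero only for the first word
  for (std::size_t i = pos / 64; i < words.size(); ++i) {
    // Clear bits below `pos` in the first word; for later words skipBits is 0,
    // so the mask is all ones and changes nothing.
    const std::uint64_t w = words[i] & (~std::uint64_t{0} << skipBits);
    skipBits = 0;
    if (w != 0) return i * 64 + ctz64(w);
  }
  return numBits;
}
```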

… try (llvm#169266)

…lvm#169256) I've tested this locally, and the builtins build proceeds without a hitch for m68k-none-none. This is part of a larger effort to establish a working m68k baremetal toolchain.

Fixes a bug in `AMDGPUISelLowering` where alias analysis info is not propagated to split loads and stores. This is required for llvm#161375 --------- Co-authored-by: Leon Clark <leoclark@amd.com>

The motivation for this is that it would be useful to express a vslideup/vslidedown in a target independent way e.g. from the loop vectorizer. We can do this today with @llvm.vector.splice by setting one operand to poison:

- A slide down can be achieved with `@llvm.vector.splice(%x, poison, slideamt)`
- A slide up can be done by `@llvm.vector.splice(poison, %x, -slideamt)`

E.g.:

- `splice(<a,b,c,d>, poison, 3) = <d,poison,poison,poison>`
- `splice(poison, <a,b,c,d>, -3) = <poison,poison,poison,a>`

These splices get lowered to a vslideup + vslidedown pair with one of the vs2s being poison. We can optimize this away so that we are just left with a single slideup/slidedown.
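The index arithmetic behind these examples can be modelled in a few lines of C++. This is a hypothetical reference model, not LLVM code: `std::nullopt` stands in for a poison lane, vectors are fixed length, both operands are assumed to have the same length `N`, and the immediate is assumed to lie in `[-N, N)`. The result is `N` lanes of the concatenation of both operands, starting at `imm` for a non-negative immediate or at `N + imm` for a negative one.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model types: one lane, with std::nullopt meaning "poison".
using Lane = std::optional<char>;
using Vec = std::vector<Lane>;

// Hypothetical model of @llvm.vector.splice on fixed-length vectors.
Vec splice(const Vec &a, const Vec &b, int imm) {
  const int n = static_cast<int>(a.size());
  assert(a.size() == b.size() && -n <= imm && imm < n);
  Vec concat(a);                                   // a ++ b
  concat.insert(concat.end(), b.begin(), b.end());
  const int start = imm >= 0 ? imm : n + imm;      // negative: trailing -imm lanes of a
  return Vec(concat.begin() + start, concat.begin() + start + n);
}
```

Both calls from the examples above are reproduced by this model: with `poison` as the second operand and `imm = 3` the splice behaves as a slide down by 3, and with `poison` first and `imm = -3` it behaves as a slide up by 3.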

…er (llvm#168836) Two years ago, `operand_segment_sizes` and `result_segment_sizes` were renamed to `operandSegmentSizes` and `resultSegmentSizes` (check related commits, e.g. llvm@363b655). However, the op verifiers in the IRDL loading phase are still using the old attributes `operand_segment_sizes` and `result_segment_sizes`, which causes conflicts; e.g. they are not compatible with the OpView builder in the MLIR Python bindings (which generates camelCase segment attributes). This PR adds support for camelCase segment size attributes in the IRDL verifier. Note that support for `operand_segment_sizes` and `result_segment_sizes` is dropped. I found this issue while working on a new IRDL wrapper in the MLIR Python bindings.

…hain (llvm#168135)" breaks build of CK. This reverts commit 9e9fe08.

#662)

Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD

Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD

github-actions · 2025-12-23T17:00:41Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

z1-cciauto · 2025-12-23T17:01:40Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-mainline/525

wangzpgi and others added 30 commits November 20, 2025 13:01

[flang][cuda] Extract element count computation into helper function (l…

1b8a4aa

…lvm#168937) This patch extracts the common logic for computing array element counts from shape operands into a reusable helper function in CUFCommon.

[LV] Add test a low-trip count test without folding the tail.

a3f6c43

Add a low trip count test that is currently vectorized but unprofitable, for llvm#167858.

[compiler-rt] [test] Generalize an UNSUPPORTED marking (llvm#168858)

04acac2

Don't specifically target windows-msvc - the same goes for any windows target; mingw doesn't have dlfcn.h either.

[DA] remove constraint propagation (llvm#160924)

5c8db7a

Remove all constraint propagation functions in Dependence Analysis.

[libc] Add missing dependencies for arpa/inet.h header. (llvm#168951)

1136239

Add dependency on headers with `in_addr` and `in_addr_t` type definitions to ensure that these headers will be properly installed by "install-libc" CMake target.

[mlir] Add kuhar to code owners for arith (llvm#168945)

9e2ca0d

[NFC][bugpoint] Namespace cleanup in bugpoint (llvm#168921)

bf91a62

Add declaration of command line options to BugDriver.h and remove extern declarations in individual .cpp files.

Revert "[NFC][bugpoint] Namespace cleanup in bugpoint" (llvm#168961)

b83e458

Reverts llvm#168921 Causes build failures.

[RISCV] Only add v2i32 to GPR regclass in the RV64 hardware mode. (ll…

fbc0935

…vm#168930) Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc

[MLIR] [XeGPU] Add distribution pattern for vector.constant_mask from…

310abe0

… Wg To Sg (llvm#168118)

[RISCV] Use SDT_RISCVIntUnaryOpW for RISCVISD::ABSW type profile. NFC (…

a9435cb

…llvm#168932) This removes an unnecessary isel pattern for the RV32 HwMode.

[clang][deps] NFC: Fix typo in function name (llvm#168958)

925ce5a

[bazel] Replace --experimental_guard_against_concurrent_changes usage (…

3723a8b

…llvm#168957) On startup, bazel prints: `WARNING: Option 'experimental_guard_against_concurrent_changes' is deprecated: Use --guard_against_concurrent_changes instead`

[UBSan] [compiler-rt] add preservecc variants of handlers (llvm#168643)

49e46a5

Revert "[UBSan] [compiler-rt] add preservecc variants of handlers" (l…

418204d

…lvm#168973) Reverts llvm#168643

AMDGPU: Don't duplicate implicit operands in 3-address conversion (ll…

ac55d78

…vm#168426) We previously got a duplicate implicit $exec operand. It didn't really hurt anything (other than being a slight drag on compile-time performance). Still, let's keep things clean.

AMDGPU: Convert constant-address-space-32bit test to generated checks (…

3954df9

…llvm#168975)

merge main into amd-staging (#637)

4b7ddf4

[Clang] Refactor getOptimizationLevel and getOptimizationLevelSize to…

8439aeb

… non-static. NFC. (llvm#168839) So that we can reuse these functions in a few places, such as in clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is currently copied from getOptimizationLevel.

[dsymutil] Add missing validation for zero alignment section (llvm#16…

c34f76d

…8925)

TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (llvm#1…

1d73b68

…67744) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.

boomanaiden154 and others added 22 commits November 23, 2025 22:17

[ORC] Fix typo in comment.

ded1311

[gn] port b5812c0 (LoongArch SDNodeInfo)

b73a281

[orc-rt] Remove unused Session argument from WrapperFunction::call. (l…

3c3e2a2

…lvm#169255)

Reland "[RegAlloc] Fix the terminal rule check for interfere with Dst…

a6cec3f

…Reg (llvm#168661)" (llvm#169219) Reland d5f3ab8, fix testcases.

[GlobalOpt] Use target triple to fix Buildbot failures, NFCI (llvm#…

25c2cc4

…169260) This is supposed to fix LLVM Buildbot failures after llvm#164768. I don't have the environment to verify, though.

[Clang] Support __bf16 type for SPIR/SPIR-V (llvm#169012)

c4254cd

SPIR/SPIR-V are generic targets. Assume they support __bf16.

[TableGen] Simplify MachineValueTypeSet::iterator::find_from_pos. NFC (…

e71f243

…llvm#169227) Merge the SkipBits!=0 handling into the first iteration of the word loop. This is the same code structure used by BitVector::find_first_in.

[GlobalOpt] Use x86-registered-target to fix Buildbot failures, 2nd…

c33e50b

… try (llvm#169266)

[M68k][compiler-rt] Allow compiler-rt builtins to be built for M68k (l…

acab67b

…lvm#169256) I've tested this locally, and the builtins build proceeds without a hitch for m68k-none-none. This is part of a larger effort to establish a working m68k baremetal toolchain.

[AMDGPU] Propagate AA info in vector load/store splitting. (llvm#168871)

ee4f647

Fixes a bug in `AMDGPUISelLowering` where alias analysis info is not propagated to split loads and stores. This is required for llvm#161375 --------- Co-authored-by: Leon Clark <leoclark@amd.com>

Revert "Re-land [Transform][LoadStoreVectorizer] allow redundant in C…

446b06e

…hain (llvm#168135)" breaks build of CK. This reverts commit 9e9fe08.

merge main into amd-staging

bbe5f80

Update revert_patches.txt : breaks CK

dd192b6

merge main into amd-staging (#663)

5b91392

Revert "Re-land [Transform][LoadStoreVectorizer] allow redundant in C… (

669e22f

#662)

Bulk Promotion

3eb9e3e

Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD

Bulk Promotion from 2025.11.24

265f4e5

Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD

david-salinas requested a review from lajagapp December 23, 2025 17:00

david-salinas requested review from SyamaAmd, kzhuravl and lamb-j December 23, 2025 17:00

ronlieb merged commit 08a72fc into amd-mainline Dec 24, 2025
49 of 56 checks passed

ronlieb deleted the amd/dev/dsalinas/land_bulk_promo_2025_11_24 branch December 24, 2025 01:22

ronlieb merged 6333 commits into amd-mainline from amd/dev/dsalinas/land_bulk_promo_2025_11_24

20 participants
