Bulk Promotion from 2025.11.24 #929

Merged
ronlieb merged 6333 commits into amd-mainline from amd/dev/dsalinas/land_bulk_promo_2025_11_24
Dec 24, 2025

Conversation

@david-salinas

Bulk Promotion from amd-staging commit: 669e22f

wangzpgi and others added 30 commits November 20, 2025 13:01
…lvm#168937)

This patch extracts the common logic for computing array element counts
from shape operands into a reusable helper function in CUFCommon.
These horizontal add/sub instructions are currently handled by
adding/subtracting tuples of the first operand, followed by tuples of
the second operand. This is not the correct semantics for the 256-bit
instructions: they process the first half of the first operand, then the
first half of the second operand, then the second half of the first
operand, and finally the second half of the second operand (trust me bro
[*]).

This patch fixes the issue by applying the "shards" functionality that
was added in llvm#167954, to handle
the top and bottom 128-bit "shards" in turn.

[*] clang/test/CodeGen/X86/avx2-builtins.c:
```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
    (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
    (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
    30,70,20,60,110,150,100,140));
```
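The lane ordering described above can be sketched as a small scalar model. This is an illustration only, not the code from the patch: `hadd_epi32_model` and `wrapping_add32` are hypothetical names, and the model assumes that each 128-bit half is processed independently, which is what the test expectation above implies.

```cpp
#include <array>
#include <cstdint>

using V8 = std::array<int32_t, 8>;

// Wrapping 32-bit add, so the model never hits signed-overflow UB.
static int32_t wrapping_add32(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) +
                              static_cast<uint32_t>(y));
}

// Hypothetical reference model of the 256-bit horizontal add: each 128-bit
// half ("shard") is handled on its own, pairs from `a` first, then from `b`.
V8 hadd_epi32_model(const V8 &a, const V8 &b) {
  V8 r{};
  for (int shard = 0; shard < 2; ++shard) {
    const int base = shard * 4; // first element of this 128-bit half
    r[base + 0] = wrapping_add32(a[base + 0], a[base + 1]);
    r[base + 1] = wrapping_add32(a[base + 2], a[base + 3]);
    r[base + 2] = wrapping_add32(b[base + 0], b[base + 1]);
    r[base + 3] = wrapping_add32(b[base + 2], b[base + 3]);
  }
  return r;
}
```

Feeding the two vectors from the `avx2-builtins.c` test into this model yields the same eight values the test expects.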
Add a low trip count test that is currently vectorized but unprofitable,
for llvm#167858.
…lvm#168609)

which reassigns scale operand in vgpr_32 register to agpr_32, not
permitted by instruction format. Reduced from ck.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: theRonShark <ron.lieberman@amd.com>
Don't specifically target windows-msvc - the same goes for any windows
target; mingw doesn't have dlfcn.h either.
…#168900)

When comparing additions with the same base where one has `nsw`, the
following simplification can be performed:

```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
=>
icmp slt/sgt/sle/sge C1, C2
```

Previously this was only done for `slt`. This patch extends it to the
`sgt`, `sle`, and `sge` predicates when either of the following conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`

This patch also handles the `C1 == C2` case, which was previously
excluded.

Proof: https://alive2.llvm.org/ce/z/LtmY4f
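As an informal sanity check of the conditions listed above (the Alive2 link remains the actual proof), the fold can be brute-forced at 8-bit width. This is a sketch under assumptions: the helper names are hypothetical, `i8` stands in for an arbitrary bit width, and the left-hand add is modelled as wrapping while the right-hand add must not overflow (`nsw`).

```cpp
#include <cstdint>

// Wrapping i8 add, modelling an LLVM `add` without nsw.
static int wrap_add_i8(int x, int c) {
  return static_cast<int8_t>(static_cast<uint8_t>(x + c));
}

// True if all four signed predicates give the same answer on
// (x + C1, x +nsw C2) as on (C1, C2).
static bool compare_agrees(int x, int c1, int c2) {
  const int lhs = wrap_add_i8(x, c1);
  const int rhs = x + c2; // caller guarantees this does not overflow i8
  return (lhs < rhs) == (c1 < c2) && (lhs > rhs) == (c1 > c2) &&
         (lhs <= rhs) == (c1 <= c2) && (lhs >= rhs) == (c1 >= c2);
}

// Exhaustively checks every i8 triple that satisfies the stated conditions.
bool fold_holds_for_all_i8() {
  for (int x = -128; x <= 127; ++x)
    for (int c1 = -128; c1 <= 127; ++c1)
      for (int c2 = -128; c2 <= 127; ++c2) {
        const bool precond =
            (c1 <= c2 && c1 >= 0) || (c2 <= c1 && c1 <= 0);
        const bool nsw = x + c2 >= -128 && x + c2 <= 127;
        if (precond && nsw && !compare_agrees(x, c1, c2))
          return false;
      }
  return true;
}
```

The conditions matter: with `x = 127`, `C1 = 1`, `C2 = 0` neither condition holds, the left-hand add wraps to `-128`, and `compare_agrees` reports a mismatch.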
Remove all constraint propagation functions in Dependence Analysis.
Add dependency on headers with `in_addr` and `in_addr_t` type
definitions to ensure that these headers will be properly installed by
"install-libc" CMake target.
… a given tiled loop nest. (llvm#167634)

The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices
(and loops they are part of), tries to find the consumer of these slices
(all slices are expected to have the same consumer), and then tiles the
consumer into the loop nest using the `TilingInterface`. A more natural
way of doing consumer fusion is to just start from the consumer, look
for operands that are produced by the loop nest passed in as `loops`
(presumably these loops are generated by tiling, but that is not a
requirement for consumer fusion). Using the consumer you can find the
slices of the operands that are accessed within the loop which you can
then use to tile and fuse the consumer (using `TilingInterface`). This
handles more naturally the case where multiple operands of the consumer
come from the loop nest.

The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of
`scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a
single producer for the source of the slice, which makes it a natural
way of specifying producer fusion. But for consumers, the result might
have multiple users, resulting in multiple candidates for fusion, as
well as a fusion candidate using multiple results from the tiled loop
nest. This means using slices
(`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for
consumer fusion turns out to be quite hard to navigate. The use of the
consumer directly avoids all those pain points. In time the
`scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of
`scf::tileAndFuseConsumer`. There is a lot of tech-debt that has
accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be
cleaned up. So while that gets cleaned up, and required functionality is
moved to `scf::tileAndFuseConsumer`, the old path is still maintained.

The test for `scf::tileAndFuseConsumerUsingSlices` is copied from
`tile-and-fuse-consumer.mlir` to
`tile-and-fuse-consumer-using-slices.mlir`. All the tests that were
there in this file are now using the `tileAndFuseConsumer` method. The
test op `test.tile_and_fuse_consumer` is modified to call
`scf::tileAndFuseConsumer`, while a new op
`test.tile_and_fuse_consumer_of_slice` is used to keep the old path
tested while it is deprecated.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Add declaration of command line options to BugDriver.h and remove extern
declarations in individual .cpp files.
llvm#156577)

Add detailed comments explaining each function's memory access patterns
and why they should/shouldn't be unroll-and-jammed:

- fore_aft_*: Dependencies between fore block and aft block
- fore_sub_*: Dependencies between fore block and sub block
- sub_aft_*: Dependencies between sub block and aft block
- sub_sub_*: Dependencies within sub block

- *_less: Backward dependency (i-1) - safe for fore/aft, fore/sub,
sub/aft; unsafe for sub/sub due to jamming conflicts
- *_eq: Same iteration dependency (i+0) - safe due to preserved
execution order
- *_more: Forward dependency (i+1) - unsafe due to write-after-write
races between unrolled iterations, except sub/sub case creates conflicts
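For readers unfamiliar with the fore/sub/aft terminology used above, here is a minimal sketch of what unroll-and-jam by a factor of 2 does to a loop nest. The function names are hypothetical and the loop body is deliberately free of cross-iteration dependencies, so it corresponds to the trivially safe case rather than to any specific test in the file.

```cpp
#include <cassert>
#include <vector>

// Original nest: fore block, inner (sub) loop, aft block per outer iteration.
void loop_original(std::vector<int> &A, const std::vector<int> &B) {
  const int N = static_cast<int>(A.size());
  const int M = static_cast<int>(B.size());
  for (int i = 0; i < N; ++i) {
    A[i] = i;                   // fore block
    for (int j = 0; j < M; ++j)
      A[i] += B[j];             // sub block
    A[i] *= 2;                  // aft block
  }
}

// Outer loop unrolled by 2, with the two copies of the inner loop fused
// ("jammed") into one. Assumes an even trip count to avoid a remainder loop.
void loop_unroll_and_jam_by_2(std::vector<int> &A, const std::vector<int> &B) {
  const int N = static_cast<int>(A.size());
  const int M = static_cast<int>(B.size());
  assert(N % 2 == 0);
  for (int i = 0; i < N; i += 2) {
    A[i] = i;                   // fore block, iteration i
    A[i + 1] = i + 1;           // fore block, iteration i+1
    for (int j = 0; j < M; ++j) {
      A[i] += B[j];             // sub block, iteration i
      A[i + 1] += B[j];         // sub block, iteration i+1
    }
    A[i] *= 2;                  // aft block, iteration i
    A[i + 1] *= 2;              // aft block, iteration i+1
  }
}
```

The transformation reorders work across iterations `i` and `i+1`: the fore block of `i+1` now runs before the sub and aft blocks of `i`. That reordering is why a dependency between those blocks can make the transformation illegal.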
…168962)

The only thing the docs should depend on is the SWIG wrapper
(lldb.py) which only requires parsing the API headers. It should not
depend on building libLLDB.

The dependency was (I believe accidentally) introduced by 59f4267.

Fixes llvm#123316
…vm#168930)

Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc
…llvm#168932)

This removes an unnecessary isel pattern for the RV32 HwMode.
…llvm#168957)

On startup, bazel prints: `WARNING: Option
'experimental_guard_against_concurrent_changes' is deprecated: Use
--guard_against_concurrent_changes instead`
This set of patches removes the early tagging of Generic-SPMD target
regions from MLIR to instead only tell apart Generic from SPMD. This
matches the behavior of Clang, which then relies on the OpenMPOpt pass
to detect situations where Generic kernels can be executed in SPMD mode,
potentially after certain transformations.

Merging this PR results in split distribute + parallel do kernels
running in Generic mode, which might cause performance regressions in
these cases. This is because the OpenMPOpt pass is currently not
prepared to properly SPMDize Generic kernels containing new DeviceRTL
loop functions that only Flang currently generates.

Generic mode before these changes is broken when parallel regions are
reached. With this, it should be possible to properly execute them.
…vm#168426)

We previously got a duplicate implicit $exec operand. It didn't really
hurt anything (other than being a slight drag on compile-time
performance). Still, let's keep things clean.
This patch fixes and eliminates the possibility of SupportFileSP ever
being nullptr. The support file was originally treated like a value
type, but became a polymorphic type and therefore has to be stored and
passed around as a pointer.

To avoid having all the callers check the validity of the pointer, I
introduced the invariant that SupportFileSP is never null and always
default constructed. However, without enforcement at the type level,
that's fragile and indeed, we already identified two crashes where
someone accidentally broke that invariant.

This PR introduces a NonNullSharedPtr to prevent that. NonNullSharedPtr
is a smart pointer wrapper around std::shared_ptr that guarantees the
pointer is never null. If default-constructed, it creates a
default-constructed instance of the contained type. Note that I'm using
private inheritance because you shouldn't inherit from standard library
classes due to the lack of virtual destructor. So while the new
abstraction looks like a `std::shared_ptr`, it is in fact **not** a
shared pointer. Given that our destructor is trivial, we could use
public inheritance, but currently there's no need for it.

rdar://164989579
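A minimal sketch of the idea, not the LLDB implementation: the class below only mirrors the properties stated above (private inheritance from `std::shared_ptr`, never null, default construction creates a default-constructed object). The fallback on a null input, the choice to suppress moves, and the `FileStub` type are assumptions made for illustration.

```cpp
#include <memory>

template <typename T>
class NonNullSharedPtr : private std::shared_ptr<T> {
  using Base = std::shared_ptr<T>;

public:
  // Default construction allocates a default-constructed T instead of null.
  NonNullSharedPtr() : Base(std::make_shared<T>()) {}

  // Assumption: a null input is replaced by a default-constructed T.
  NonNullSharedPtr(const std::shared_ptr<T> &p)
      : Base(p ? p : std::make_shared<T>()) {}

  // Declaring the copy operations suppresses the implicit move operations,
  // so "moving" copies and never leaves a null pointer behind.
  NonNullSharedPtr(const NonNullSharedPtr &) = default;
  NonNullSharedPtr &operator=(const NonNullSharedPtr &) = default;

  // Re-export only the read accessors of the private base.
  using Base::get;
  using Base::operator*;
  using Base::operator->;
  using Base::use_count;
};

// Hypothetical stand-in for the pointee type.
struct FileStub {
  int id = 0;
};
```

Because the base is private, a `NonNullSharedPtr<FileStub>` cannot be reset or assigned from `nullptr` through the `std::shared_ptr` interface, so callers can dereference it without a null check.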
… non-static. NFC. (llvm#168839)

So that we can reuse these functions in a few places, such as in
clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is
currently copied from getOptimizationLevel.
…ert header-only macros (llvm#168016)

Adds the remaining optional feature macros from the OpenCL C 3.0 spec
(section 6.2.1 table). Targets can now enable these via
OpenCLFeaturesMap returned by getSupportedOpenCLOpts().

Revert a84599f (header‑only feature macros).
Header‑only macros are difficult to disable on SPIR-V targets,
and the prior undef approach (a60b8f4) does not scale.
After this PR, they can be disabled via `-cl-ext=-<feature>`.

KhronosGroup/OpenCL-Docs#1328 also notes that
unconditional definition of the header‑only macros in opencl-c-base.h
should be removed.
…67744)

Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
boomanaiden154 and others added 22 commits November 23, 2025 22:17
llvm#167060)" (llvm#169238)

This reverts commit a52e1af.

That commit reverted a change (making isExpandedFromMacro take a
std::string) that was explicitly added to avoid lifetime issues. We ran
into issues with some internal matchers due to this, and it probably is
not an uncommon downstream use case. This patch restores the original
functionality and adds a test to ensure that the functionality is
preserved.

https://reviews.llvm.org/D90303 contains more discussion.
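The lifetime concern can be illustrated with a standalone sketch. `MacroNameMatcher` and `makeMatcherFor` are hypothetical and are not the Clang matcher API; they only show why a matcher that owns its name as a `std::string` stays valid when it is built from a temporary.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical matcher that owns a copy of the macro name.
class MacroNameMatcher {
public:
  explicit MacroNameMatcher(std::string MacroName)
      : Name(std::move(MacroName)) {}

  bool matches(std::string_view Candidate) const { return Candidate == Name; }

private:
  std::string Name; // owned, so it cannot dangle
};

// The name is built in a local that dies when this function returns.
// A matcher holding only a non-owning view of `Temp` would dangle here.
MacroNameMatcher makeMatcherFor(const std::string &Prefix) {
  std::string Temp = Prefix + "_ASSERT";
  return MacroNameMatcher(Temp);
}
```

Had `Name` been a non-owning view, the matcher returned by `makeMatcherFor` would refer to freed storage as soon as the function returned.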
…vm#164768)

Background: X86 APX feature adds 16 registers within the same 64-bit
mode. PR llvm#164638 is trying to extend such registers for FASTCC. However,
a blocking issue is that a calling convention cannot change depending on
whether a feature is enabled.

The solution is to disable FASTCC if APX is not ready. This is an NFC
change to the final code generation, because X86 doesn't define an
alternative ABI for FASTCC in 64-bit mode. We can solve the potential
compatibility issue of llvm#164638 with this patch.
…169260)

This is supposed to fix LLVM Buildbot failures after llvm#164768. I don't have
the environment to verify though.
…lvm#169262)

Interfaces can be optional: whether an op implements an interface or not
can depend on the state of the operation.

```
// An optional code block for adding additional "classof" logic. This can
// be used to better enable "optional" interfaces, where an entity only
// implements the interface if some dynamic characteristic holds.
// `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
// interface instance being checked.
code extraClassOf = "";
```

The current `Pass::canScheduleOn(RegisteredOperationName)` is
insufficient. This commit adds an additional overload to inspect
`Operation *`.

This commit fixes a crash when scheduling an `InterfacePass` for an
optional interface on an operation that does not actually implement the
interface.

This is a re-upload of llvm#168499, which was reverted.
SPIR/SPIR-V are generic targets. Assume they support __bf16.
…llvm#169227)

Merge the SkipBits!=0 handling into the first iteration of the word
loop. This is the same code structure used by BitVector::find_first_in.
…lvm#169256)

I've tested this locally, and the builtins build proceeds without a
hitch for m68k-none-none. This is part of a larger effort to establish a
working m68k baremetal toolchain.
Fixes a bug in `AMDGPUISelLowering` where alias analysis info is not
propagated to split loads and stores.

This is required for llvm#161375

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
The motivation for this is that it would be useful to express a
vslideup/vslidedown in a target independent way e.g. from the loop
vectorizer.

We can do this today with @llvm.vector.splice by setting one operand to
poison:

- A slide down can be achieved with @llvm.vector.splice(%x, poison,
slideamt)
- A slide up can be done by @llvm.vector.splice(poison, %x, -slideamt)

E.g.:

    splice(<a,b,c,d>, poison, 3) = <d,poison,poison,poison>
    splice(poison, <a,b,c,d>, -3) = <poison,poison,poison,a>

These splices get lowered to a vslideup + vslidedown pair with one of
the vs2s being poison. We can optimize this away so that we are just
left with a single slideup/slidedown.
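The two examples above can be reproduced with a small scalar model of the splice semantics: concatenate the two inputs, then take a window that starts at `imm`, or at `-imm` elements before the end of the first vector when `imm` is negative. `splice_model` and `kPoison` are hypothetical names, and a `'?'` character stands in for a poison lane.

```cpp
#include <cassert>
#include <vector>

constexpr char kPoison = '?'; // stand-in for a poison lane

// Scalar model of splice(v1, v2, imm) for equal-length inputs,
// assuming -n <= imm < n where n is the vector length.
std::vector<char> splice_model(const std::vector<char> &v1,
                               const std::vector<char> &v2, int imm) {
  const int n = static_cast<int>(v1.size());
  assert(v2.size() == v1.size() && imm >= -n && imm < n);
  std::vector<char> cat(v1);
  cat.insert(cat.end(), v2.begin(), v2.end());
  // Non-negative imm: window starts at index imm of the concatenation.
  // Negative imm: window starts at the trailing -imm elements of v1.
  const int start = imm >= 0 ? imm : n + imm;
  return std::vector<char>(cat.begin() + start, cat.begin() + start + n);
}
```

With `x = <a,b,c,d>` and an all-poison vector `p`, `splice_model(x, p, 3)` gives `<d,?,?,?>` (a slide down by 3) and `splice_model(p, x, -3)` gives `<?,?,?,a>` (a slide up by 3), matching the examples above.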
…er (llvm#168836)

Two years ago, `operand_segment_sizes` and `result_segment_sizes` were
renamed to `operandSegmentSizes` and `resultSegmentSizes` (check related
commits, e.g.
llvm@363b655).

However, the op verifiers in the IRDL loading phase are still using the old
attributes `operand_segment_sizes` and `result_segment_sizes`,
which causes conflicts, e.g. they are not compatible with the OpView
builder in MLIR python bindings (which generates camelCase segment
attributes).

This PR adds support for camelCase segment size attributes in the IRDL
verifier. Note that support for `operand_segment_sizes` and
`result_segment_sizes` is dropped.

I found this issue while working on a new IRDL wrapper in the MLIR
python bindings.
Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD
 Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD
@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@z1-cciauto

@ronlieb ronlieb merged commit 08a72fc into amd-mainline Dec 24, 2025
49 of 56 checks passed
@ronlieb ronlieb deleted the amd/dev/dsalinas/land_bulk_promo_2025_11_24 branch December 24, 2025 01:22