Bulk Promotion from 2025.11.24#929
…lvm#168937) This patch extracts the common logic for computing array element counts from shape operands into a reusable helper function in CUFCommon.
These horizontal add/sub instructions are currently handled by adding/subtracting tuples of the first operand, followed by tuples of the second operand. This is not the correct semantics for the 256-bit instructions: they process the first half of the first operand, then the first half of the second operand, then the second half of the first operand, and finally the second half of the second operand (trust me bro [*]). This patch fixes the issue by applying the "shards" functionality that was added in llvm#167954, to handle the top and bottom 128-bit "shards" in turn.
[*] clang/test/CodeGen/X86/avx2-builtins.c:
```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
    (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
    (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
    30,70,20,60,110,150,100,140));
```
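The per-half ordering described above can be modelled in a few lines. This is only an illustrative sketch, not the code from the patch: `hadd_epi32_256` is a hypothetical name, and the vectors are plain Python lists of eight 32-bit values.

```python
def hadd_epi32_256(a, b):
    """Illustrative model of a 256-bit horizontal add on eight 32-bit lanes.

    Each 128-bit half ("shard") is processed in turn: pairs from the half of
    `a` first, then pairs from the same half of `b`.
    """
    assert len(a) == len(b) == 8

    def wrap32(v):
        # Wrap to a signed 32-bit value, as the hardware add would.
        v &= 0xFFFFFFFF
        return v - (1 << 32) if v & 0x80000000 else v

    out = []
    for half in (0, 4):          # low 128 bits, then high 128 bits
        for src in (a, b):       # first operand, then second operand
            lane = src[half:half + 4]
            out.append(wrap32(lane[0] + lane[1]))
            out.append(wrap32(lane[2] + lane[3]))
    return out


a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [5, 15, 25, 35, 45, 55, 65, 75]
print(hadd_epi32_256(a, b))
```

With the inputs from the quoted test, the model produces the same eight values that `TEST_CONSTEXPR` expects.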
Add a low trip count test that is currently vectorized but unprofitable, for llvm#167858.
…lvm#168609) which reassigns scale operand in vgpr_32 register to agpr_32, not permitted by instruction format. Reduced from ck. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: theRonShark <ron.lieberman@amd.com>
Don't specifically target windows-msvc - the same goes for any windows target; mingw doesn't have dlfcn.h either.
…#168900) When comparing additions with the same base where one has `nsw`, the following simplification can be performed:
```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
=>
icmp slt/sgt/sle/sge C1, C2
```
Previously this was only done for `slt`. This patch extends it to the `sgt`, `sle`, and `sge` predicates when either of the following conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`
This patch also handles the `C1 == C2` case, which was previously excluded. Proof: https://alive2.llvm.org/ce/z/LtmY4f
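As a sanity check next to the Alive2 proof, the fold can be brute-forced at a small bit width. The sketch below is not part of the patch. It assumes a 6-bit integer type to keep the search fast, treats a violated `nsw` as poison (those inputs are skipped), and all helper names are made up for illustration.

```python
import operator

PREDICATES = {
    "slt": operator.lt,
    "sgt": operator.gt,
    "sle": operator.le,
    "sge": operator.ge,
}


def to_signed(v, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= 1 << (bits - 1) else v


def condition_holds(c1, c2):
    # The two conditions listed in the commit message.
    return (c1 <= c2 and c1 >= 0) or (c2 <= c1 and c1 <= 0)


def fold_is_sound(bits=6):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for c1 in range(lo, hi + 1):
        for c2 in range(lo, hi + 1):
            if not condition_holds(c1, c2):
                continue
            for x in range(lo, hi + 1):
                if not lo <= x + c2 <= hi:
                    continue  # x +nsw C2 overflows: poison, nothing to check
                lhs = to_signed(x + c1, bits)  # plain add, may wrap
                rhs = x + c2
                for pred in PREDICATES.values():
                    if pred(lhs, rhs) != pred(c1, c2):
                        return False
    return True


print(fold_is_sound())
```

The check passes because, under either condition, `x + C1` lies between `x` and `x + C2`, so it cannot wrap whenever `x + C2` does not.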
Remove all constraint propagation functions in Dependence Analysis.
Add dependency on headers with `in_addr` and `in_addr_t` type definitions to ensure that these headers will be properly installed by "install-libc" CMake target.
… a given tiled loop nest. (llvm#167634) The existing `scf::tileAndFuseConsumerOfSlices` takes a list of slices (and the loops they are part of), tries to find the consumer of these slices (all slices are expected to have the same consumer), and then tiles the consumer into the loop nest using the `TilingInterface`. A more natural way of doing consumer fusion is to start from the consumer and look for operands that are produced by the loop nest passed in as `loops` (presumably these loops are generated by tiling, but that is not a requirement for consumer fusion). From the consumer you can find the slices of the operands that are accessed within the loop, which you can then use to tile and fuse the consumer (using `TilingInterface`). This handles more naturally the case where multiple operands of the consumer come from the loop nest. The `scf::tileAndFuseConsumerOfSlices` was implemented as a mirror of `scf::tileAndFuseProducerOfSlice`. For the latter, the slice has a single producer for the source of the slice, which makes it a natural way of specifying producer fusion. But for consumers, the result might have multiple users, resulting in multiple candidates for fusion, as well as a fusion candidate using multiple results from the tiled loop nest. This means using slices (`tensor.insert_slice`/`tensor.parallel_insert_slice`) as a hook for consumer fusion turns out to be quite hard to navigate. Using the consumer directly avoids all those pain points. In time, `scf::tileAndFuseConsumerOfSlices` should be deprecated in favor of `scf::tileAndFuseConsumer`. A lot of tech debt has accumulated in `scf::tileAndFuseConsumerOfSlices` that needs to be cleaned up. So while that gets cleaned up, and required functionality is moved to `scf::tileAndFuseConsumer`, the old path is still maintained. The test for `scf::tileAndFuseConsumerUsingSlices` is copied from `tile-and-fuse-consumer.mlir` to `tile-and-fuse-consumer-using-slices.mlir`. 
All the tests that were there in this file are now using the `tileAndFuseConsumer` method. The test op `test.tile_and_fuse_consumer` is modified to call `scf::tileAndFuseConsumer`, while a new op `test.tile_and_fuse_consumer_of_slice` is used to keep the old path tested while it is deprecated. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Add declaration of command line options to BugDriver.h and remove extern declarations in individual .cpp files.
Reverts llvm#168921 Causes build failures.
llvm#156577) Add detailed comments explaining each function's memory access patterns and why they should/shouldn't be unroll-and-jammed:
- fore_aft_*: Dependencies between fore block and aft block
- fore_sub_*: Dependencies between fore block and sub block
- sub_aft_*: Dependencies between sub block and aft block
- sub_sub_*: Dependencies within sub block
- *_less: Backward dependency (i-1) - safe for fore/aft, fore/sub, sub/aft; unsafe for sub/sub due to jamming conflicts
- *_eq: Same iteration dependency (i+0) - safe due to preserved execution order
- *_more: Forward dependency (i+1) - unsafe due to write-after-write races between unrolled iterations, except sub/sub case creates conflicts
…168962) The only thing the docs should depend on is the SWIG wrapper (lldb.py), which only requires parsing the API headers. It should not depend on building libLLDB. The dependency was (I believe accidentally) introduced by 59f4267. Fixes llvm#123316
…vm#168930) Removes about 200 bytes of unneeded patterns from RISCVGenDAGISel.inc
…llvm#168932) This removes an unnecessary isel pattern for the RV32 HwMode.
…llvm#168957) On startup, bazel prints: `WARNING: Option 'experimental_guard_against_concurrent_changes' is deprecated: Use --guard_against_concurrent_changes instead`
This set of patches removes the early tagging of Generic-SPMD target regions from MLIR to instead only tell apart Generic from SPMD. This matches the behavior of Clang, which then relies on the OpenMPOpt pass to detect situations where Generic kernels can be executed in SPMD mode, potentially after certain transformations. Merging this PR results in split distribute + parallel do kernels running in Generic mode, which might cause performance regressions in these cases. This is because the OpenMPOpt pass is currently not prepared to properly SPMDize Generic kernels containing new DeviceRTL loop functions that only Flang currently generates. Generic mode before these changes is broken when parallel regions are reached. With this, it should be possible to properly execute them.
…vm#168426) We previously got a duplicate implicit $exec operand. It didn't really hurt anything (other than being a slight drag on compile-time performance). Still, let's keep things clean.
This patch fixes and eliminates the possibility of SupportFileSP ever being nullptr. The support file was originally treated like a value type, but became a polymorphic type and therefore has to be stored and passed around as a pointer. To avoid having all the callers check the validity of the pointer, I introduced the invariant that SupportFileSP is never null and always default constructed. However, without enforcement at the type level, that's fragile and indeed, we already identified two crashes where someone accidentally broke that invariant. This PR introduces a NonNullSharedPtr to prevent that. NonNullSharedPtr is a smart pointer wrapper around std::shared_ptr that guarantees the pointer is never null. If default-constructed, it creates a default-constructed instance of the contained type. Note that I'm using private inheritance because you shouldn't inherit from standard library classes due to the lack of virtual destructor. So while the new abstraction looks like a `std::shared_ptr`, it is in fact **not** a shared pointer. Given that our destructor is trivial, we could use public inheritance, but currently there's no need for it. rdar://164989579
… non-static. NFC. (llvm#168839) So that we can reuse these functions in a few places, such as in clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is currently copied from getOptimizationLevel.
…ert header-only macros (llvm#168016) Adds the remaining optional feature macros from the OpenCL C 3.0 spec (section 6.2.1 table). Targets can now enable these via OpenCLFeaturesMap returned by getSupportedOpenCLOpts(). Revert a84599f (header‑only feature macros). Header‑only macros are difficult to disable on SPIR-V targets, and the prior undef approach (a60b8f4) does not scale. After this PR, they can be disabled via `-cl-ext=-<feature>`. KhronosGroup/OpenCL-Docs#1328 also notes that unconditional definition of the header‑only macros in opencl-c-base.h should be removed.
…67744) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.
llvm#167060)" (llvm#169238) This reverts commit a52e1af. That commit reverted a change (making isExpandedFromMacro take a std::string) that was explicitly added to avoid lifetime issues. We ran into issues with some internal matchers due to this, and it probably is not an uncommon downstream use case. This patch restores the original functionality and adds a test to ensure that the functionality is preserved. https://reviews.llvm.org/D90303 contains more discussion.
…vm#164768) Background: The X86 APX feature adds 16 registers within the same 64-bit mode. PR llvm#164638 is trying to extend such registers for FASTCC. However, a blocking issue is that a calling convention cannot change depending on whether a feature is enabled. The solution is to disable FASTCC if APX is not ready. This is an NFC change to the final code generation, because X86 doesn't define an alternative ABI for FASTCC in 64-bit mode. We can solve the potential compatibility issue of llvm#164638 with this patch.
…Reg (llvm#168661)" (llvm#169219) Reland d5f3ab8, fix testcases.
…169260) This is supposed to fix LLVM Buildbot failures after llvm#164768. I don't have the environment to verify though.
…lvm#169262) Interfaces can be optional: whether an op implements an interface or not can depend on the state of the operation.
```
// An optional code block for adding additional "classof" logic. This can
// be used to better enable "optional" interfaces, where an entity only
// implements the interface if some dynamic characteristic holds.
// `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the
// interface instance being checked.
code extraClassOf = "";
```
The current `Pass::canScheduleOn(RegisteredOperationName)` is insufficient. This commit adds an additional overload to inspect `Operation *`. This commit fixes a crash when scheduling an `InterfacePass` for an optional interface on an operation that does not actually implement the interface. This is a re-upload of llvm#168499, which was reverted.
SPIR/SPIR-V are generic targets. Assume they support __bf16.
…llvm#169227) Merge the SkipBits!=0 handling into the first iteration of the word loop. This is the same code structure used by BitVector::find_first_in.
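The loop shape described here, with the partial first word handled inside the word loop rather than in a separate prologue, might look roughly like the sketch below. It is a generic Python illustration and not the LLVM code: the commit title is truncated, so the affected function is not named, and `find_first_in`, `WORD_BITS`, and the list-of-words layout are assumptions chosen for the example.

```python
WORD_BITS = 64


def find_first_in(words, begin, end):
    """Return the index of the first set bit in [begin, end), or -1.

    `words` is a list of 64-bit unsigned integers; bit i of the set lives in
    words[i // 64] at position i % 64.
    """
    if begin >= end:
        return -1
    first_word = begin // WORD_BITS
    last_word = (end - 1) // WORD_BITS
    for i in range(first_word, last_word + 1):
        copy = words[i]
        if i == first_word:
            # Skipped low bits are masked off in the first iteration only.
            skip = begin % WORD_BITS
            copy &= ~((1 << skip) - 1)
        if i == last_word:
            # Bits at or past `end` are masked off in the last iteration.
            last_bit = (end - 1) % WORD_BITS
            copy &= (1 << (last_bit + 1)) - 1
        if copy:
            lowest = copy & -copy  # isolate the lowest set bit
            return i * WORD_BITS + lowest.bit_length() - 1
    return -1


words = [0, (1 << 3) | (1 << 40)]  # set bits at global indices 67 and 104
print(find_first_in(words, 0, 128), find_first_in(words, 68, 128))
```

Folding both masks into the loop body keeps a single scan path, at the cost of two cheap comparisons per word.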
…lvm#169256) I've tested this locally, and the builtins build proceeds without a hitch for m68k-none-none. This is part of a larger effort to establish a working m68k baremetal toolchain.
Fixes a bug in `AMDGPUISelLowering` where alias analysis info is not propagated to split loads and stores. This is required for llvm#161375 --------- Co-authored-by: Leon Clark <leoclark@amd.com>
The motivation for this is that it would be useful to express a
vslideup/vslidedown in a target independent way e.g. from the loop
vectorizer.
We can do this today with @llvm.vector.splice by setting one operand to
poison:
- A slide down can be achieved with @llvm.vector.splice(%x, poison,
slideamt)
- A slide up can be done by @llvm.vector.splice(poison, %x, -slideamt)
E.g.:
splice(<a,b,c,d>, poison, 3) = <d,poison,poison,poison>
splice(poison, <a,b,c,d>, -3) = <poison,poison,poison,a>
These splices get lowered to a vslideup + vslidedown pair with one of
the vs2s being poison. We can optimize this away so that we are just
left with a single slideup/slidedown.
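The two splice examples above can be reproduced with a small model of the intrinsic on fixed-length vectors. This is a sketch for illustration only: `vector_splice` and the `POISON` marker are hypothetical stand-ins, and vectors are ordinary Python lists.

```python
POISON = "poison"


def vector_splice(v1, v2, imm):
    """Model of splicing two equal-length vectors.

    The operands are concatenated and a window of len(v1) elements is taken.
    A non-negative imm starts the window at index imm of v1; a negative imm
    starts it at the last -imm elements of v1.
    """
    n = len(v1)
    assert len(v2) == n and -n <= imm < n
    concat = list(v1) + list(v2)
    start = imm if imm >= 0 else n + imm
    return concat[start:start + n]


x = ["a", "b", "c", "d"]
poison = [POISON] * 4
print(vector_splice(x, poison, 3))    # slide down by 3
print(vector_splice(poison, x, -3))   # slide up by 3
```

In both calls only elements of `x` and poison reach the result, which is why a single slide is enough once the poison operand is recognised.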
…er (llvm#168836) Two years ago, `operand_segment_sizes` and `result_segment_sizes` were renamed to `operandSegmentSizes` and `resultSegmentSizes` (see related commits, e.g. llvm@363b655). However, the op verifiers in the IRDL loading phase are still using the old attribute names `operand_segment_sizes` and `result_segment_sizes`, which causes conflicts, e.g. they are not compatible with the OpView builder in the MLIR Python bindings (which generates camelCase segment attributes). This PR adds support for camelCase segment size attributes in the IRDL verifier. Note that support for `operand_segment_sizes` and `result_segment_sizes` is dropped. I found this issue while working on a new IRDL wrapper in the MLIR Python bindings.
…hain (llvm#168135)" breaks build of CK This reverts commit 9e9fe08.
Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD
Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD
Bulk Promotion from amd-staging commit: 669e22f