merge main into amd-staging #803

ronlieb · 2025-12-09T20:45:15Z

No description provided.

…) (llvm#169638)

Reapplication of llvm#137828, changes:

* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 2.24: The issue is that `try_compile` does not forward manually-defined compiler flang variables to the test build environment; instead of just a negative test result, it aborts the configuration step itself. To be fair, manually defining these variables has been deprecated since at least CMake 3.6.
* Missing flang cmd line flags for CMake < 3.28: `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLE_STATIC=OFF and FLANG_RT_ENABLE_SHARED=OFF at the same time, which is the default for amdgpu and nvptx targets. In this mode, only the .mod files are compiled -- necessary for module files in lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build functionality must be hoisted out of openmp's runtime/ directory, which is only included for host targets. This PR now requires llvm#169909.

Move building the .mod files from openmp/flang to openmp/flang-rt using a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each target separately, which is something the LLVM_ENABLE_RUNTIMES system already does. Prime example is `iso_c_binding.mod` which encodes the target's ABI. Constants such as [`c_long_double` also have different values](https://github.com/llvm/llvm-project/blob/d748c81218bee39dafb9cc0c00ed7831a3ed44c3/flang-rt/lib/runtime/iso_c_binding.f90#L77-L81). Most other modules have `#ifdef`-enclosed code as well. For instance, this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use the module files compiled for the host, which may contain uses of the types REAL(10) or REAL(16) not available for nvptx/amdgpu. llvm#146876 llvm#128015 llvm#129742 llvm#158790
3. CMake has support for Fortran that we should use. Among other things, it automatically determines module dependencies so there is no need to hardcode them in the CMakeLists.txt.
4. It allows using Fortran itself to implement Flang-RT. Currently, only `iso_fortran_env_impl.f90` emits object files that are needed by Fortran applications (llvm#89403). The workaround of llvm#95388 could be reverted (PR llvm#169525).

If using Flang for cross-compilation or target-offloading, flang-rt must now be compiled for each target not only for the library, but also to get the target-specific module files. For instance, in a bootstrapping runtime build, this can be done by adding: `-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.

Some new dependencies come into play:

* openmp depends on flang-rt for building `omp_lib.mod` and `omp_lib_kinds.mod`. Currently, if flang-rt is not found then the modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests are disabled. If not building in a bootstrapping build, the location of the module files can be pointed to using `-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone build. Alternatively, the test needing any of the intrinsic modules could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring `omp_lib.mod` and `omp_lib_kinds.mod` are already marked with `openmp_runtime`.

As intrinsic modules are now specific to the target, their location is moved from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The mechanism to compute the location has been moved from flang-rt (previously used to compute the location of `libflang_rt.*.a`) to common locations in `cmake/GetToolchainDirs.cmake` and `runtimes/CMakeLists.txt` so it can be used by both openmp and flang-rt. Potentially the mechanism could also be shared by other libraries such as compiler-rt.

`finclude` was chosen because `gfortran` uses it as well and because it avoids misuse such as `#include <flang/iso_c_binding.mod>`. The search location is now determined by `ToolChain` in the driver, instead of by the frontend. Another subdirectory `flang` avoids accidental inclusion of gfortran-modules which due to compression would result in user-unfriendly errors. Now the driver adds `-fintrinsic-module-path` for that location to the frontend call (just like gfortran does). `-fintrinsic-module-path` had to be fixed for this because ironically it was only added to `searchDirectories`, but not `intrinsicModuleDirectories_`. Since the driver determines the location, tests invoking `flang -fc1` and `bbc` must also be passed the location by llvm-lit. This works like llvm-lit does for finding the include dirs for Clang using `-print-file-name=...`.
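The per-target layout `<resource-dir>/finclude/flang/<triple>` described above can be illustrated with a small sketch. This is plain Python, not the driver's or CMake's actual logic; the helper name `intrinsic_module_dir` is hypothetical, and the resource directory `lib/clang/22` is taken from the example path in the commit message.

```python
from pathlib import PurePosixPath


def intrinsic_module_dir(resource_dir: str, triple: str) -> PurePosixPath:
    """Compose <resource-dir>/finclude/flang/<triple> (hypothetical helper)."""
    return PurePosixPath(resource_dir) / "finclude" / "flang" / triple


# One module directory per target; the triples are the ones named in the commit message.
triples = ["nvptx64-nvidia-cuda", "amdgpu-amd-amdhsa"]
module_dirs = {t: intrinsic_module_dir("lib/clang/22", t) for t in triples}

for triple, path in module_dirs.items():
    print(f"{triple}: {path}")
```

Each target gets its own directory, so a host-compiled `iso_c_binding.mod` is never picked up when compiling for an offload target.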

ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: llvm#171145

…eallocationPipeline` (llvm#171305) Add an overload that does not take any options and uses the default options instead.

…lvm#171427) Fixes failures in llvm#171058 (comment)

Single backtick tries to make a reference to something and if that fails, renders as plain text. These 3 weren't finding a reference and so produced a warning: variable.rst:975: WARNING: 'any' reference target not found: max_children

…vm#166189) In clangd, we use the non-AST version.

This tries to parse the block as that language but in these cases fails because they aren't purely that language. This falls back to a permissive mode which is fine, but highlights the invalid tokens like errors which isn't great. Instead don't try to highlight these blocks.

This fixes 4 warnings seen in the docs build:

- lldb/docs/use/tutorials/custom-frame-recognizers.md:43: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
- lldb/docs/use/tutorials/script-driven-debugging.md:175: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
- lldb/docs/use/tutorials/script-driven-debugging.md:426: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
- lldb/docs/use/tutorials/writing-custom-commands.md:416: WARNING: Lexing literal_block <...> as "python3" resulted in an error at token: '$'. Retrying in relaxed mode.

Follow-on from llvm#170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores. The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores primarily because of 24f0901 (llvm#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have these features but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
- Bring Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line. Comparing N[1-3] against V[2-3], the only change the N cores have that V[2-3] don't is:

```
< st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
---
> st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
```

  So we take it for all cores. The rest of the diff is instructions in V[2-3] that aren't in N cores, so we also take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr.
- Finally bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3:
  - loads/stores are blended
  - duplicates with different spaces like `shll v0.2d, v0.2s, #32` are removed
  - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores

RST tries to resolve things in single backticks to a reference, which is not the intention here. Double backticks indicate plain text formatting.

Fixes warnings in the docs build:

- contributing.rst:92: WARNING: 'any' reference target not found: A1
- contributing.rst:92: WARNING: 'any' reference target not found: B1
- contributing.rst:92: WARNING: 'any' reference target not found: B2
- contributing.rst:92: WARNING: 'any' reference target not found: A2
- contributing.rst:95: WARNING: 'any' reference target not found: A1->B1
- contributing.rst:95: WARNING: 'any' reference target not found: B2->C2
- contributing.rst:95: WARNING: 'any' reference target not found: C3->A3
- contributing.rst:100: WARNING: 'any' reference target not found: LLDB_ACCEPTABLE_PLUGIN_DEPENDENCIES
- contributing.rst:100: WARNING: 'any' reference target not found: LLDB_TOLERATED_PLUGIN_DEPENDENCIES

All these are using H1 for the main heading but H3 for the rest, Sphinx warns about this: WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]

The pass was already "reinventing" the concept just to deal with 16 bit registers. Clean up the entire tracking logic to only use register units.

There are no test changes because functionality didn't change, except:

- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of register units.

This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file).

I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
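The table-versus-map trade-off described in this commit message can be sketched in a few lines. This is an illustration in plain Python, not the pass's code: the class names `FixedTableTracker` and `SparseMapTracker` are hypothetical, a `dict` stands in for `DenseMap`, and the table size `1 << 16` is borrowed from the LDS DMA ID limit mentioned above purely as an example.

```python
class FixedTableTracker:
    """Fixed-size table: every slot is allocated and zeroed up front."""

    def __init__(self, num_units: int):
        self.scores = [0] * num_units  # cost is paid even if few units are touched

    def set(self, unit: int, score: int) -> None:
        self.scores[unit] = score

    def get(self, unit: int) -> int:
        return self.scores[unit]


class SparseMapTracker:
    """Hash map keyed by register unit: only touched units occupy memory."""

    def __init__(self):
        self.scores = {}

    def set(self, unit: int, score: int) -> None:
        if score == 0:
            self.scores.pop(unit, None)  # a zero score is the same as "absent"
        else:
            self.scores[unit] = score

    def get(self, unit: int) -> int:
        return self.scores.get(unit, 0)


table = FixedTableTracker(1 << 16)
sparse = SparseMapTracker()
for unit, score in [(3, 7), (40000, 2), (3, 0)]:
    table.set(unit, score)
    sparse.set(unit, score)

print(len(table.scores), len(sparse.scores))
```

Both trackers answer the same queries; the map simply avoids initializing storage for units that are never written, at the cost of a hash lookup per access.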

This GenericEnum was just adding separate values for VOP3P_Pseudo opcodes in the same namespace as existing opcodes that did not match. They were defined in AMDGPUGenSearchableTables.inc by tablegen emitter but were guarded out by #ifdef. Because of that, they were never included in the code, so the compiler never reported the naming conflict and the bug never had a chance to surface.

This was removed in the specification by: arm/tosa-specification#11

Implements ARM-software/acle#404. This allows the user to specify "featA+featB;priority=[1-255]" where priority=255 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1. Internally this gets expanded using special FMV features P0 ... P7 which can encode up to 256-1 priority levels (excluding all zeros). Those do not have a corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.
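A minimal sketch of the expansion described above, in plain Python rather than Clang's implementation. The function names are hypothetical, and the bit order (P0 as the least significant bit) is an assumption; the commit message only says that P0 ... P7 encode the priority and that the all-zeros pattern is excluded.

```python
def parse_version_string(spec: str):
    """Split 'featA+featB;priority=N' into (features, priority or None)."""
    feats_part, _, rest = spec.partition(";")
    features = [f for f in feats_part.split("+") if f]
    priority = None
    if rest:
        key, _, value = rest.partition("=")
        if key.strip() != "priority":
            raise ValueError(f"unexpected key: {key!r}")
        priority = int(value)
        if not 1 <= priority <= 255:
            raise ValueError("priority must be in [1, 255]")
    return features, priority


def priority_bits(priority):
    """Names of the P0..P7 pseudo-features whose bit is set (assumes P0 = LSB)."""
    if priority is None:
        return []  # no explicit priority: no P bits, i.e. the excluded all-zeros pattern
    return [f"P{i}" for i in range(8) if (priority >> i) & 1]


features, priority = parse_version_string("featA+featB;priority=5")
print(features, priority, priority_bits(priority))
```

Eight bits with the all-zeros pattern reserved give 255 explicit levels, which matches the `[1-255]` range in the commit message.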

This is a follow-up patch to llvm#157680, which allows simd fpcvt instructions to be generated from l/llround and l/llrint nodes.

…162077)" Fails on https://lab.llvm.org/buildbot/#/builders/123/builds/31922 This reverts commit bf93440.

Without this gcc warned ../lib/Frontend/OpenMP/OMPIRBuilder.cpp:5082:45: warning: suggest parentheses around '&&' within '||' [-Wparentheses]

Without this gcc warned ../../mlir/lib/Dialect/SCF/IR/SCF.cpp:3748:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]

…nsformOps.cpp (NFC)

… mlir-irdl-to-cpp.cpp (NFC)

…lization in NVGPUTransformOps.cpp (NFC)

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

…1126) Similar to llvm#167760 this makes the list of LSE atomics explicit in case new operations are added in the future. UIncWrap, UDecWrap, USubCond and USubSat are excluded. Fixes llvm#170450

…Attr.cpp (NFC)

All the constant construction APIs already have native splat support. They can be directly used with a vector. It's not necessary to first create a scalar constant and then splat it to the element count.

SplatVal is not modified in these functions, so pass it by value. This was probably a copy&paste mistake from checkConstantVector(), which does modify SplatVal.

ConstantInt::get() already knows how to create splats, no need to do it manually.

The offset here is a signed quantity.

This may be -1.

This is encoded as a signed value, so use getSigned().

To match the signed int parameter for the value.
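The `getSigned()` commits above all concern values that may be negative (for example, the call site number that "may be -1"). A minimal sketch of why signedness matters for a fixed-width constant, in plain Python and not LLVM's API; all function names here are hypothetical.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """True if value is representable as an unsigned integer of this width."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable in two's complement at this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def encode_signed(value: int, bits: int) -> int:
    """Two's-complement bit pattern of a signed value."""
    if not fits_signed(value, bits):
        raise ValueError("value out of range for signed width")
    return value & ((1 << bits) - 1)


def decode_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a signed value (sign-extend)."""
    sign = 1 << (bits - 1)
    return (pattern ^ sign) - sign


print(fits_unsigned(-1, 32), fits_signed(-1, 32))
print(hex(encode_signed(-1, 32)), decode_signed(0xFFFFFFFF, 32))
```

A value such as -1 is out of range when treated as unsigned but valid when treated as signed, so the constant has to be built through the signed path.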

# Conflicts: # flang-rt/CMakeLists.txt # flang-rt/lib/CMakeLists.txt # flang-rt/lib/runtime/CMakeLists.txt # flang-rt/lib/runtime/f90deviceio.f90 # flang/test/Lower/OpenMP/target-enter-data-default-openmp52.f90 # flang/tools/f18/CMakeLists.txt # llvm/runtimes/CMakeLists.txt

…lvm#137828) (llvm#169638)" needs more work This reverts commit 7675fc7.

z1-cciauto · 2025-12-09T20:46:23Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3206

This fixes the buildbot failures from llvm#150267. I could not reproduce them locally, but judging from the error logs, my intuition suggests that the -O3 option on the RUN line behaves inconsistently on different hosts. My intention was to run an integration test which will use llvm's globalopt pass, but there's no need actually. We have unittests in place for it.

z1-cciauto · 2025-12-10T00:02:01Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3208

ronlieb · 2025-12-10T01:33:28Z

npsdb failed lit tests, one new test. https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3206/pipeline/634/

CP'ed upstream fix [clang][FMV][AArch64] Remove O3 from failing test (llvm#171457)
reran build to include lits, passes

Meinersbur and others added 30 commits December 9, 2025 12:54

[mlir][bufferization][NFC] Add convenience overload for `buildBufferD…

182a59d

…eallocationPipeline` (llvm#171305) Add an overload that does not take any options and uses the default options instead.

[clang-tidy] Add missing Modernize module to Google module link libs (l…

9c83428

…lvm#171427) Fixes failures in llvm#171058 (comment)

[lldb][docs] Fix plaintext marker in variables doc

31c03c9

Single backtick tries to make a reference to something and if that fails, renders as plain text. These 3 weren't finding a reference and so produced a warning: variable.rst:975: WARNING: 'any' reference target not found: max_children

[clangd] Remove the unused AST-based code folding Implementation. (ll…

4e94198

…vm#166189) In clangd, we use the non-AST version.

[lldb][docs] Fix title formatting in Variable document

afb3852

[lldb][docs] Fix header level warnings in a few documents

b1ef2db

All these are using H1 for the main heading but H3 for the rest, Sphinx warns about this: WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]

[mlir][tosa] Remove EXT_MXFP support for cast (llvm#167301)

719d079

This was removed in the specification by: arm/tosa-specification#11

[AArch64]SIMD fpcvt codegen for rounding nodes (llvm#165546)

2766002

This is a follow-up patch to llvm#157680, which allows simd fpcvt instructions to be generated from l/llround and l/llrint nodes.

Revert "[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (llvm#…

4572f4f

…162077)" Fails on https://lab.llvm.org/buildbot/#/builders/123/builds/31922 This reverts commit bf93440.

[OpenMP] Fix -Wparentheses warning [NFC]

f90fe01

Without this gcc warned ../lib/Frontend/OpenMP/OMPIRBuilder.cpp:5082:45: warning: suggest parentheses around '&&' within '||' [-Wparentheses]

[mlir] Fix -Wparentheses warning [NFC]

7f6c907

Without this gcc warned ../../mlir/lib/Dialect/SCF/IR/SCF.cpp:3748:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]

[MLIR] Apply clang-tidy fixes for llvm-else-after-return in LinalgTra…

e3cf462

…nsformOps.cpp (NFC)

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in…

257417e

… mlir-irdl-to-cpp.cpp (NFC)

[MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initia…

95e6edc

…lization in NVGPUTransformOps.cpp (NFC)

Add more missing LLVM_ABI annotations (llvm#168765)

d478baa

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

[AArch64] Make the list of LSE supported operations explicit (llvm#17…

fe68fb6

…1126) Similar to llvm#167760 this makes the list of LSE atomics explicit in case new operations are added in the future. UIncWrap, UDecWrap, USubCond and USubSat are excluded. Fixes llvm#170450

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in SelectObject…

734ea9a

…Attr.cpp (NFC)

[Hexagon] Remove unnecessarily complicated helpers (NFC)

2f502f3

All the constant construction APIs already have native splat support. They can be directly used with a vector. It's not necessary to first create a scalar constant and then splat it to the element count.

[Hexagon] Avoid unnecessary by reference passing (NFC)

bf41fd7

SplatVal is not modified in these functions, so pass it by value. This was probably a copy&paste mistake from checkConstantVector(), which does modify SplatVal.

[Hexagon] Simplify creation of splat value (NFC)

a6fa720

ConstantInt::get() already knows how to create splats, no need to do it manually.

[X86] Use getSigned() for segment offset

45267ec

The offset here is a signed quantity.

[SjLjEHPrepare] Use getSigned() for call site number

bd8c063

This may be -1.

nikic and others added 5 commits December 9, 2025 16:02

[Bitcode] Use ConstantInt::getSigned()

53ce850

This is encoded as a signed value, so use getSigned().

[ThumbRegisterInfo] Use getSigned() for constant pool loads

db59def

To match the signed int parameter for the value.

Revert "[Flang] Move builtin .mod generation into runtimes (Reapply l…

ea7f2a4

…lvm#137828) (llvm#169638)" needs more work This reverts commit 7675fc7.

merge main into amd-staging

bcb7fe9

ronlieb requested review from a team and dpalermo December 9, 2025 20:45

ronlieb requested review from fabianmcg and nicolasvasilache as code owners December 9, 2025 20:45

ronlieb removed request for fabianmcg and nicolasvasilache December 9, 2025 20:45

dpalermo approved these changes Dec 9, 2025

ronlieb merged commit 0b6f58e into amd-staging Dec 10, 2025
9 of 10 checks passed

ronlieb deleted the amd/merge/upstream_merge_20251209125822 branch December 10, 2025 01:33
