forked from llvm/llvm-project
merge main into amd-staging #803
Merged
ronlieb
merged 36 commits into
amd-staging
from
amd/merge/upstream_merge_20251209125822
Dec 10, 2025
…) (llvm#169638)

Reapplication of llvm#137828, changes:

* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 2.24: The issue is that `try_compile` does not forward manually-defined compiler flang variables to the test build environment; instead of just a negative test result, it aborts the configuration step itself. To be fair, manually defining these variables has been deprecated since at least CMake 3.6.
* Missing flang cmd line flags for CMake < 3.28: `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLE_STATIC=OFF and FLANG_RT_ENABLE_SHARED=OFF at the same time, which is the default for amdgpu and nvptx targets. In this mode, only the .mod files are compiled -- necessary for module files in lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build functionality must be hoisted out of openmp's runtime/ directory, which is only included for host targets.

This PR now requires llvm#169909.

Move building the .mod files from openmp/flang to openmp/flang-rt using a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each target separately, which is something the LLVM_ENABLE_RUNTIMES system already does. Prime example is `iso_c_binding.mod` which encodes the target's ABI. Constants such as [`c_long_double` also have different values](https://github.com/llvm/llvm-project/blob/d748c81218bee39dafb9cc0c00ed7831a3ed44c3/flang-rt/lib/runtime/iso_c_binding.f90#L77-L81). Most other modules have `#ifdef`-enclosed code as well. For instance this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use the module files compiled for the host, which may contain uses of the types REAL(10) or REAL(16) not available for nvptx/amdgpu. llvm#146876 llvm#128015 llvm#129742 llvm#158790

3. CMake has support for Fortran that we should use. Among other things, it automatically determines module dependencies so there is no need to hardcode them in the CMakeLists.txt.

4. It allows using Fortran itself to implement Flang-RT. Currently, only `iso_fortran_env_impl.f90` emits object files that are needed by Fortran applications (llvm#89403). The workaround of llvm#95388 could be reverted (PR llvm#169525).

If using Flang for cross-compilation or target-offloading, flang-rt must now be compiled for each target not only for the library, but also to get the target-specific module files. For instance in a bootstrapping runtime build, this can be done by adding: `-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.

Some new dependencies come into play:

* openmp depends on flang-rt for building `omp_lib.mod` and `omp_lib_kinds.mod`. Currently, if flang-rt is not found then the modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests are disabled. If not building in a bootstrapping build, the location of the module files can be pointed to using `-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone build. Alternatively, the test needing any of the intrinsic modules could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring `omp_lib.mod` and `omp_lib_kinds.mod` are already marked with `openmp_runtime`.

As intrinsic modules are now specific to the target, their location is moved from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The mechanism to compute the location has been moved from flang-rt (previously used to compute the location of `libflang_rt.*.a`) to common locations in `cmake/GetToolchainDirs.cmake` and `runtimes/CMakeLists.txt` so it can be used by both openmp and flang-rt. Potentially the mechanism could also be shared by other libraries such as compiler-rt.

`finclude` was chosen because `gfortran` uses it as well and it avoids misuse such as `#include <flang/iso_c_binding.mod>`. The search location is now determined by `ToolChain` in the driver, instead of by the frontend. Another subdirectory `flang` avoids accidental inclusion of gfortran-modules which due to compression would result in user-unfriendly errors. Now the driver adds `-fintrinsic-module-path` for that location to the frontend call (just like gfortran does). `-fintrinsic-module-path` had to be fixed for this because ironically it was only added to `searchDirectories`, but not `intrinsicModuleDirectories_`. Since the driver determines the location, tests invoking `flang -fc1` and `bbc` must also be passed the location by llvm-lit. This works like llvm-lit does for finding the include dirs for Clang using `-print-file-name=...`.
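As a rough illustration of the per-target module layout described above, the following Python sketch builds the `<resource-dir>/finclude/flang/<triple>` path and the frontend argument that points at it. It is not the driver's code: the helper names `intrinsic_module_dir` and `frontend_args` are hypothetical, the resource directory `lib/clang/22` is taken from the example path in the commit message, and passing the directory as a separate argument after the flag is an assumption.

```python
from pathlib import PurePosixPath
from typing import List


def intrinsic_module_dir(resource_dir: str, triple: str) -> PurePosixPath:
    """Hypothetical helper: <resource-dir>/finclude/flang/<triple>."""
    return PurePosixPath(resource_dir) / "finclude" / "flang" / triple


def frontend_args(resource_dir: str, triple: str) -> List[str]:
    """Hypothetical helper: the extra frontend arguments for one target.

    Assumption: the directory follows the flag as a separate argument.
    """
    return ["-fintrinsic-module-path", str(intrinsic_module_dir(resource_dir, triple))]


if __name__ == "__main__":
    # One module directory per target triple, so host and offload
    # targets never share .mod files.
    for triple in ("nvptx64-nvidia-cuda", "amdgpu-amd-amdhsa"):
        print(" ".join(frontend_args("lib/clang/22", triple)))
```

Keying the directory on the triple is what keeps host-only types such as REAL(10) out of the module files seen by an offload compilation.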
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: llvm#171145
…eallocationPipeline` (llvm#171305) Add an overload that does not take any options and uses the default options instead.
Single backtick tries to make a reference to something and if that fails, renders as plain text. These 3 weren't finding a reference and so produced a warning: variable.rst:975: WARNING: 'any' reference target not found: max_children
…vm#166189) In clangd, we use the non-ast version one.
This tries to parse the block as that language but in these cases fails because they aren't purely that language. This falls back to a permissive mode which is fine, but highlights the invalid tokens like errors which isn't great. Instead don't try to highlight these blocks.

This fixes 4 warnings seen in the docs build:

lldb/docs/use/tutorials/custom-frame-recognizers.md:43: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/script-driven-debugging.md:175: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/script-driven-debugging.md:426: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/writing-custom-commands.md:416: WARNING: Lexing literal_block <...> as "python3" resulted in an error at token: '$'. Retrying in relaxed mode.
Follow-on from llvm#170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores. The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores primarily because of 24f0901 (llvm#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have these features but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
- Bring Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line. Comparing N[1-3] against V[2-3], the only change the N cores have that V[2-3] don't is:

  ```
  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
  ```

  So we take it for all cores. The rest of the diff is instructions in V[2-3] that aren't in N cores, so we also take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr.
- Finally bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3
  - loads/stores are blended
  - duplicates with different spaces like `shll v0.2d, v0.2s, #32` are removed
  - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
RST tries to resolve things in single backticks to a reference, which is not the intention here. Double backticks indicate plain text formatting.

Fixes warnings in the docs build:

contributing.rst:92: WARNING: 'any' reference target not found: A1
contributing.rst:92: WARNING: 'any' reference target not found: B1
contributing.rst:92: WARNING: 'any' reference target not found: B2
contributing.rst:92: WARNING: 'any' reference target not found: A2
contributing.rst:95: WARNING: 'any' reference target not found: A1->B1
contributing.rst:95: WARNING: 'any' reference target not found: B2->C2
contributing.rst:95: WARNING: 'any' reference target not found: C3->A3
contributing.rst:100: WARNING: 'any' reference target not found: LLDB_ACCEPTABLE_PLUGIN_DEPENDENCIES
contributing.rst:100: WARNING: 'any' reference target not found: LLDB_TOLERATED_PLUGIN_DEPENDENCIES
All these are using H1 for the main heading but H3 for the rest, Sphinx warns about this: WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]
The pass was already "reinventing" the concept just to deal with 16 bit registers. Clean up the entire tracking logic to only use register units.

There are no test changes because functionality didn't change, except:

- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of register units.

This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file).

I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
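The table-versus-map trade-off described above can be sketched in Python. This is only an analogy, not the pass's code: `SparseScores` and `dense_table` are hypothetical names, a Python `dict` stands in for `DenseMap`, and the `1 << 16` size is borrowed from the LDS DMA ID limit mentioned in the commit message purely as an example.

```python
from typing import Dict, List


def dense_table(num_units: int) -> List[int]:
    """Fixed-size table: every slot is allocated and zeroed up front."""
    return [0] * num_units


class SparseScores:
    """Map-based tracking: only units with a non-zero score take memory."""

    def __init__(self) -> None:
        self._scores: Dict[int, int] = {}

    def set(self, unit: int, score: int) -> None:
        if score == 0:
            # Dropping zero entries keeps the map as small as the live set.
            self._scores.pop(unit, None)
        else:
            self._scores[unit] = score

    def get(self, unit: int) -> int:
        # A missing key means "no pending score", same as a zeroed slot.
        return self._scores.get(unit, 0)

    def __len__(self) -> int:
        return len(self._scores)


if __name__ == "__main__":
    table = dense_table(1 << 16)
    sparse = SparseScores()
    sparse.set(40000, 3)
    print(len(table), len(sparse))
```

Each lookup in the map costs a hash instead of an index, which is the "bit of access speed" given up in exchange for not clearing a large table on every reset.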
This GenericEnum was just adding separate values for VOP3P_Pseudo opcodes in the same namespace as existing opcodes that did not match. They were defined in AMDGPUGenSearchableTables.inc by tablegen emitter but were guarded out by #ifdef. Because of that, they were never included in the code, so the compiler never reported the naming conflict and the bug never had a chance to surface.
This was removed in the specification by: arm/tosa-specification#11
Implements ARM-software/acle#404. This allows the user to specify "featA+featB;priority=[1-255]" where priority=255 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1. Internally this gets expanded using special FMV features P0 ... P7 which can encode up to 256-1 priority levels (excluding all zeros). Those do not have a corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.
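A minimal Python sketch of the version-string handling described above, under stated assumptions: the function names `parse_version` and `priority_features` are hypothetical, and mapping P0 to the least significant bit of the priority is a guess, since the commit message does not say which bit each P-feature carries.

```python
from typing import List, Optional, Tuple


def parse_version(spec: str) -> Tuple[List[str], Optional[int]]:
    """Split "featA+featB;priority=N" into (features, priority or None)."""
    feats, _, rest = spec.partition(";")
    features = [f for f in feats.split("+") if f]
    if not rest:
        return features, None  # no explicit priority given
    key, _, value = rest.partition("=")
    if key != "priority":
        raise ValueError(f"unexpected attribute: {key!r}")
    priority = int(value)
    if not 1 <= priority <= 255:
        raise ValueError("priority must be in [1, 255]")
    return features, priority


def priority_features(priority: Optional[int]) -> List[str]:
    """Expand a priority into P0..P7 names (assumption: P0 is bit 0)."""
    if priority is None:
        return []  # all bits zero: the implied, lowest level
    return [f"P{bit}" for bit in range(8) if priority & (1 << bit)]


if __name__ == "__main__":
    feats, prio = parse_version("featA+featB;priority=5")
    print(feats, prio, priority_features(prio))
```

Eight bits give 256 patterns; reserving the all-zero pattern for "no explicit priority" leaves the 255 explicit levels the text mentions.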
This is a follow-up patch to llvm#157680, which allows simd fpcvt instructions to be generated from l/llround and l/llrint nodes.
Without this, gcc warned: ../lib/Frontend/OpenMP/OMPIRBuilder.cpp:5082:45: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
Without this, gcc warned: ../../mlir/lib/Dialect/SCF/IR/SCF.cpp:3748:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
…nsformOps.cpp (NFC)
… mlir-irdl-to-cpp.cpp (NFC)
…lization in NVGPUTransformOps.cpp (NFC)
This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.
…1126) Similar to llvm#167760 this makes the list of LSE atomics explicit in case new operations are added in the future. UIncWrap, UDecWrap, USubCond and USubSat are excluded. Fixes llvm#170450
All the constant construction APIs already have native splat support. They can be directly used with a vector. It's not necessary to first create a scalar constant and then splat it to the element count.
SplatVal is not modified in these functions, so pass it by value. This was probably a copy&paste mistake from checkConstantVector(), which does modify SplatVal.
ConstantInt::get() already knows how to create splats, no need to do it manually.
The offset here is a signed quantity.
This may be -1.
This is encoded as a signed value, so use getSigned().
To match the signed int parameter for the value.
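The signedness fixes above all turn on the same point: a negative value such as -1 widens differently depending on whether it is treated as signed or unsigned. The Python sketch below illustrates that general distinction with fixed-width bit patterns; it does not model any LLVM API, and all four helper names are hypothetical.

```python
def as_unsigned(value: int, bits: int = 64) -> int:
    """Two's-complement bit pattern of value in the given width."""
    return value & ((1 << bits) - 1)


def as_signed(pattern: int, bits: int = 64) -> int:
    """Interpret a bit pattern of the given width as a signed integer."""
    pattern &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


def zero_extend(pattern: int, from_bits: int) -> int:
    """Widen by filling the new high bits with zeros."""
    return pattern & ((1 << from_bits) - 1)


def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Widen by copying the sign bit into the new high bits."""
    return as_unsigned(as_signed(pattern, from_bits), to_bits)


if __name__ == "__main__":
    minus_one_32 = as_unsigned(-1, 32)
    print(as_signed(zero_extend(minus_one_32, 32), 64))
    print(as_signed(sign_extend(minus_one_32, 32, 64), 64))
```

Zero-extending the 32-bit pattern of -1 yields a large positive number, while sign-extending it preserves -1, which is why a signed offset or parameter needs the signed construction path.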
# Conflicts:
# flang-rt/CMakeLists.txt
# flang-rt/lib/CMakeLists.txt
# flang-rt/lib/runtime/CMakeLists.txt
# flang-rt/lib/runtime/f90deviceio.f90
# flang/test/Lower/OpenMP/target-enter-data-default-openmp52.f90
# flang/tools/f18/CMakeLists.txt
# llvm/runtimes/CMakeLists.txt
…lvm#137828) (llvm#169638)" needs more work This reverts commit 7675fc7.
Collaborator
dpalermo
approved these changes
Dec 9, 2025
This fixes the buildbot failures from llvm#150267. I could not reproduce them locally, but judging from the error logs, my intuition is that the -O3 option on the RUN line behaves inconsistently on different hosts. My intention was to run an integration test which would use LLVM's globalopt pass, but there's actually no need. We have unittests in place for it.
npsdb failed lit tests, one new test: https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3206/pipeline/634/ Cherry-picked the upstream fix "[clang][FMV][AArch64] Remove O3 from failing test (llvm#171457)".