@ronlieb ronlieb commented Dec 9, 2025

No description provided.

Meinersbur and others added 30 commits December 9, 2025 12:54
…) (llvm#169638)

Reapplication of llvm#137828, changes:
* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 3.24: The
issue is that `try_compile` does not forward manually-defined compiler
flag variables to the test build environment; instead of just a
negative test result, it aborts the configuration step itself. To be
fair, manually defining these variables is deprecated since at least
CMake 3.6.
* Missing flang cmd line flags for CMake < 3.28 `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLE_STATIC=OFF and
FLANG_RT_ENABLE_SHARED=OFF at the same time, which is the default for amdgpu and
nvptx targets. In this mode, only the .mod files are compiled, which is
necessary for the module files in
lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod
to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build
functionality must be hoisted out of openmp's runtime/ directory, which
is only included for host targets. This PR now requires llvm#169909.
 

Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Constants such as [`c_long_double` also have different
values](https://github.com/llvm/llvm-project/blob/d748c81218bee39dafb9cc0c00ed7831a3ed44c3/flang-rt/lib/runtime/iso_c_binding.f90#L77-L81).
Most other modules have `#ifdef`-enclosed code as well. For instance,
this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use
the module files compiled for the host, which may contain uses of the
types REAL(10) or REAL(16) not available for nvptx/amdgpu.

llvm#146876
llvm#128015
llvm#129742
llvm#158790

3. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.

4. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (llvm#89403). The workaround of llvm#95388 could be reverted (PR
llvm#169525).


If using Flang for cross-compilation or target-offloading, flang-rt must
now be compiled for each target not only for the library, but also to
get the target-specific module files. For instance in a bootstrapping
runtime build, this can be done by adding:
`-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.


Some new dependencies come into play:
* openmp depends on flang-rt for building `omp_lib.mod` and
`omp_lib_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrapping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, tests needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`omp_lib.mod` and `omp_lib_kinds.mod` are already marked with
`openmp_runtime`.

As intrinsic modules are now specific to the target, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechanism to compute the location has been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so it can be used by both openmp and
flang-rt. Potentially the mechanism could also be shared by other
libraries such as compiler-rt.
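
As a rough illustration of the `<resource-dir>/finclude/flang/<triple>` layout described above, the following sketch joins the components with `std::filesystem`. The function name `intrinsicModuleDir` is hypothetical and is not taken from the driver or the CMake code; it only shows how the path is composed.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of flang): compose the per-target directory
// that holds the intrinsic .mod files, following the layout
// <resource-dir>/finclude/flang/<triple>.
std::filesystem::path intrinsicModuleDir(const std::filesystem::path &ResourceDir,
                                         const std::string &Triple) {
  return ResourceDir / "finclude" / "flang" / Triple;
}
```

For example, calling it with `lib/clang/22` and `nvptx64-nvidia-cuda` yields the directory named in the first bullet list of this commit message.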

`finclude` was chosen because `gfortran` uses it as well and because it avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. Another subdirectory `flang` avoids accidental inclusion of
gfortran-modules which due to compression would result in
user-unfriendly errors. Now the driver adds `-fintrinsic-module-path`
for that location to the frontend call (just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to
remove them. This also requires adjusting the code in VPlanUnroll.cpp to
split off handling of ExtractLastLane/ExtractPenultimateElement for
scalar VFs, which now needs to match ExtractLastPart.

PR: llvm#171145
…eallocationPipeline` (llvm#171305)

Add an overload that does not take any options and uses the default
options instead.
Single backtick tries to make a reference to something
and if that fails, renders as plain text.

These 3 weren't finding a reference and so produced
a warning:
variable.rst:975: WARNING: 'any' reference target not found: max_children
This tries to parse the block as that language but in these
cases fails because they aren't purely that language. This
falls back to a permissive mode which is fine, but highlights
the invalid tokens like errors which isn't great.

Instead don't try to highlight these blocks. This fixes 4
warnings seen in the docs build:
lldb/docs/use/tutorials/custom-frame-recognizers.md:43: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/script-driven-debugging.md:175: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/script-driven-debugging.md:426: WARNING: Lexing literal_block <...> as "c++" resulted in an error at token: '#'. Retrying in relaxed mode.
lldb/docs/use/tutorials/writing-custom-commands.md:416: WARNING: Lexing literal_block <...> as "python3" resulted in an error at token: '$'. Retrying in relaxed mode.
Follow-on from llvm#170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.

The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so
  first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but
  differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores
  primarily because of 24f0901 (llvm#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have these
    features but all other Neoverse cores do. Also adds coverage for
    N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other
    Neoverse cores do. Also adds coverage for N1/N2/N3 since they were
    missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other
    Neoverse cores do. Also adds coverage for N2/N3 since they were
    missing tests.
- Bring Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line. Comparing
  N[1-3] against V[2-3], the only change the N cores have that V[2-3]
  don't is:
```
  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
```
  So we take it for all cores. The rest of the diff is
  instructions in V[2-3] that aren't in N cores, so we also take them.

  All Neoverse cores can optionally support the Cryptographic Extension.
  The related features (AES, ...) are enabled by default for V1/N1 but not
  the other cores, so need to be explicitly enabled via -mattr.
- Finally bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3
  - loads/stores are blended
  - duplicates with different spaces like `shll   v0.2d, v0.2s, #32` are
    removed
  - the rest of the diff is instructions in V1 that are not tested in the
    other cores, so we add them for the other cores
RST tries to resolve things in single backticks to a reference,
which is not the intention here. Double backticks indicate
plain text formatting.

Fixes warnings in the docs build:
contributing.rst:92: WARNING: 'any' reference target not found: A1
contributing.rst:92: WARNING: 'any' reference target not found: B1
contributing.rst:92: WARNING: 'any' reference target not found: B2
contributing.rst:92: WARNING: 'any' reference target not found: A2
contributing.rst:95: WARNING: 'any' reference target not found: A1->B1
contributing.rst:95: WARNING: 'any' reference target not found: B2->C2
contributing.rst:95: WARNING: 'any' reference target not found: C3->A3
contributing.rst:100: WARNING: 'any' reference target not found: LLDB_ACCEPTABLE_PLUGIN_DEPENDENCIES
contributing.rst:100: WARNING: 'any' reference target not found: LLDB_TOLERATED_PLUGIN_DEPENDENCIES
All these are using H1 for the main heading but H3 for the
rest, Sphinx warns about this:
WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]
The pass was already "reinventing" the concept just to deal with 16 bit
registers. Clean up the entire tracking logic to only use register
units.

There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of
register units.

This also changes the tracking to use a DenseMap instead of a massive
fixed size table. This trades a bit of access speed for a smaller memory
footprint. Allocating and memsetting a huge table to zero caused a
non-negligible performance impact (I've observed up to 50% of the time
in the pass spent in the `memcpy` built-in on a big test file).

I also think we don't access these often enough to really justify using
a vector. We do a few accesses per instruction, but not much more. In a
huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
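
The trade-off described above can be sketched as follows. This is a minimal illustration rather than the pass's code: `std::unordered_map` stands in for `llvm::DenseMap` so the example compiles without LLVM headers, and the class and method names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-register-unit score tracker. A fixed-size table would have
// to be allocated and zeroed up front; a map only stores units that were
// actually written.
class SparseScores {
public:
  // Units never written behave as if their score were zero, matching what a
  // zero-initialised table would have returned.
  unsigned get(std::uint32_t Unit) const {
    auto It = Scores.find(Unit);
    return It == Scores.end() ? 0u : It->second;
  }

  void set(std::uint32_t Unit, unsigned Score) {
    if (Score == 0)
      Scores.erase(Unit); // zero is the implicit default, so drop the entry
    else
      Scores[Unit] = Score;
  }

  std::size_t trackedUnits() const { return Scores.size(); }

private:
  std::unordered_map<std::uint32_t, unsigned> Scores;
};
```

Each lookup costs a hash probe instead of an array index, but constructing the tracker touches no memory proportional to the number of possible units.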
This GenericEnum was just adding separate values for VOP3P_Pseudo
opcodes in the same namespace as existing opcodes that did not match.
They were defined in AMDGPUGenSearchableTables.inc by tablegen emitter
but were guarded out by #ifdef. Because of that, they were never
included in the code, so the compiler never reported the naming
conflict and the bug never had a chance to surface.
Implements ARM-software/acle#404

This allows the user to specify "featA+featB;priority=[1-255]" where
priority=255 means highest priority. If the explicit priority string is
omitted then the priority of "featA+featB" is implied, which is lower
than priority=1.

Internally this gets expanded using special FMV features P0 ... P7, which
can encode up to 256-1 priority levels (excluding all zeros). Those do
not have a corresponding detection bit at position FEAT_#enum, so I made this
field optional in FMVInfo. They also don't affect the codegen or name
mangling of versioned functions.
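
To make the encoding concrete, here is a minimal sketch of how a priority in the range 1 to 255 could be split into eight single-bit features. The helper names are hypothetical, and the bit order (P0 as the least significant bit) is an assumption made for this illustration; the message above does not state which order the implementation uses.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: extract the explicit priority from a version string
// such as "featA+featB;priority=7". Returns std::nullopt when no priority is
// given or when the value is outside 1..255.
std::optional<unsigned> parsePriority(const std::string &Version) {
  const std::string Key = ";priority=";
  std::size_t Pos = Version.find(Key);
  if (Pos == std::string::npos)
    return std::nullopt;
  std::string Digits = Version.substr(Pos + Key.size());
  if (Digits.empty() || Digits.size() > 3)
    return std::nullopt;
  unsigned Value = 0;
  for (char C : Digits) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<unsigned>(C - '0');
  }
  if (Value < 1 || Value > 255)
    return std::nullopt;
  return Value;
}

// Expand a priority into the names of its set bits, assuming P0 is the least
// significant bit (an assumption of this sketch only).
std::vector<std::string> priorityToFeatures(unsigned Priority) {
  std::vector<std::string> Features;
  for (unsigned Bit = 0; Bit < 8; ++Bit)
    if (Priority & (1u << Bit))
      Features.push_back("P" + std::to_string(Bit));
  return Features;
}
```

Under that assumption, priority 5 expands to P0 and P2, and the all-zeros pattern is never produced because 0 is rejected as a priority.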
This is a follow-up patch to
llvm#157680, which allows SIMD
fpcvt instructions to be generated from l/llround and l/llrint nodes.
Without this gcc warned
 ../lib/Frontend/OpenMP/OMPIRBuilder.cpp:5082:45: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
Without this gcc warned
 ../../mlir/lib/Dialect/SCF/IR/SCF.cpp:3748:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
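
For context on this class of warning: GCC's `-Wparentheses` fires on expressions of the form `A || B && C`. Since `&&` binds more tightly than `||`, adding parentheses around the `&&` part keeps the behaviour and silences the warning. The sketch below uses hypothetical condition names, not the code from either file, and contrasts the actual grouping with the one a reader might mistakenly assume.

```cpp
// Written without parentheses as `IsConstant || HasUsers && IsLive`, this is
// the grouping the compiler applies; spelling it out avoids the warning.
bool explicitGrouping(bool IsConstant, bool HasUsers, bool IsLive) {
  return IsConstant || (HasUsers && IsLive);
}

// The other possible reading, which gives a different result for some inputs.
bool otherGrouping(bool IsConstant, bool HasUsers, bool IsLive) {
  return (IsConstant || HasUsers) && IsLive;
}
```

The two functions disagree when `IsConstant` is true and `IsLive` is false, which is why the warning asks for the grouping to be made explicit.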
This patch updates various LLVM headers to properly add the `LLVM_ABI`
and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows.

This effort is tracked in llvm#109483.
…1126)

Similar to llvm#167760 this makes the list of LSE atomics explicit in case
new operations are added in the future. UIncWrap, UDecWrap, USubCond and
USubSat are excluded.

Fixes llvm#170450
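
The benefit of an explicit list can be sketched as below. The enum is hypothetical and much smaller than LLVM's set of atomic read-modify-write operations, and the two operations marked as supported are only an illustrative subset, not the list from the patch.

```cpp
// Hypothetical operation enum for this sketch; it is not LLVM's
// AtomicRMWInst::BinOp.
enum class RMWOp { Xchg, Add, UIncWrap, UDecWrap, USubCond, USubSat };

// Explicit allowlist: anything not named here, including operations added to
// the enum later, is reported as unsupported.
bool hasNativeAtomic(RMWOp Op) {
  switch (Op) {
  case RMWOp::Xchg:
  case RMWOp::Add:
    return true;
  default:
    return false;
  }
}
```

With a denylist instead, a newly added operation would silently be treated as supported until someone remembered to exclude it.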
All the constant construction APIs already have native splat
support. They can be directly used with a vector. It's not
necessary to first create a scalar constant and then splat it
to the element count.
SplatVal is not modified in these functions, so pass it by value.
This was probably a copy&paste mistake from checkConstantVector(),
which does modify SplatVal.
ConstantInt::get() already knows how to create splats, no need to
do it manually.
The offset here is a signed quantity.
nikic and others added 5 commits December 9, 2025 16:02
This is encoded as a signed value, so use getSigned().
To match the signed int parameter for the value.
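
Why the signed variant matters for these last few commits can be shown without LLVM. The sketch below uses plain integer casts and hypothetical function names; it only illustrates the difference between zero-extending and sign-extending a negative 32-bit value to 64 bits.

```cpp
#include <cstdint>

// Zero-extension: the bit pattern is treated as unsigned, so a negative
// offset turns into a large positive number.
std::uint64_t widenUnsigned(std::int32_t Offset) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(Offset));
}

// Sign-extension: the numeric value is preserved.
std::int64_t widenSigned(std::int32_t Offset) {
  return static_cast<std::int64_t>(Offset);
}
```

For an offset of -8, the first function returns 4294967288 while the second returns -8, so constructing a constant from a signed quantity through an unsigned path changes its value once it is widened.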
# Conflicts:
#	flang-rt/CMakeLists.txt
#	flang-rt/lib/CMakeLists.txt
#	flang-rt/lib/runtime/CMakeLists.txt
#	flang-rt/lib/runtime/f90deviceio.f90
#	flang/test/Lower/OpenMP/target-enter-data-default-openmp52.f90
#	flang/tools/f18/CMakeLists.txt
#	llvm/runtimes/CMakeLists.txt

This fixes the buildbot failures from
llvm#150267.

I could not reproduce them locally but my intuition suggests that the
-O3 option on the RUN line behaves inconsistently on different hosts,
judging from the error logs.

My intention was to run an integration test which will use llvm's
globalopt pass, but there's no need actually. We have unittests in place
for it.

ronlieb commented Dec 10, 2025

npsdb failed lit tests, one new test. https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3206/pipeline/634/

Cherry-picked upstream fix [clang][FMV][AArch64] Remove O3 from failing test (llvm#171457),
reran the build to include the lit tests; passes.

@ronlieb ronlieb merged commit 0b6f58e into amd-staging Dec 10, 2025
9 of 10 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20251209125822 branch December 10, 2025 01:33