Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/merge upstream 20240417 #312

Merged
merged 2,316 commits into from
Apr 30, 2024
Merged

Feature/merge upstream 20240417 #312

merged 2,316 commits into from
Apr 30, 2024

Conversation

kaz7
Copy link
Collaborator

@kaz7 kaz7 commented Apr 30, 2024

Merge upstream/main up to 2024/4/17.

This patch has been passed the iternal regression tests.

nico and others added 30 commits April 15, 2024 17:01
In #88317, the clang resource headers was converted to an interface
library. Update LLDB and fix the Xcode standalone build. Thanks Evan for
the help!
llvm/llvm-project#79940 put calls to
recomputeLiveIns into
a loop, to repeatedly call the function until the computation converges.
However,
this repeats a lot of code. This changes moves the loop into a function
to simplify
the handling.

Note that this changes the order in which recomputeLiveIns is called.
For example,

```
  bool anyChange = false;
  do {
    anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
  } while (anyChange);
```

only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB
has converged. With this change, all basic blocks have a recomputation
of the live-ins
for each loop iteration. This can result in less or more calls,
depending on the
situation.
Test, as expected, fails with Asan on system with 5lvl page tables.
Disabling the test to migrate buildbot.
…ommonInvocationForModuleBuild` into its own function (#88447)

The new function is about clearing out benign codegen options and can be
applied for PCH invocations as well.
Links to `llvm.mlir.global` and `llvm.mlir.addressof` in the ["Globals"
section of LLVM dialect
documentation](https://mlir.llvm.org/docs/Dialects/LLVM/#globals) are
broken.
…s (#86472)

The callee should preserve rbx according to the calling convention, but
it is not in the test case `ExecutionEngine/JITLink/x86-64/ELF_vtune.s`.
Not preserving the rbx register may result in some random error to the
caller function. This patch adds the missing command to preserve the
rbx.
Remove the fold working on abs in SPF representation now that we
canonicalize SPF to intrinsics.

This is not strictly NFC because the SPF fold might fire for
non-canonical IR due to multi-use, but given the lack of test coverage,
I assume this is not important.
This expansion is directly inspired by the analogous code in the x86
backend for LEA. shXadd and (this sub-case of) LEA are largely
equivalent.

This is an alternative to
llvm/llvm-project#87105.

This expansion is also supported via the decomposeMulByConstant
callback, but restricted because of interactions with other combines
since that code runs before legalization. As discussed in the other
review, my original plan had been to support post legalization expansion
through the same interface, but that ended up being more complicated
than seems justified.

Instead, lets go ahead and do the general expansion post-legalize. Other
targets use the combine approach, and matching that structure makes it
easier for us to adapt ideas from other targets to RISCV.
… extract_subvector(y,c2-c1) (#87925)"

This reverts commit 8c0f52e.

Reverting to green, reproducer attached in the PR/revision comments.
https://reviews.llvm.org/D54188 marked "alias" targets as used in C to
fix -Wunused false positives. This patch extends the approach to handle
mangled names to support global scope names in C++ and the
`overloadable` attribute in C.

In addition, we mark ifunc targets as used to fix #63957.

While our approach has false negatives for namespace scope names, the
majority of alias/ifunc C++ uses (global scope with no overloads) are
handled.

Note: The following function with internal linkage but C language
linkage type is mangled in Clang but not in GCC. This inconsistency
makes alias/ifunc difficult to use in C++ with portability (#88593).
```
extern "C" {
static void f0() {}
// GCC: void g0() __attribute__((alias("_ZL2f0v")));
// Clang: void g0() __attribute__((alias("f0")));
}
```

Pull Request: llvm/llvm-project#87130
…tors (#88643)

When build with option -msve-vector-bits=512, the return vaule of
Subtarget->getMinSVEVectorSizeInBits() is 512;
While the MinElts is still 4 for <vscale x 4 x double> in
getNumInterleavedAccesses, so it creates invalid
llvm.aarch64.sve.ld2.sret.nxv4f64, which need be splited.
Unlikely, the related custom spilting is not supported now.

Fix llvm/llvm-project#88247
…e aligned (#88435)

Increase alignment of `llvm.threadlocal.address` if the pointed to
global has higher alignment.
The parameter of `findDebugNamesOffsets` has been renamed to
`EndOfHeaderOffset` in #88064 to make it clear it is a section offset
instead of an offset relative to the current name index. Rename the call
site variable as well.
Use the Val type to estimate the instruction cost for ICmp.
…in op (#87163)

These previously were added in the C++ API in
778cf54, but without updating the enum
in the C API or mapping functions.

Corresponding tests for all current atomicrmw bin ops have been added as
well.
… (#88552)

For experimental.cttz.elts, we can use a vfirst instruction, but we need
to correct the result if input vector can be 0. cttz.elts returns the
vector length while vfirst returns -1.
When inserting truncs during IV widening, mark the trunc as either nuw
or nsw depending on whether zext or sext widening was used. For
non-negative IVs both nuw and nsw apply.
…ass member function (#85198)

Fix llvm/llvm-project#84020
Skip checking implicit object parameter in the context of
`RequiresExprBodyDecl`.

Co-authored-by: huqizhi <[email protected]>
Without this patch, you would typically use readNext as:

  readNext<uint32_t, llvm::endianness::little, unaligned>(Ptr)

which is quite mouthful.  Since most serialization/deserialization
operations are unaligned accesses, this patch makes the alignment
template parameter default to unaligned, allowing us to say:

  readNext<uint32_t, llvm::endianness::little>(Ptr)

I'm including a few examples of migration in this patch.  I'll do the
rest in a separate patch.

Note that writeNext already has the same trick for the alignment
template parameter.
…de model (#84523)

These should be already handled by other code.

Removing the kernel code model rules right above it cause
bss_pagealigned.ll to fail by using a movabsq to get the address of a
global, haven't figured out where that code is yet.
This patch tests LLDB integration with libsanitizers for ASan.

rdar://111856681
…tatic code model (#84523)"

This reverts commit b4cf63d.

Breaks indirect-branch-tracking-eh2.ll.
Allocatable with cuda device attribute have special semantic for the
allocate statement. In flang the allocate statement is lowered to a
sequence of runtime call initializing the descriptor and then allocating
the descriptor data. This new operation will replace the last runtime
call and abstract all the device memory allocation needed.
The lowering patch will follow.
Previously the MallocNanoZone envvar would be set to 0 on Darwin for the
LLDB shell tests, but this should guarded behind ASan being enabled as
opposed to simply running the test suite behind Darwin. This required
that the LLVM_USE_SANITIZER option be added as an attribute to the lit
config for shell tests.
…8805)

This is currently being implied in RISCVISAInfo.cpp. Make it explicit.

I'm planning to move all extension information to RISCVFeatures.td and
have tablegen create the tables for RISCVISAInfo.cpp. This requires
making the creation of RISCVTargetParserDef.inc in tablegen independent
of RISCVISAInfo.cpp. So we need an accurate extension list for CPUs in
tablegen.
nikic and others added 28 commits April 17, 2024 18:22
…s (#88217)

Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.

Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.

At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.

I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.

This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
Currently, it is not possible to find back which fun.func is the host
procedure of some internal procedure because the mangling of the
internal procedure does not contain info about the BIND(C) name of the
host.
This info may be useful to ensure dwarf DW_TAG_subprogram of internal
procedures are nested under DW_TAG_subprogram of host procedures for
instance.
…nter size of the target (#88725)

This PR resolves the issue that SPIR-V Backend uses the notion of a
pointer size of the target, most notably, in legalizer code, but
Tablegen instruction selection in SPIR-V Backend doesn't account for a
pointer size of the target. See
llvm/llvm-project#88723 for a detailed
description. There are 3 test cases attached to the PR that reproduced
the issue, when dealing with spirv32-spirv64 differences, and are
working correctly now with this PR.
This PR addresses an issue that may arise when an integer argument size
differs from a machine word size for the target in a call to llvm
intrinsic. The following example demonstrates the issue:

```
@__const.test.arr = private unnamed_addr addrspace(2) constant [3 x i32] [i32 1, i32 2, i32 3]

define spir_func void @test() {
entry:
  %arr = alloca [3 x i32], align 4
  %dest = bitcast ptr %arr to ptr
  call void @llvm.memcpy.p0.p2.i32(ptr align 4 %dest, ptr addrspace(2) align 4 @__const.test.arr, i32 1024, i1 false)
  ret void
}

declare void @llvm.memcpy.p0.p2.i32(ptr nocapture writeonly, ptr addrspace(2) nocapture readonly, i32, i1)
```

Depending on the target this code may work or may fail without this PR
due to the fact that IR Translation step introduces additional `zext`
when type of the 3rd argument of `@llvm.memcpy.p0.p2.i32` differs from
machine word.

This PR addresses the issue by adding type deduction for a newly
inserted G_ZEXT generic opcode.
When a local variable inside a BLOCK construct is used as threadprivate
variable, llvm-flang throws below error:

> error: The THREADPRIVATE directive and the common block or variable in
it must appear in the same declaration section of a scoping unit
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and enable adding further more
specialized memory recipes.

PR: llvm/llvm-project#87411
Collect the original check lines in a manner that is independent of
where the check lines appear in the file. This is so that we keep
FileCheck variable names stable even when --include-generated-funcs is
used.

Reported-by: Ruiling Song <[email protected]>
This patch updates the definition of `omp.simdloop` to enforce the
restrictions of a wrapper operation. It has been renamed to `omp.simd`,
to better reflect the naming used in the spec. All uses of "simdloop" in
function names have been updated accordingly.

Some changes to Flang lowering and OpenMP to LLVM IR translation are
introduced to prevent the introduction of compilation/test failures. The
eventual long term solution might be different.
This patch removes the LoopControl parsing/printing functions that are
no longer used after transitioning `omp.simdloop` and `omp.taskloop`
into loop wrapper operations.
…egs. NFC

In vxrm.mir we were running RISCVInsertVSETVLI on pseudos that already had
vsetvlis inserted and their AVLs set to $noreg. (This happened to work
since doLocalPostpass got rid of the extra vsetvli)

This removes the vsetvlis from the test and enforces that the only valid
AVLs we work with are either X0 or virtual registers (or $noreg before
emitVSETVLIs), since we don't handle physical registers properly in
doLocalPostpass.
… (#87388)

As we've added new IR elements for the RemoveDIs project,
we need the update_test_checks script to understand them. For the
records themselves this is already done automatically, but their
metadata arguments are not recognized as such due to lacking the
`metadata` prefix, which means they won't be checked by the script. This
patch fixes this by adding a check for all `![0-9]+` patterns as long as
they are not at the start of a line (which avoids matching global
values).
"Till heaven and earth pass, one jot, or one tittle shall not pass of
the law"
…#88494)

Fixes #85084

Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all
other threads in the process to force them to stop as well. If those
threads stop on their own before they get a signal, this SIGSTOP will
remain pending and be delivered the next time the process resumes.

Normally, this is not a problem, because lldb-server will detect this
stale SIGSTOP and resume the process. However, if we detach from the
process while it has these SIGSTOPs pending, they will get immediately
delivered, and the process will remain stopped (most likely forever).

This patch fixes that by sending a SIGCONT just before detaching from
the process. This signal cancels out any pending SIGSTOPs, and ensures
it is able to run after we detach. It does have one somewhat unfortunate
side-effect that in that the process's SIGCONT handler (if it has one)
will get executed spuriously (from the process's POV).

This could be _sometimes_ avoided by tracking which threads got send a
SIGSTOP, and whether those threads stopped due to it. From what I could
tell by observing its behavior, this is what gdb does. I have not tried
to replicate that behavior here because it adds a nontrivial amount of
complexity and the result is still uncertain -- we still need to send a
SIGCONT (and execute the handler) when any thread stops for some other
reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs
don't stack, it's also possible that our SIGSTOP/SIGCONT combination
will cancel a genuine SIGSTOP being sent to the debugger application (by
someone else), and there is nothing we can do about that. For this
reason I think it's simplest and most predictible to just always send a
SIGCONT when detaching, but if it turns out this is breaking something,
we can consider implementing something more elaborate.

One alternative I did try is to use PTRACE_INTERRUPT to suspend the
threads instead of a SIGSTOP. PTRACE_INTERUPT requires using
PTRACE_SEIZE to attach to the process, which also made this solution
somewhat complicated, but the main problem with that approach is that
PTRACE_INTERRUPT is not considered to be a signal-delivery-stop, which
means it's not possible to resume it while injecting another signal to
the inferior (which some of our tests expect to be able to do). This
limitation could be worked around by forcing the thread into a signal
delivery stop whenever we need to do this, but this additional
complication is what made me think this approach is also not worthwhile.

This patch should fix (at least some of) the problems with
TestConcurrentVFork, but I've also added a dedicated test for checking
that a process keeps running after we detach. Although the problem I'm
fixing here is linux-specific, the core functinoality of not stopping
after a detach should function the same way everywhere.
This patch simplifies the lowering from PFT to MLIR of OpenMP compound
constructs (i.e. combined and composite).

The new approach consists of iteratively processing the outermost leaf
construct of the given combined construct until it cannot be split
further. Both leaf constructs and composite ones have `gen...()`
functions that are called when appropriate.

This approach enables treating a leaf construct the same way regardless
of if it appeared as part of a combined construct, and it also enables
the lowering of composite constructs as a single unit.

Previous corner cases are now handled in a more straightforward way and
comments pointing to the relevant spec section are added. Directive sets
are also completed with missing LOOP related constructs.
…88913)

This commit explicitly specifies the matching mode (C library function,
any non-method function, or C++ method) for the `CallDescription`s
constructed in the iterator/container checkers.

This change won't cause major functional changes, but isn't NFC because
it ensures that e.g. call descriptions for a non-method function won't
accidentally match a method that has the same name.

Separate commits will perform (or have already performed) this change in
other checkers. My goal is to ensure that the call description mode is
always explicitly specified and eliminate (or strongly restrict) the
vague "may be either a method or a simple function" mode that's the
current default.

I'm handling the iterator checkers in this separate commit because
they're infamously complex; but I don't expect any trouble because this
transition doesn't interact with the "central" logic of iterator
handling.
Fix the summary of intNEQValue.
Also add tests for those, and add a few missing requirements to testing
iterators in the test suite.
Greedy rewrite driver has options to control the number of rewrites
applies. Expose those via the corresponding transform op.
…tail duplicating blocks (#78582)

Fixes #78578.

Duplicating a BB which has both multiple predecessors and successors
will result in a complex CFG and also may cause huge amount of PHI
nodes. See
llvm/llvm-project#78578 (comment)
for a detailed description of the limit.
After 281d716, llvm generates 32-bit
relocations, which overflow when we load these objects into high memory.
Interestingly, setting the code model to "large" does not help here
(perhaps it is the default?). I'm not completely sure that this is the
right thing to do, but it doesn't seem to cause any ill effects. I'll
follow up with the author of that patch about the expected behavior
here.
…. (#85272)

Correct missing cases in a switch that result in @llvm.vp.fma.v4f32
getting lowered to a constrained fma intrinsic. Vector predicated
lowering to contrained intrinsics is not supported currently, and
there's no consensus on the path forward. We certainly shouldn't be
introducing constrained intrinsics into a function that isn't strictfp.

Problem found with D146845.
The link to the Heterogeneous-race-free Memory Models ASPLOS'14 paper by
Hower et al. pointed to a bogus website, probably because the domain
ownership has changed.
This patch updates it to a version hosted on research.cs.wisc.edu.
This functionality is available in C++, make it available in Python
directly to operate on transform modules.
@kaz7 kaz7 merged commit 3d2dfee into develop Apr 30, 2024
12 checks passed
@kaz7 kaz7 deleted the feature/merge-upstream-20240417 branch April 30, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.