Feature/merge upstream 20240417 #312

kaz7 · 2024-04-30T09:15:29Z

Merge upstream/main up to 2024/4/17.

This patch has passed the internal regression tests.

In #88317, the clang resource headers were converted to an interface library. Update LLDB and fix the Xcode standalone build. Thanks Evan for the help!

llvm/llvm-project#79940 put calls to recomputeLiveIns into a loop, to repeatedly call the function until the computation converges. However, this repeats a lot of code. This change moves the loop into a function to simplify the handling.

Note that this changes the order in which recomputeLiveIns is called. For example,

```
bool anyChange = false;
do {
  anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
} while (anyChange);
```

only begins to recompute the live-ins for LoopMBB after the computation for ExitMBB has converged. With this change, the live-ins of every basic block are recomputed on each loop iteration. This can result in fewer or more calls, depending on the situation.
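The ordering difference can be illustrated with a toy model. The sketch below is not the LLVM implementation: `Block`, `recompute_live_ins`, `short_circuit_loop`, and `full_sweep_loop` are hypothetical names, and the live-in rule (uses plus successors' live-ins minus defs) is a simplified assumption. It only shows how a short-circuiting `or` delays the second block, while a full sweep visits every block in each iteration.

```python
class Block:
    """Toy basic block; all names here are illustrative, not LLVM APIs."""
    def __init__(self, name, uses, defs):
        self.name = name
        self.uses = set(uses)
        self.defs = set(defs)
        self.succs = []
        self.live_ins = set()

def recompute_live_ins(block, calls):
    """Recompute live-ins from successors; return True if they changed."""
    calls.append(block.name)
    live_out = set()
    for succ in block.succs:
        live_out |= succ.live_ins
    new_live_ins = block.uses | (live_out - block.defs)
    changed = new_live_ins != block.live_ins
    block.live_ins = new_live_ins
    return changed

def short_circuit_loop(exit_bb, loop_bb, calls):
    """Old pattern: `or` skips loop_bb while exit_bb is still changing."""
    any_change = True
    while any_change:
        any_change = (recompute_live_ins(exit_bb, calls)
                      or recompute_live_ins(loop_bb, calls))

def full_sweep_loop(blocks, calls):
    """New pattern: every block is recomputed in each iteration."""
    any_change = True
    while any_change:
        any_change = False
        for block in blocks:
            if recompute_live_ins(block, calls):
                any_change = True

def make_blocks():
    exit_bb = Block("exit", uses={"r1"}, defs=set())
    loop_bb = Block("loop", uses={"r2"}, defs={"r3"})
    loop_bb.succs = [loop_bb, exit_bb]
    return exit_bb, loop_bb

old_calls, new_calls = [], []
old_exit, old_loop = make_blocks()
short_circuit_loop(old_exit, old_loop, old_calls)
new_exit, new_loop = make_blocks()
full_sweep_loop([new_exit, new_loop], new_calls)
```

In this particular toy input both variants reach the same fixed point, and the full sweep happens to need fewer calls; other inputs can tip the balance the other way.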

Test, as expected, fails with Asan on system with 5lvl page tables. Disabling the test to migrate buildbot.

…ommonInvocationForModuleBuild` into its own function (#88447) The new function is about clearing out benign codegen options and can be applied for PCH invocations as well.

Links to `llvm.mlir.global` and `llvm.mlir.addressof` in the ["Globals" section of LLVM dialect documentation](https://mlir.llvm.org/docs/Dialects/LLVM/#globals) are broken.

…s (#86472) The callee should preserve rbx according to the calling convention, but it is not in the test case `ExecutionEngine/JITLink/x86-64/ELF_vtune.s`. Not preserving the rbx register may result in some random error to the caller function. This patch adds the missing command to preserve the rbx.

Remove the fold working on abs in SPF representation now that we canonicalize SPF to intrinsics. This is not strictly NFC because the SPF fold might fire for non-canonical IR due to multi-use, but given the lack of test coverage, I assume this is not important.

This expansion is directly inspired by the analogous code in the x86 backend for LEA. shXadd and (this sub-case of) LEA are largely equivalent. This is an alternative to llvm/llvm-project#87105. This expansion is also supported via the decomposeMulByConstant callback, but restricted because of interactions with other combines since that code runs before legalization. As discussed in the other review, my original plan had been to support post legalization expansion through the same interface, but that ended up being more complicated than seems justified. Instead, let's go ahead and do the general expansion post-legalize. Other targets use the combine approach, and matching that structure makes it easier for us to adapt ideas from other targets to RISCV.

… extract_subvector(y,c2-c1) (#87925)" This reverts commit 8c0f52e. Reverting to green, reproducer attached in the PR/revision comments.

https://reviews.llvm.org/D54188 marked "alias" targets as used in C to fix -Wunused false positives. This patch extends the approach to handle mangled names to support global scope names in C++ and the `overloadable` attribute in C. In addition, we mark ifunc targets as used to fix #63957. While our approach has false negatives for namespace scope names, the majority of alias/ifunc C++ uses (global scope with no overloads) are handled.

Note: The following function with internal linkage but C language linkage type is mangled in Clang but not in GCC. This inconsistency makes alias/ifunc difficult to use in C++ with portability (#88593).

```
extern "C" {
static void f0() {}
// GCC: void g0() __attribute__((alias("_ZL2f0v")));
// Clang: void g0() __attribute__((alias("f0")));
}
```

Pull Request: llvm/llvm-project#87130

…tors (#88643) When building with the option -msve-vector-bits=512, the return value of Subtarget->getMinSVEVectorSizeInBits() is 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses, so it creates an invalid llvm.aarch64.sve.ld2.sret.nxv4f64, which needs to be split. Unfortunately, the related custom splitting is not supported yet. Fix llvm/llvm-project#88247

…e aligned (#88435) Increase alignment of `llvm.threadlocal.address` if the pointed to global has higher alignment.

The parameter of `findDebugNamesOffsets` has been renamed to `EndOfHeaderOffset` in #88064 to make it clear it is a section offset instead of an offset relative to the current name index. Rename the call site variable as well.

Use the Val type to estimate the instruction cost for ICmp.

…in op (#87163) These previously were added in the C++ API in 778cf54, but without updating the enum in the C API or mapping functions. Corresponding tests for all current atomicrmw bin ops have been added as well.

… (#88552) For experimental.cttz.elts, we can use a vfirst instruction, but we need to correct the result if input vector can be 0. cttz.elts returns the vector length while vfirst returns -1.
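The correction described above can be modelled in a few lines. This is a sketch of the semantics only, under the assumption stated in the commit message (an all-zero input makes `cttz.elts` return the vector length while `vfirst` returns -1); the Python functions `vfirst` and `cttz_elts` are illustrative stand-ins, not the RISC-V lowering code.

```python
def vfirst(mask):
    """Model of vfirst: index of the first set element, or -1 if none."""
    for index, bit in enumerate(mask):
        if bit:
            return index
    return -1

def cttz_elts(mask):
    """Model of experimental.cttz.elts built on vfirst.

    The only fix-up needed is the all-zero case, where the -1 from
    vfirst is replaced by the vector length.
    """
    index = vfirst(mask)
    return len(mask) if index < 0 else index
```

When the input is known to be non-zero, the fix-up branch is never taken, which is why the correction is only needed if the input vector can be 0.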

When inserting truncs during IV widening, mark the trunc as either nuw or nsw depending on whether zext or sext widening was used. For non-negative IVs both nuw and nsw apply.
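A small bit-level model shows why the flags are sound. The sketch assumes the usual reading of the flags: `nuw` requires all truncated bits to be zero, and `nsw` requires all truncated bits to equal the top bit of the truncated result. The helper names are hypothetical, and the 8-bit to 16-bit widths are chosen only for illustration.

```python
def zext(value, narrow_bits):
    """Zero-extend: the high bits of the wide value are all zero."""
    return value & ((1 << narrow_bits) - 1)

def sext(value, narrow_bits, wide_bits):
    """Sign-extend: the high bits copy the narrow sign bit."""
    value &= (1 << narrow_bits) - 1
    if (value >> (narrow_bits - 1)) & 1:
        value |= ((1 << (wide_bits - narrow_bits)) - 1) << narrow_bits
    return value

def trunc_is_nuw(wide_value, wide_bits, narrow_bits):
    """nuw holds when every truncated (dropped) bit is zero."""
    wide_value &= (1 << wide_bits) - 1
    return (wide_value >> narrow_bits) == 0

def trunc_is_nsw(wide_value, wide_bits, narrow_bits):
    """nsw holds when every dropped bit equals the result's top bit."""
    wide_value &= (1 << wide_bits) - 1
    dropped = wide_value >> narrow_bits
    sign = (wide_value >> (narrow_bits - 1)) & 1
    expected = (1 << (wide_bits - narrow_bits)) - 1 if sign else 0
    return dropped == expected

NARROW, WIDE = 8, 16
zext_always_nuw = all(trunc_is_nuw(zext(x, NARROW), WIDE, NARROW)
                      for x in range(256))
sext_always_nsw = all(trunc_is_nsw(sext(x, NARROW, WIDE), WIDE, NARROW)
                      for x in range(256))
# Non-negative values (top bit clear) satisfy both flags either way.
non_negative_both = all(
    trunc_is_nuw(sext(x, NARROW, WIDE), WIDE, NARROW)
    and trunc_is_nsw(zext(x, NARROW), WIDE, NARROW)
    for x in range(128))
```

Exhaustively checking all 8-bit values confirms the three cases named in the commit message for this toy width.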

…ass member function (#85198) Fix llvm/llvm-project#84020 Skip checking implicit object parameter in the context of `RequiresExprBodyDecl`. Co-authored-by: huqizhi <[email protected]>

Without this patch, you would typically use readNext as: `readNext<uint32_t, llvm::endianness::little, unaligned>(Ptr)`, which is quite a mouthful. Since most serialization/deserialization operations are unaligned accesses, this patch makes the alignment template parameter default to unaligned, allowing us to say: `readNext<uint32_t, llvm::endianness::little>(Ptr)`. I'm including a few examples of migration in this patch. I'll do the rest in a separate patch. Note that writeNext already has the same trick for the alignment template parameter.

…de model (#84523) These should be already handled by other code. Removing the kernel code model rules right above it causes bss_pagealigned.ll to fail by using a movabsq to get the address of a global; I haven't figured out where that code is yet.

This patch tests LLDB integration with libsanitizers for ASan. rdar://111856681

…tatic code model (#84523)" This reverts commit b4cf63d. Breaks indirect-branch-tracking-eh2.ll.

Allocatables with a CUDA device attribute have special semantics for the allocate statement. In flang, the allocate statement is lowered to a sequence of runtime calls that initialize the descriptor and then allocate the descriptor data. This new operation will replace the last runtime call and abstract all the device memory allocation needed. The lowering patch will follow.

Previously the MallocNanoZone envvar would be set to 0 on Darwin for the LLDB shell tests, but this should be guarded behind ASan being enabled as opposed to simply running the test suite on Darwin. This required that the LLVM_USE_SANITIZER option be added as an attribute to the lit config for shell tests.

…8805) This is currently being implied in RISCVISAInfo.cpp. Make it explicit. I'm planning to move all extension information to RISCVFeatures.td and have tablegen create the tables for RISCVISAInfo.cpp. This requires making the creation of RISCVTargetParserDef.inc in tablegen independent of RISCVISAInfo.cpp. So we need an accurate extension list for CPUs in tablegen.

…s (#88217) Change all the cstval_pred_ty based PatternMatch helpers (things like m_AllOnes and m_Zero) to only allow poison elements inside vector splats, not undef elements. Historically, we used to represent non-demanded elements in vectors using undef. Nowadays, we use poison instead. As such, I believe that support for undef in vector splats is no longer useful. At the same time, while poison splat elements are pretty much always safe to ignore, this is not generally the case for undef elements. We have existing miscompiles in our tests due to this (see the masked-merge-*.ll tests changed here) and it's easy to miss such cases in the future, now that we write tests using poison instead of undef elements. I think overall, keeping support for undef elements no longer makes sense, and we should drop it. Once this is done consistently, I think we may also consider allowing poison in m_APInt by default, as doing that change is much less risky than doing the same with undef. This change involves a substantial amount of test changes. For most tests, I've just replaced undef with poison, as I don't think there is value in retaining both. For some tests (where the distinction between undef and poison is important), I've duplicated tests.

Currently, it is not possible to determine which func.func is the host procedure of a given internal procedure, because the mangling of the internal procedure does not contain information about the BIND(C) name of the host. This information may be useful, for instance, to ensure that the DWARF DW_TAG_subprogram of an internal procedure is nested under the DW_TAG_subprogram of its host procedure.

…nter size of the target (#88725) This PR resolves the issue that SPIR-V Backend uses the notion of a pointer size of the target, most notably, in legalizer code, but Tablegen instruction selection in SPIR-V Backend doesn't account for a pointer size of the target. See llvm/llvm-project#88723 for a detailed description. There are 3 test cases attached to the PR that reproduced the issue, when dealing with spirv32-spirv64 differences, and are working correctly now with this PR.

@test

This PR addresses an issue that may arise when an integer argument size differs from a machine word size for the target in a call to llvm intrinsic. The following example demonstrates the issue:

```
@__const.test.arr = private unnamed_addr addrspace(2) constant [3 x i32] [i32 1, i32 2, i32 3]

define spir_func void @test() {
entry:
  %arr = alloca [3 x i32], align 4
  %dest = bitcast ptr %arr to ptr
  call void @llvm.memcpy.p0.p2.i32(ptr align 4 %dest, ptr addrspace(2) align 4 @__const.test.arr, i32 1024, i1 false)
  ret void
}

declare void @llvm.memcpy.p0.p2.i32(ptr nocapture writeonly, ptr addrspace(2) nocapture readonly, i32, i1)
```

Depending on the target this code may work or may fail without this PR due to the fact that IR Translation step introduces additional `zext` when type of the 3rd argument of `@llvm.memcpy.p0.p2.i32` differs from machine word. This PR addresses the issue by adding type deduction for a newly inserted G_ZEXT generic opcode.

When a local variable inside a BLOCK construct is used as a threadprivate variable, llvm-flang throws the error below:

> error: The THREADPRIVATE directive and the common block or variable in it must appear in the same declaration section of a scoping unit

This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: llvm/llvm-project#87411

Collect the original check lines in a manner that is independent of where the check lines appear in the file. This is so that we keep FileCheck variable names stable even when --include-generated-funcs is used. Reported-by: Ruiling Song <[email protected]>

This patch updates the definition of `omp.simdloop` to enforce the restrictions of a wrapper operation. It has been renamed to `omp.simd`, to better reflect the naming used in the spec. All uses of "simdloop" in function names have been updated accordingly. Some changes to Flang lowering and OpenMP to LLVM IR translation are introduced to prevent the introduction of compilation/test failures. The eventual long term solution might be different.

This patch removes the LoopControl parsing/printing functions that are no longer used after transitioning `omp.simdloop` and `omp.taskloop` into loop wrapper operations.

…egs. NFC In vxrm.mir we were running RISCVInsertVSETVLI on pseudos that already had vsetvlis inserted and their AVLs set to $noreg. (This happened to work since doLocalPostpass got rid of the extra vsetvli) This removes the vsetvlis from the test and enforces that the only valid AVLs we work with are either X0 or virtual registers (or $noreg before emitVSETVLIs), since we don't handle physical registers properly in doLocalPostpass.

… (#87388) As we've added new IR elements for the RemoveDIs project, we need the update_test_checks script to understand them. For the records themselves this is already done automatically, but their metadata arguments are not recognized as such due to lacking the `metadata` prefix, which means they won't be checked by the script. This patch fixes this by adding a check for all `![0-9]+` patterns as long as they are not at the start of a line (which avoids matching global values).
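The matching rule can be sketched as follows. This is an illustration of the rule as described, not the code in the update_test_checks script: `METADATA_REF` and `find_metadata_refs` are hypothetical names, and the sample lines are made-up inputs in the style of debug records and metadata definitions.

```python
import re

# Any `!<digits>` token; position filtering is done separately below.
METADATA_REF = re.compile(r"![0-9]+")

def find_metadata_refs(line):
    """Return `!N` references in a line, skipping one at column 0.

    A `!N` at the very start of a line is a definition
    (e.g. `!12 = ...`), so it is excluded here.
    """
    return [m.group(0) for m in METADATA_REF.finditer(line) if m.start() > 0]

record_line = "    #dbg_value(i32 %x, !12, !DIExpression(), !14)"
definition_line = '!12 = !DILocalVariable(name: "x", scope: !5)'

record_refs = find_metadata_refs(record_line)
definition_refs = find_metadata_refs(definition_line)
```

Named nodes such as `!DIExpression()` are not matched because the pattern requires digits directly after the `!`.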

"Till heaven and earth pass, one jot, or one tittle shall not pass of the law"

…#88494) Fixes #85084 Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all other threads in the process to force them to stop as well. If those threads stop on their own before they get a signal, this SIGSTOP will remain pending and be delivered the next time the process resumes. Normally, this is not a problem, because lldb-server will detect this stale SIGSTOP and resume the process. However, if we detach from the process while it has these SIGSTOPs pending, they will get immediately delivered, and the process will remain stopped (most likely forever). This patch fixes that by sending a SIGCONT just before detaching from the process. This signal cancels out any pending SIGSTOPs, and ensures it is able to run after we detach. It does have one somewhat unfortunate side-effect in that the process's SIGCONT handler (if it has one) will get executed spuriously (from the process's POV). This could be _sometimes_ avoided by tracking which threads got sent a SIGSTOP, and whether those threads stopped due to it. From what I could tell by observing its behavior, this is what gdb does. I have not tried to replicate that behavior here because it adds a nontrivial amount of complexity and the result is still uncertain -- we still need to send a SIGCONT (and execute the handler) when any thread stops for some other reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs don't stack, it's also possible that our SIGSTOP/SIGCONT combination will cancel a genuine SIGSTOP being sent to the debugger application (by someone else), and there is nothing we can do about that. For this reason I think it's simplest and most predictable to just always send a SIGCONT when detaching, but if it turns out this is breaking something, we can consider implementing something more elaborate. One alternative I did try is to use PTRACE_INTERRUPT to suspend the threads instead of a SIGSTOP. PTRACE_INTERRUPT requires using PTRACE_SEIZE to attach to the process, which also made this solution somewhat complicated, but the main problem with that approach is that PTRACE_INTERRUPT is not considered to be a signal-delivery-stop, which means it's not possible to resume it while injecting another signal to the inferior (which some of our tests expect to be able to do). This limitation could be worked around by forcing the thread into a signal delivery stop whenever we need to do this, but this additional complication is what made me think this approach is also not worthwhile. This patch should fix (at least some of) the problems with TestConcurrentVFork, but I've also added a dedicated test for checking that a process keeps running after we detach. Although the problem I'm fixing here is linux-specific, the core functionality of not stopping after a detach should function the same way everywhere.

This patch simplifies the lowering from PFT to MLIR of OpenMP compound constructs (i.e. combined and composite). The new approach consists of iteratively processing the outermost leaf construct of the given combined construct until it cannot be split further. Both leaf constructs and composite ones have `gen...()` functions that are called when appropriate. This approach enables treating a leaf construct the same way regardless of if it appeared as part of a combined construct, and it also enables the lowering of composite constructs as a single unit. Previous corner cases are now handled in a more straightforward way and comments pointing to the relevant spec section are added. Directive sets are also completed with missing LOOP related constructs.

…88913) This commit explicitly specifies the matching mode (C library function, any non-method function, or C++ method) for the `CallDescription`s constructed in the iterator/container checkers. This change won't cause major functional changes, but isn't NFC because it ensures that e.g. call descriptions for a non-method function won't accidentally match a method that has the same name. Separate commits will perform (or have already performed) this change in other checkers. My goal is to ensure that the call description mode is always explicitly specified and eliminate (or strongly restrict) the vague "may be either a method or a simple function" mode that's the current default. I'm handling the iterator checkers in this separate commit because they're infamously complex; but I don't expect any trouble because this transition doesn't interact with the "central" logic of iterator handling.

Fix the summary of intNEQValue.

Also add tests for those, and add a few missing requirements to testing iterators in the test suite.

…k prefixes

The greedy rewrite driver has options to control the number of rewrites applied. Expose those via the corresponding transform op.

…tail duplicating blocks (#78582) Fixes #78578. Duplicating a BB which has both multiple predecessors and successors will result in a complex CFG and may also cause a huge number of PHI nodes. See llvm/llvm-project#78578 (comment) for a detailed description of the limit.

After 281d716, llvm generates 32-bit relocations, which overflow when we load these objects into high memory. Interestingly, setting the code model to "large" does not help here (perhaps it is the default?). I'm not completely sure that this is the right thing to do, but it doesn't seem to cause any ill effects. I'll follow up with the author of that patch about the expected behavior here.

…. (#85272) Correct missing cases in a switch that result in @llvm.vp.fma.v4f32 getting lowered to a constrained fma intrinsic. Vector predicated lowering to constrained intrinsics is not supported currently, and there's no consensus on the path forward. We certainly shouldn't be introducing constrained intrinsics into a function that isn't strictfp. Problem found with D146845.

The link to the Heterogeneous-race-free Memory Models ASPLOS'14 paper by Hower et al. pointed to a bogus website, probably because the domain ownership has changed. This patch updates it to a version hosted on research.cs.wisc.edu.

This functionality is available in C++; make it available in Python directly to operate on transform modules.

…merge-upstream-20240417

nico and others added 30 commits April 15, 2024 17:01

[gn] port e356f68 more

c303945

[lldb] Fix the standalone Xcode build after #88317

a855eea

In #88317, the clang resource headers were converted to an interface library. Update LLDB and fix the Xcode standalone build. Thanks Evan for the help!

[test][sanitizer] Compile .c file as C

67571ff

[test][sanitizer] Temporarily disable test

a1ed652

Test, as expected, fails with Asan on system with 5lvl page tables. Disabling the test to migrate buildbot.

[clang/DependencyScanning/ModuleDepCollector] Refactor part of `makeC…

6331024

…ommonInvocationForModuleBuild` into its own function (#88447) The new function is about clearing out benign codegen options and can be applied for PCH invocations as well.

[docs][mlir] Fix broken links in 'llvm' dialects. (#88704)

6d23463

Links to `llvm.mlir.global` and `llvm.mlir.addressof` in the ["Globals" section of LLVM dialect documentation](https://mlir.llvm.org/docs/Dialects/LLVM/#globals) are broken.

[gn] port 8a7846f (C++23 for libcxx, libcxxabi)

206acf7

Work around test failure due to new aslr default

466017c

Revert "[DAG] Fold extract_subvector(insert_subvector(x,y,c1),c2) -->…

40bbdb6

… extract_subvector(y,c2-c1) (#87925)" This reverts commit 8c0f52e. Reverting to green, reproducer attached in the PR/revision comments.

[gn] port 311ff39 more

694c444

InstCombine: Increase threadlocal.address alignment if pointee is mor…

d23a850

…e aligned (#88435) Increase alignment of `llvm.threadlocal.address` if the pointed to global has higher alignment.

[DWARF] Clarify a variable name. NFC (#88814)

2e26ee9

The parameter of `findDebugNamesOffsets` has been renamed to `EndOfHeaderOffset` in #88064 to make it clear it is a section offset instead of an offset relative to the current name index. Rename the call site variable as well.

[RISCV][TTI] Scale the cost of ICmp with LMUL (#88235)

f3a8112

Use the Val type to estimate the instruction cost for ICmp.

[RISCV] Provide a more efficient lowering for experimental.cttz.elts.…

5b9af38

… (#88552) For experimental.cttz.elts, we can use a vfirst instruction, but we need to correct the result if input vector can be 0. cttz.elts returns the vector length while vfirst returns -1.

[IndVars] Mark truncs as nuw/nsw (#88686)

4b22a92

When inserting truncs during IV widening, mark the trunc as either nuw or nsw depending on whether zext or sext widening was used. For non-negative IVs both nuw and nsw apply.

[Clang][Sema] Fix issue on requires expression with templated base cl…

5f68072

…ass member function (#85198) Fix llvm/llvm-project#84020 Skip checking implicit object parameter in the context of `RequiresExprBodyDecl`. Co-authored-by: huqizhi <[email protected]>

Add asan tests for libsanitizers. (#88349)

82f479b

This patch tests LLDB integration with libsanitizers for ASan. rdar://111856681

Revert "[X86] Remove obsolete tablegen rules for near data in small s…

00ae4b7

…tatic code model (#84523)" This reverts commit b4cf63d. Breaks indirect-branch-tracking-eh2.ll.

nikic and others added 28 commits April 17, 2024 18:22

[LV][NFC] Remove the declaration of function fixReduction. (#88491)

cbe148b

[MLIR][OpenMP] NFC: Remove LoopControl parsing/printing code (#88909)

16b0be6

This patch removes the LoopControl parsing/printing functions that are no longer used after transitioning `omp.simdloop` and `omp.taskloop` into loop wrapper operations.

[clang-tidy NFC] Fix a typo in docs for sizeof-expression (#88912)

792d437

"Till heaven and earth pass, one jot, or one tittle shall not pass of the law"

[mlir] fix intNEQValue summary (#89029)

631c5e8

Fix the summary of intNEQValue.

[libc++] Add missing iterator requirement checks in the PSTL (#88127)

d57907d

Also add tests for those, and add a few missing requirements to testing iterators in the test suite.

[X86] vector-shuffle-combining-sse41.ll - add missing AVX1/2/512 chec…

6c78530

…k prefixes

[mlir] transform.apply_patterns support more config options (#88484)

37b26bf

The greedy rewrite driver has options to control the number of rewrites applied. Expose those via the corresponding transform op.

[RISCV] Fix clang-tidy warning about else after return. NFC

4536ad4

[Inline] Regenerate inline-switch-default-2.ll (NFC)

971ec1f

[mlir] expose transform dialect symbol merge to python (#87690)

73140da

This functionality is available in C++; make it available in Python directly to operate on transform modules.

Merge commit '73140daebbf522dbb14dc4b2f3c67dc0aa1a62dd' into feature/…

3b3e047

…merge-upstream-20240417

kaz7 merged commit 3d2dfee into develop Apr 30, 2024
12 checks passed

kaz7 deleted the feature/merge-upstream-20240417 branch April 30, 2024 09:15
