-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/merge upstream 20231019 #287
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…oadcastable shuffle mask
This patch adds the CodeGen changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change relaxes restrictions on what gets emitted on the device path, when compiling in `hipstdpar` mode: 1. Unless a function is explicitly marked `__host__`, it will get emitted, whereas before only `__device__` and `__global__` functions would be emitted; 2. Unsupported builtins are ignored as opposed to being marked as an error, as the decision on their validity is deferred to the `hipstdpar` specific code selection pass; 3. We add a `hipstdpar` specific pass to the opt pipeline, independent of optimisation level: - When compiling for the host, iff the user requested it via the `--hipstdpar-interpose-alloc` flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents. A test to validate that unannotated functions get correctly emitted is added as well. Reviewed by: yaxunl, efriedma Differential Revision: https://reviews.llvm.org/D155850
This PR enhances `BasicPtxBuilder` to support predicates in PTX code generation. The `BasicPtxBuilder` interface was initially introduced for generating PTX code automatically for Ops that aren't supported by LLVM core. Predicates, which are typically not supported in LLVM core, are now supported using the same mechanism. In PTX programming, instructions can be guarded by predicates as shown below:. Here `@p` is a predicate register and guard the execution of the instruction. ``` @p ptx.code op1, op2, op3 ``` This PR introduces the `getPredicate` function in the `BasicPtxBuilder` interface to set an optional predicate. When a predicate is provided, the instruction is generated with predicate and guarded, otherwise, predicate is not genearted. Note that the predicate value must always appear as the last argument on the Op definition. Additionally, this PR implements predicate usage for the following ops: - mbarrier.init - mbarrier.init.shared - mbarrier.arrive.expect_tx - mbarrier.arrive.expect_tx.shared - cp.async.bulk.tensor.shared.cluster.global - cp.async.bulk.tensor.global.shared.cta See for more detail in PTX programing model https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-instructions
The #68728 significantly simplified the accumulator matrix type, making it easier to work with the nvgpu dialect without worrying about the number of required structs, as this information is abstracted away in the nvgpu-to-nvvm transformation. However, we forgot packing the structs after initialization, causing the accumulator matrix to hold undefined values, which is wrong. This PR addresses that.
Fixes #69291. This patch improve the logic handling different patterns to avoid mixing these pattern.
This PR adds `prefetch.tensormap` Op. It brings the cache line containing the given tma descriptor for subsequent use by the cp.async.bulk.tensor instruction. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-prefetch-prefetchu
Add a test that performs warpgroup matrix multiply 128x128x64. The test uses three Ops to do that.
We cannot rely on the address of StringAttr being the same if the stored string is the same.
…s. (#69243) isGDS() and isTFE() need special treatment, because they may be both named-bit and token operands. Part of #62629.
…(#69326) Thsi removes some of the machinery added by D85268, which was unused since D87719 changed all buffer atomic intrinsics to return a value.
It's supposed to return null when an unknown target id is passed.
Currently clang diagnoses the following code: (https://godbolt.org/z/s8zK3E5P5) but nvcc does not. ` struct A { constexpr A(){} }; struct B { A a; int b; }; template<typename T> __global__ void kernel( ) { __shared__ B x; } ` Clang generates an implicit trivial ctor for struct B, which should be allowed for initializing a shared variable. However, the body of the ctor is defined only if the template kernel is instantiated. Clang checks the initialization of variable in non-instantiated templates, where it cannot find the body of the ctor, therefore diagnoses it. This patch skips the check for non-instantiated templates.
As described in: ARM-software/acle#257 Patch by : David Sherwood <[email protected]> Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D150961
…g-info ...behind an experimental CMAKE option that's off by default. This patch adds a new ilist-iterator-like class that can carry two extra bits as well as the usual node pointer. This is part of the project to remove debug-intrinsics from LLVM: see the rationale here [0], they're needed to signal whether a "position" in a BasicBlock includes any debug-info before or after the iterator. This entirely duplicates ilist_iterator, attempting re-use showed it to be a false economy. It's enable-able through the existing ilist_node options interface, hence a few sites where the instruction-list type needs to be updated. The actual main feature, the extra bits in the class, aren't part of the class unless the cmake flag is given: this is because there's a compile-time cost associated with it, and I'd like to get everything in-tree but off-by-default so that we can do proper comparisons. Nothing actually makes use of this yet, but will do soon, see the Phab patch stack. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D153777
Fixes build failures for cases where there's no additional visibility / linkage spec. Differential Revision: https://reviews.llvm.org/D155850
Modify ConstantBounds' methods that handle subscripts and bounds to avoid integer overflows. This is needed to properly handle arrays with the maximum possible upper bound (INT64_MAX).
This updates the documentation on these inline asm constraint codes to match reality. Context: llvm/llvm-project#68818 (comment) Note: dropping also the `'o'` from the docs because I can't find any mention of it in X86ISelLowering.cpp.
Similar to FP16 but we don't have native scalar instruction support, so limit it to vector types only. Fixes #68149
…str (#68908) The svldr_vnum_za and svstr_vnum_za builtins/intrinsics currently require that the vnum argument be an immediate, but since vnum is used to modify the base register via a mul and add, that restriction is not necessary. This patch removes that restriction.
…67727) Ensure that the f18Addendum flag is preserved in AllocatableApplyMold(), that raw().type is reinitialized in AllocatableDeallocatePolymorphic(), and that the implementations of SameTypeAs() and ExtendsTypeOf() handle unallocated unlimited polymorphic arguments correctly.
…69129) Mark any registers as CustomReg and any stack slot as CustomMem. This allows us to more directly emit the register or memory access for the high part. Previously we needed a memory access if the low register was X17 and we assumed the stack offset was 0. If the low part wasn't X17, we assumed the high register was the next register after the low register. This is another part of supporting FP arguments with GISel.
…plicit N1/N2 ops First step towards adding unary/ternary fp ops handling, and not just binops
…314) replaceValuesPerBlockEntry() only handled simple and coerced load values, however the load may also be referenced by a select value. Additionally, I suspect that the previous code might have been incorrect if a load had an offset, as it always constructed the AvailableValue from scratch. Fixes llvm/llvm-project#69301.
The keyword is intended for debugging purpose. It prints a message to stderr. This patch is based on code originally written by Adam Nemet, and on the feedback received by the reviewers in https://reviews.llvm.org/D157492.
This patch fixes: compiler-rt/lib/builtins/cpu_model.c:590:5: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] by adding a missing "break;".
* `dump`, added in llvm/llvm-project#68793 * `!repr`, added in llvm/llvm-project#68716
…#68897) * `dump`, added in llvm/llvm-project#68793 * `!repr`, added in llvm/llvm-project#68716 The keyword `assert` was missing, so I have added that too.
…d x, c)) with Zicond. (#69563) It's only beneficial when cond is setcc with integer equality condition code. For other case, it has same instruction count as the original.
This PR adds `nvvm.stmatrix` Op to NVVM dialect. The Op collectively store one or more matrices across all threads in a warp to the given address location in shared memory.
Add tests with argmem variations.
…tor (#69010) Adds a new `__builtin_vectorelements()` function which returns the number of elements for a given vector either at compile-time for fixed-sized vectors, e.g., created via `__attribute__((vector_size(N)))` or at runtime via a call to `@llvm.vscale.i32()` for scalable vectors, e.g., SVE or RISCV V. The new builtin follows a similar path as `sizeof()`, as it essentially does the same thing but for the number of elements in vector instead of the number of bytes. This allows us to re-use a lot of the existing logic to handle types etc. A small side addition is `Type::isSizelessVectorType()`, which we need to distinguish between sizeless vectors (SVE, RISCV V) and sizeless types (WASM). This is the [corresponding discussion](https://discourse.llvm.org/t/new-builtin-function-to-get-number-of-lanes-in-simd-vectors/73911).
…s. (#69329) A recent commit (#69190) broke the bazel builds. Turns out that Bazel uses symlinks for providing the test files, which the path expansion of the module loading mechanism did not handle correctly. This PR fixes that. It also reorganizes the tests better: It puts all `.mlir` files that are included by some other test into a common `include` folder. This greatly simplifies the definition of the dependencies between the different `.mlir` files in Bazel's `BUILD` file. The commit also adds a comment to all included files why these aren't tested themselves direclty and uses the `%{fs-sep}` expansion for paths more consistently. Finally, it uncomments all but one of the tests excluded in Bazel because they seem to run now. (The remaining one includes a file that it itself a test, so it would have to live *in* and *outside* of the `include` folder.)
…68962) The _mm_cmpistri instruction can be used to quickly parse identifiers. With this patch activated, clang pre-processes <iostream> 1.8% faster, and sqlite3.c amalgametion 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind. The introduction of an extra helper function in the regular case has no impact on performance, see https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u --------- Co-authored-by: serge-sans-paille <[email protected]>
…ize tests. (#69329)" This reverts commit f681225. That commit changed the organization of the tests of the transform dialect interpreter but did not take into account some tests that were added in the meantime.
As described in: ARM-software/acle#257 Patch by : Sander de Smalen<[email protected]> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151199
…nize tests. (#69329)" This reverts commit c122b97 but fixes tests that were added between submitting #69329 for review and landing it for the first time.
Building helloworld.c currently errors with "undefined symbol: __llvm_libc_syscall" See: llvm/llvm-project#67032
…variables" This reverts commit 3353f7d. Fixed test bug (unspecified order of arg evaluation)
…fault_mem_order REQUIRES clause This patch creates the `OmpRewriteMutator` pass that runs at the end of `RewriteParseTree()`. This pass is intended to make OpenMP-specific mutations to the PFT after name resolution. In the case of the `atomic_default_mem_order` clause of the REQUIRES directive, name resolution results in populating global symbols with information about the REQUIRES clauses that apply to that scope. The new rewrite pass is then able to use this information in order to explicitly set the memory order of ATOMIC constructs for which that is not already specified. Given that this rewrite happens before semantics checks, the check of the order in which ATOMIC constructs without explicit memory order and REQUIRES directives with `atomic_default_mem_order` appear is moved earlier into the rewrite pass. Otherwise, these problems would not be caught by semantics checks, since the PFT would be modified by that stage. This is patch 4/5 of a series splitting D149337 to simplify review. Depends on D157983. Differential Revision: https://reviews.llvm.org/D158096
To fix that ticket we only needed to address the V_LSHLREV_B16 case, but I did it for all insts just in case. Fixes #66899
Small fix for failing tests after merge of #69010. The tests need `REQUIRES` to ensure that the correct headers are available. I've also added a generic x86 build which does not need headers, so there is at least one run per test.
When the FPU was selected with "+(no)fp(.dp)" extensions in "-march" or "-mcpu" options, the FPU used for multilib selection was still the default one for given architecture or CPU.
We already save the information about signedness ourselves.
As described in: ARM-software/acle#257 Patch by : David Sherwood <[email protected]> Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151307
Similar to what we already do for add/sub + saturation variants. Scalar support will be added in a future patch covering the other variants at the same time. Alive2: https://alive2.llvm.org/ce/z/rBDrNE Fixes #69080
…9539) The immediate legality checks are now embedded into the isOperandLegal(). It is not needed to check it again.
…merge-upstream-20231019
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Merge upstream's main up to 2023/10/19.
This passed the internal regression tests already.