Add alpha support for SVE2.1 #257

Merged (4 commits) on Apr 12, 2024

Conversation

CarolineConcatto
Contributor

This patch adds new intrinsics and types to support SVE2.1. It depends on pull request #217,
because some intrinsics in this specification are also in #217.

Depends on: #217


name: Pull request
about: Technical issues, document format problems, bugs in scripts or feature proposal.


Thank you for submitting a pull request!

If this PR is about a bugfix:

Please use the bugfix label and make sure to go through the checklist below.

If this PR is about a proposal:

We look forward to evaluating your proposal and, if possible,
making it part of the Arm C Language Extension (ACLE) specifications.

We encourage you to read through the contribution guidelines,
in particular the section on submitting a proposal.

Please use the proposal label.

As for any pull request, please make sure to go through the
checklist below.

Checklist: (mark with X those which apply)

  • If an issue reporting the bug exists, I have mentioned it in the
    PR (do not bother creating the issue if all you want to do is
    fix the bug yourself).
  • I have added/updated the SPDX-FileCopyrightText lines on top
    of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact information}>
    (Please update existing copyright lines if applicable. You can
    specify year ranges with a hyphen, as in 2017-2019, and use
    commas to separate gaps, as in 2018-2020, 2022).
  • I have updated the Copyright section of the sources of the
    specification I have edited (this will show up in the text
    rendered in the PDF and other supported output formats). The
    format is the same as described in the previous item.
  • I have run the CI scripts (if applicable, as they might be
    tricky to set up on non-*nix machines). The sequence can be
    found in the contribution guidelines. Don't
    worry if you cannot run these scripts on your machine; your
    patch will be automatically checked in the Actions of the pull
    request.
  • I have added an item that describes the changes I have
    introduced in this PR in the section Changes for next
    release of the section Change Control/Document history
    of the document. Create Changes for next release if it does
    not exist. Note that changes that do not modify the
    content and rendering of the specifications (both HTML and PDF)
    do not need to be listed.
  • [X] When modifying content and/or its rendering, I have checked the
    correctness of the result in the PDF output (please refer to the
    instructions on how to build the PDFs locally).
  • [X] The variable draftversion is set to true in the YAML header
    of the sources of the specifications I have modified.
  • Please DO NOT add my GitHub profile to the list of contributors
    in the README page of the project.
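
The year notation described in the checklist (hyphenated ranges, commas between gaps) can be produced mechanically. The following is a minimal Python sketch; `format_years` and `spdx_line` are hypothetical helper names that are not part of the ACLE tooling, and the entity and contact values are placeholders.

```python
def format_years(years):
    """Collapse a collection of years into 'a-b, c' notation."""
    ordered = sorted(set(years))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = year
            continue
        # The run ended; emit it as a single year or as a range.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = year
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


def spdx_line(years, entity, contact):
    """Build a copyright line in the format quoted in the checklist."""
    return (
        f"SPDX-FileCopyrightText: Copyright {format_years(years)} "
        f"{entity} <{contact}>"
    )


if __name__ == "__main__":
    # Placeholder entity and contact, for illustration only.
    print(spdx_line([2018, 2019, 2020, 2022], "Example Org", "contact@example.com"))
```

Sorting and de-duplicating first keeps the helper tolerant of unordered or repeated input.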

CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request May 31, 2023
In this patch it is used for the prototype:
  * svptrue_c8 (and _c16/_c32/_c64)

 As described in: ARM-software/acle#257

Patch by: Sander de Smalen <[email protected]>

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D150953
PietroGhg added a commit to PietroGhg/llvm that referenced this pull request Jun 12, 2023
commit 1496c57722c7db8db7e582b582317e15e719ceb0
Merge: f28ae00bf6a3 074276b9ae76
Author: ns_tester <[email protected]>
Date:   Wed Jun 7 22:32:59 2023 -0700

    LLVM and SPIRV-LLVM-Translator pulldown (WW22)

    LLVM: llvm/llvm-project@40c26ec
    SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@c2ff406

commit f28ae00bf6a3dc946194e6f8b543a115fe241c20
Author: Nick Sarnie <[email protected]>
Date:   Wed Jun 7 23:47:13 2023 -0400

    [ESIMD] More support for 64-bit offsets with accessors in stateless mode (#9591)

    This adds support for 64-bit offsets with accessors in stateless mode
    for the remaining APIs. Please let me know if I missed any.

    Today, all of the APIs convert to 32-bit offsets with no error if passed
    a 64-bit offset, except for vector offset versions of `lsc_gather`,
    `lsc_scatter`, and `lsc_prefetch`. Do not error except in these three
    cases in order to preserve backward compatibility.

    I manually ran all of these tests on PVC and confirmed they pass with
    this change and fail without it.

    In some cases, in stateful mode, the underlying intrinsic we call only
    supports 32-bit offsets, so we need to convert.

    ---------

    Signed-off-by: Sarnie, Nick <[email protected]>

commit f34e5458aa63bb2a4362c327859f49474f873b9d
Author: Nick Sarnie <[email protected]>
Date:   Wed Jun 7 20:42:52 2023 -0400

    [SYCL][ESIMD] Use SPIR-V intrinsic to cast image object to int (#9696)

    We currently have a hack that relies on the type the Clang frontend
    generates for images, see
    [here](https://github.com/intel/llvm/blob/12dd0ad040ea61f1201fa9d82efd5079ce7dc6ca/sycl/include/sycl/ext/intel/esimd/detail/memory_intrin.hpp#L1171).

    With opaque pointers, the Clang frontend generates image types as target
    extension types instead of pointers, so the hack fails.

    The cleanest way to fix this would be to do the cast at
    reverse-translation time inside IGC, however the IGC team refused that
    solution.

    Instead, punt the cast to inside the SPIR-V translator when converting
    to SPIR-V, where the type will be a pointer as well.

    The `__spirv_ConvertPtrToU` function will be converted to
    `OpConvertPtrToU` inside the SPIR-V translator.

    This is definitely still a hack, but I don't think it's more hacky than
    before, and I don't know of any other ways to fix this.

    Note this solution works for both typed pointers and opaque pointers,
    and for normal pointer accessors and image accessors.

    Signed-off-by: Sarnie, Nick <[email protected]>

commit 8990c5503d47e397c837d991bf6bc5a0feda9b8a
Author: Igor Gorban <[email protected]>
Date:   Wed Jun 7 22:19:33 2023 +0200

    [SYCL] Fix handling unsupported attributes (#9756)

    llvm::Attribute::ReadNone/ReadOnly/WriteOnly are no longer supported.
    Calls generated by an external library (vc-intrinsics) can still carry
    them, so they have to be removed manually.

    This cannot be fixed on the vc-intrinsics side, because vc-intrinsics
    also works with LLVM versions other than the latest and is used with
    these attributes in other projects.

    ---------

    Signed-off-by: Vyacheslav N Klochkov <[email protected]>
    Co-authored-by: Vyacheslav N Klochkov <[email protected]>

commit 485221047281e3d47f7376394667b85a63173991
Author: aelovikov-intel <[email protected]>
Date:   Wed Jun 7 08:48:04 2023 -0700

    [CI] Generate test matrix on self-hosted runner (#9773)

    Github's ubuntu-* runners could take multiple hours to allocate in our
    organization. Switch to our self-hosted cuda runner that is sitting idle
    because we perform CUDA testing in AWS.

commit 64bd50820262ded6fbd32d63bd96d5fdbf6861ac
Author: aelovikov-intel <[email protected]>
Date:   Wed Jun 7 07:45:29 2023 -0700

    [SYCL][CI] Fuse two post-commit builds into one (#9695)

commit 074276b9ae760528d97f75d767a1744e6f2a3f2f
Merge: 0ca2be5c82ec d48a5fb2b664
Author: Artur Gainullin <[email protected]>
Date:   Wed Jun 7 07:11:20 2023 -0700

    Merge remote-tracking branch 'origin/sycl' into llvmspirv_pulldown

commit d48a5fb2b6645819a6811a65a402d322c222dc36
Author: fineg74 <[email protected]>
Date:   Wed Jun 7 04:59:29 2023 -0700

    [SYCL][ESIMD] Update the test regression/atomic_update_test.cpp to improve reliability (#9715)

commit eb7e3f032ff98fa98b4927fe9a785e75a5c51240
Author: jinz2014 <[email protected]>
Date:   Wed Jun 7 06:58:20 2023 -0400

    [SYCL] Add unit tests for the HIP plugin (#9391)

    The kernel test (test_kernels.cpp) is incomplete because how to generate
    binary files properly for "piProgramCreateWithBinary" for the HIP
    backend is not clear to me.

    Thank you for reviewing and editing the PR.

    ---------

    Co-authored-by: Jin Z <[email protected]>
    Co-authored-by: Dmitry Vodopyanov <[email protected]>
    Co-authored-by: Jin Z <[email protected]>

commit 3c19581f828c54ff1037a420b4614c01628bcc56
Author: jinge90 <[email protected]>
Date:   Wed Jun 7 16:35:10 2023 +0800

    [SYCL][libdevice] Move fabsf, fabs to imf_fp32/64_dl.cpp and add llabs (#9732)

    fabsf, fabs and llabs are required by deep learning frameworks, so we
    move fabsf and fabs to the separate files imf_fp32/64_dl.cpp and add llabs to
    imf_fp32_dl.cpp as well.

    Signed-off-by: jinge90 <[email protected]>

commit f96b85d002745aea35114c512aae020a0e5caaca
Author: Chris Perkins <[email protected]>
Date:   Tue Jun 6 14:04:26 2023 -0700

    disable ze_debug tests on Windows for known failures.  (#9764)

    Some of the ze_debug=4 memory leak tests are failing on Windows. These
    are not new failures, as the ze_debug=4 memory checker was disabled on
    Windows for a long time. It has recently been re-enabled, and now these
    tests are failing. The shutdown() procedure on Windows is not (yet)
    parallel to Linux, work is ongoing on that front. This PR disables these
    tests until we reach shutdown() parity.

    FWIW, the Windows OS is super aggressive about reclaiming memory, and
    the BKM in complex situations like this is to just let Windows reclaim.

    Signed-off-by: Chris Perkins <[email protected]>

commit 19b6247ed9be9e2baae2e5a0a1ddddf4f412b1e7
Author: aelovikov-intel <[email protected]>
Date:   Tue Jun 6 12:56:11 2023 -0700

    [SYCL][Test E2E] Fix SG sizes detection in lit.cfg.py (#9761)

commit 0ca2be5c82ec6b5be0f5ef6850b3afbfbc99aba3
Author: Churina, Ksenia <[email protected]>
Date:   Tue Jun 6 12:45:28 2023 -0700

    Disable Basic/stream/stream.cpp test for HIP until it is fixed

commit 93a487cc72a7e0c4852a41678d102a08e20192b0
Author: jinz2014 <[email protected]>
Date:   Tue Jun 6 15:42:29 2023 -0400

    [SYCL][HIP] Add the interop-buffer-hip test (#9705)

    Co-authored-by: Jin Z <[email protected]>

commit 4826c07e02c1df6cf4ac4f21b650efec37d583c4
Author: Pablo Reble <[email protected]>
Date:   Tue Jun 6 14:41:05 2023 -0500

    [SYCL] ABI check script improve path concatenation (#9482)

    Patch fixes path concatenation issue. Script fails if the provided path
    has no trailing slash.
    Should work OS independently. Manually tested on Linux.
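
The failure mode described here is typical of building paths by plain string concatenation. The sketch below is a small Python illustration, not the script's actual code: the directory and file names are made up, and `concat_naive` and `concat_safe` are hypothetical helpers. It shows why `os.path.join` does not care whether the directory has a trailing slash.

```python
import os.path


def concat_naive(directory, name):
    """String concatenation: silently wrong without a trailing slash."""
    return directory + name


def concat_safe(directory, name):
    """os.path.join inserts a separator only when one is needed."""
    return os.path.join(directory, name)


if __name__ == "__main__":
    for directory in ("build/lib", "build/lib/"):
        print(repr(concat_naive(directory, "libsycl.so")),
              repr(concat_safe(directory, "libsycl.so")))
```

With `build/lib` the naive version yields `build/liblibsycl.so`, which is the kind of broken path a missing trailing slash produces.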

commit 3350c05baf495da71222590becc6ec7e9dae50f8
Author: Srividya Sundaram <[email protected]>
Date:   Tue Jun 6 12:31:13 2023 -0700

    [SYCL] Add ESIMD test to check kernel arg size (#9076)

commit ca55b912d4f08e04abdb654b9f5ed7f18dd87fd8
Author: fineg74 <[email protected]>
Date:   Tue Jun 6 12:21:26 2023 -0700

    [ESIMD] Make the test regression/bfloat16Constructor.cpp executable on GEN12 (#9748)

commit 4a76d213c24cac4615a8f9e57fa3dc643c931956
Author: Fedor Veselovskiy <[email protected]>
Date:   Tue Jun 6 21:19:41 2023 +0200

    [SYCL][InvokeSimd][E2E] Remove XFAIL status from InvokeSimd named barrier tests (#9741)

commit db6bec7b7e31ac18c92e71776fda833707678515
Author: Fraser Cormack <[email protected]>
Date:   Tue Jun 6 20:18:11 2023 +0100

    [SYCL][Fusion] Add missing header (#9691)

    This was causing build failures with some compilers.

commit c899a93410c23b26a600159762e4dab5f240bc1f
Author: Przemyslaw-Wisniewski-Mobica <[email protected]>
Date:   Tue Jun 6 21:17:01 2023 +0200

    [SYCL] Add sycl/detail/defines_elementary.hpp to bit_cast.hpp to be self contained (#9684)

commit f19cfe6a97699a11c78ae248ffe548a8889992bc
Author: Andrey Alekseenko <[email protected]>
Date:   Tue Jun 6 21:13:13 2023 +0200

    [SYCL][CUDA] Fix info::device::version (#9623)

    Report major.minor instead of major.major

commit f73230d8a8ba75b0b43b27ce09253e1b51e1757f
Author: aelovikov-intel <[email protected]>
Date:   Tue Jun 6 12:01:40 2023 -0700

    [SYCL][ABI-break] Remove getOSModuleHandle usage (#9659)

    test-e2e/SharedLib,SPVDumpUse show that we don't really need it.

commit f44d0133d4b0077298f034697a1f3818ff1d6134
Author: Dirk MG Seynhaeve <[email protected]>
Date:   Tue Jun 6 11:00:34 2023 -0700

    [NFC] Productize clang-offload-extract: clean up code for command line parsing and help (#9594)

    * Impose the mandatory LLVM style for clang-format
    * Remove any code that was trying to enhance the LLVM builtin help
    functionality: the extra code only made for confusing help and error
    messages.
    * Don't provide any required options, but provide reasonable defaults.
    * Clean up the descriptions for the help. Use easier-to-maintain
    heredocs for the multiline descriptions.
    * Use the more trivial `--stem` rather than `--output`. The `--output`
    option is still supported, but labeled deprecated.
    * Enforce double-dash long options.
    * Provide more context in error diagnostics.
    * Streamline the searches and predicates.
    * Modernize LLVM (e.g. remove predicated makeArrayRef).
    * More efficient iterators for the range-based for loops.
    * Extensive comments.

commit 8364176393ad741b5dbf56ae58e2c0da1a908bad
Author: aelovikov-intel <[email protected]>
Date:   Tue Jun 6 10:20:44 2023 -0700

    [SYCL] Add tests for SYCL_DUMP_IMAGES/SYCL_USER_KERNEL_SPV (#9725)

    That required introducing an extra environment variable,
    `SYCL_DUMP_IMAGES_PREFIX`, to control the location of the produced images.

commit 0267c1b237409fb5ffc28c0511a30153c29fe29f
Author: JackAKirk <[email protected]>
Date:   Tue Jun 6 14:51:55 2023 +0100

    [SYCL][CUDA] Enable sycl-ls-gpu-default-any on CUDA (#9372)

    This is a migration of this PR
    https://github.com/intel/llvm-test-suite/pull/1144/commits

    ---------

    Signed-off-by: JackAKirk <[email protected]>

commit d46d3d68700288203a5c709a7469b6883104f335
Author: JackAKirk <[email protected]>
Date:   Tue Jun 6 14:50:30 2023 +0100

    [SYCL][CUDA][DOC] Added Tensor Cores supported param combinations table to joint_matrix extension doc (#9019)

    This PR documents the supported joint_matrix API parameter sets when
    using `ext_oneapi_cuda`, similar to the XMX, AMX tables added here:
    https://github.com/intel/llvm/pull/7964

    This will allow us to point people who would like to use `joint_matrix`
    on a specific architecture to the extension document. E.g.
    https://github.com/intel/llvm/issues/8795

    ---------

    Signed-off-by: JackAKirk <[email protected]>

commit c0ab9f8bf0d5f6722c03cfd0aba7aca0ae9a2e81
Author: Jakub Chlanda <[email protected]>
Date:   Tue Jun 6 15:48:37 2023 +0200

    [SYCL] Add native half type flag for NVPTX >= SM_53 (#8906)

    LLVM will now error out if builtins operating on half types are used
    without explicitly passing `-fnative-half-type` (see:
    https://reviews.llvm.org/D146715). PTX supports half types since
    [SM_53](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=half%20precision#half-precision-floating-point-instructions).

commit 1a283acaac3cb944746da21e6337ce6cdaea9711
Author: Christoph Bauinger <[email protected]>
Date:   Tue Jun 6 15:41:27 2023 +0200

    [SYCL] Add proposal for append_and_shift extension (#8902)

    Proposal extends the existing shift_group_left and shift_group_right
    functions to first append and then shift such that all items in a sub
    group can have well-defined values after the shift.

    ---------

    Co-authored-by: Greg Lueck <[email protected]>
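
The "append, then shift" idea can be modeled on a plain list standing in for a sub-group. The sketch below is a toy Python model and not the proposed SYCL API: `shift_left_plain` and `append_then_shift_left` are hypothetical names, `None` stands in for a value that is not well-defined, and supplying the appended values as a second per-item list is an assumption made for illustration.

```python
def shift_left_plain(x, delta):
    """Plain left shift: items whose source index falls past the end get None."""
    n = len(x)
    return [x[i + delta] if i + delta < n else None for i in range(n)]


def append_then_shift_left(x, y, delta):
    """Append y to x, then shift left, so every item has a defined source."""
    if not 0 <= delta <= len(y):
        raise ValueError("delta must be between 0 and len(y)")
    combined = list(x) + list(y)
    return [combined[i + delta] for i in range(len(x))]


if __name__ == "__main__":
    x = [0, 1, 2, 3]
    y = [4, 5, 6, 7]
    print(shift_left_plain(x, 1))
    print(append_then_shift_left(x, y, 1))
```

In this model the last item of the plain shift has no source value, whereas the appended variant draws it from `y`.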

commit 10d2f5b613f3f7e383a8140d38cabc35551b68ea
Author: mmoadeli <[email protected]>
Date:   Tue Jun 6 14:40:15 2023 +0100

    Adds explicit conversion of multi_ptr<T> to multi_ptr<const T>. (#9750)

    This ctor was previously removed because it conflicted with existing
    ones.
    Without the ctor, some CTS tests fail to compile. An
    investigation is required.

commit fa501fd21a286c5d6d760249a88c6ffddaffe2e8
Author: Georgi Mirazchiyski <[email protected]>
Date:   Tue Jun 6 12:52:19 2023 +0100

    [SYCL][CUDA] Add fix for local size calculation regression (#9736)

    This PR fixes a performance regression wrt work-group size selection
    when only `sycl::range` is used.
    The regression was reported in issue
    [#5627](https://github.com/intel/llvm/issues/5627).

    We want the work-groups to be uniformly distributed, but that could lead
    to non-optimally sized work-groups if the global work size is not an
    even number. Ideally, we want to ensure that the work-group size is a power
    of two.
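
To make the tension concrete: a work-group size that divides the global size evenly and is also a power of two may be very small when the global size has few factors of two. The Python sketch below only illustrates that trade-off; it is not the heuristic this PR implements, and `max_size` is an assumed device limit.

```python
def largest_pow2_divisor(global_size, max_size):
    """Largest power of two that divides global_size and is <= max_size."""
    if global_size <= 0 or max_size <= 0:
        raise ValueError("sizes must be positive")
    size = 1
    # Keep doubling while the doubled size still divides evenly and fits.
    while size * 2 <= max_size and global_size % (size * 2) == 0:
        size *= 2
    return size


if __name__ == "__main__":
    for global_size in (1024, 1000, 1023):
        print(global_size, largest_pow2_divisor(global_size, 256))
```

An odd global size such as 1023 leaves only a work-group size of 1 under this strict rule, which is why a practical heuristic has to relax either uniformity or the power-of-two preference.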

commit 37bb6a2bab16f58d7fe8f7418688d36db9e4422a
Author: Petr Vesely <[email protected]>
Date:   Tue Jun 6 12:16:21 2023 +0100

    [SYCL][PI][UR] Fix pi2ur sampler return info (#9693)

    pi2ur was missing a conversion from UnifiedRuntime sampler info values
    to valid PI sampler Info values. This PR implements a valid conversion
    between these values.

commit 2ab86f1149b7965bf352d2604bf9c95d98c0b350
Author: aelovikov-intel <[email protected]>
Date:   Tue Jun 6 04:15:13 2023 -0700

    [CI] Include check-libdevice to BUILD LIT checks (#9743)

commit 835ced6c88de821f9c3d97138153828845d3e631
Author: aelovikov-intel <[email protected]>
Date:   Mon Jun 5 21:36:53 2023 -0700

    [CI] Align installation steps between Linux/Windows (#9746)

    * Use LLVM_INSTALL_UTILS=ON on Windows
    * Move clang-{format,tidy} installation into its own step
    * Reorder lines to match between Linux/Windows

commit 09f76e8afd2ffcd988cc490aebc775a304cd23a6
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 5 19:35:18 2023 -0700

    Bump requests from 2.28.1 to 2.31.0 in /llvm/utils/git (#9560)

    Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.31.0.
    <details>
    <summary>Release notes</summary>
    <p><em>Sourced from <a
    href="https://github.com/psf/requests/releases">requests's
    releases</a>.</em></p>
    <blockquote>
    <h2>v2.31.0</h2>
    <h2>2.31.0 (2023-05-22)</h2>
    <p><strong>Security</strong></p>
    <ul>
    <li>
    <p>Versions of Requests between v2.3.0 and v2.30.0 are vulnerable to
    potential
    forwarding of <code>Proxy-Authorization</code> headers to destination
    servers when
    following HTTPS redirects.</p>
    <p>When proxies are defined with user info (<a
    href="https://user:pass@proxy:8080">https://user:pass@proxy:8080</a>),
    Requests
    will construct a <code>Proxy-Authorization</code> header that is
    attached to the request to
    authenticate with the proxy.</p>
    <p>In cases where Requests receives a redirect response, it previously
    reattached
    the <code>Proxy-Authorization</code> header incorrectly, resulting in
    the value being
    sent through the tunneled connection to the destination server. Users
    who rely on
    defining their proxy credentials in the URL are <em>strongly</em>
    encouraged to upgrade
    to Requests 2.31.0+ to prevent unintentional leakage and rotate their
    proxy
    credentials once the change has been fully deployed.</p>
    <p>Users who do not use a proxy or do not supply their proxy credentials
    through
    the user information portion of their proxy URL are not subject to this
    vulnerability.</p>
    <p>Full details can be read in our <a
    href="https://github.com/psf/requests/security/advisories/GHSA-j8r2-6x86-q33q">Github
    Security Advisory</a>
    and <a
    href="https://nvd.nist.gov/vuln/detail/CVE-2023-32681">CVE-2023-32681</a>.</p>
    </li>
    </ul>
    <h2>v2.30.0</h2>
    <h2>2.30.0 (2023-05-03)</h2>
    <p><strong>Dependencies</strong></p>
    <ul>
    <li>
    <p>⚠️ Added support for urllib3 2.0. ⚠️</p>
    <p>This may contain minor breaking changes so we advise careful testing
    and
    reviewing <a
    href="https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html">https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html</a>
    prior to upgrading.</p>
    <p>Users who wish to stay on urllib3 1.x can pin to
    <code>urllib3&lt;2</code>.</p>
    </li>
    </ul>
    <h2>v2.29.0</h2>
    <h2>2.29.0 (2023-04-26)</h2>
    <p><strong>Improvements</strong></p>
    <ul>
    <li>Requests now defers chunked requests to the urllib3 implementation
    to improve
    standardization. (<a
    href="https://github.com/psf/requests/issues/6226">#6226</a>)</li>
    <li>Requests relaxes header component requirements to support bytes/str
    subclasses. (<a
    href="https://github.com/psf/requests/issues/6356">#6356</a>)</li>
    </ul>
    <!-- raw HTML omitted -->
    </blockquote>
    <p>... (truncated)</p>
    </details>
    <details>
    <summary>Changelog</summary>
    <p><em>Sourced from <a
    href="https://github.com/psf/requests/blob/main/HISTORY.md">requests's
    changelog</a>.</em></p>
    <blockquote>
    <h2>2.31.0 (2023-05-22)</h2>
    <p><strong>Security</strong></p>
    <ul>
    <li>
    <p>Versions of Requests between v2.3.0 and v2.30.0 are vulnerable to
    potential
    forwarding of <code>Proxy-Authorization</code> headers to destination
    servers when
    following HTTPS redirects.</p>
    <p>When proxies are defined with user info (<a
    href="https://user:pass@proxy:8080">https://user:pass@proxy:8080</a>),
    Requests
    will construct a <code>Proxy-Authorization</code> header that is
    attached to the request to
    authenticate with the proxy.</p>
    <p>In cases where Requests receives a redirect response, it previously
    reattached
    the <code>Proxy-Authorization</code> header incorrectly, resulting in
    the value being
    sent through the tunneled connection to the destination server. Users
    who rely on
    defining their proxy credentials in the URL are <em>strongly</em>
    encouraged to upgrade
    to Requests 2.31.0+ to prevent unintentional leakage and rotate their
    proxy
    credentials once the change has been fully deployed.</p>
    <p>Users who do not use a proxy or do not supply their proxy credentials
    through
    the user information portion of their proxy URL are not subject to this
    vulnerability.</p>
    <p>Full details can be read in our <a
    href="https://github.com/psf/requests/security/advisories/GHSA-j8r2-6x86-q33q">Github
    Security Advisory</a>
    and <a
    href="https://nvd.nist.gov/vuln/detail/CVE-2023-32681">CVE-2023-32681</a>.</p>
    </li>
    </ul>
    <h2>2.30.0 (2023-05-03)</h2>
    <p><strong>Dependencies</strong></p>
    <ul>
    <li>
    <p>⚠️ Added support for urllib3 2.0. ⚠️</p>
    <p>This may contain minor breaking changes so we advise careful testing
    and
    reviewing <a
    href="https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html">https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html</a>
    prior to upgrading.</p>
    <p>Users who wish to stay on urllib3 1.x can pin to
    <code>urllib3&lt;2</code>.</p>
    </li>
    </ul>
    <h2>2.29.0 (2023-04-26)</h2>
    <p><strong>Improvements</strong></p>
    <ul>
    <li>Requests now defers chunked requests to the urllib3 implementation
    to improve
    standardization. (<a
    href="https://github.com/psf/requests/issues/6226">#6226</a>)</li>
    <li>Requests relaxes header component requirements to support bytes/str
    subclasses. (<a
    href="https://github.com/psf/requests/issues/6356">#6356</a>)</li>
    </ul>
    <h2>2.28.2 (2023-01-12)</h2>
    <!-- raw HTML omitted -->
    </blockquote>
    <p>... (truncated)</p>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a
    href="https://github.com/psf/requests/commit/147c8511ddbfa5e8f71bbf5c18ede0c4ceb3bba4"><code>147c851</code></a>
    v2.31.0</li>
    <li><a
    href="https://github.com/psf/requests/commit/74ea7cf7a6a27a4eeb2ae24e162bcc942a6706d5"><code>74ea7cf</code></a>
    Merge pull request from GHSA-j8r2-6x86-q33q</li>
    <li><a
    href="https://github.com/psf/requests/commit/302225334678490ec66b3614a9dddb8a02c5f4fe"><code>3022253</code></a>
    test on pypy 3.8 and pypy 3.9 on windows and macos (<a
    href="https://github.com/psf/requests/issues/6424">#6424</a>)</li>
    <li><a
    href="https://github.com/psf/requests/commit/b639e66c816514e40604d46f0088fbceec1a5149"><code>b639e66</code></a>
    test on py3.12 (<a
    href="https://github.com/psf/requests/issues/6448">#6448</a>)</li>
    <li><a
    href="https://github.com/psf/requests/commit/d3d504436ef0c2ac7ec8af13738b04dcc8c694be"><code>d3d5044</code></a>
    Fixed a small typo (<a
    href="https://github.com/psf/requests/issues/6452">#6452</a>)</li>
    <li><a
    href="https://github.com/psf/requests/commit/2ad18e0e10e7d7ecd5384c378f25ec8821a10a29"><code>2ad18e0</code></a>
    v2.30.0</li>
    <li><a
    href="https://github.com/psf/requests/commit/f2629e9e3c7ce3c3c8c025bcd8db551101cbc773"><code>f2629e9</code></a>
    Remove strict parameter (<a
    href="https://github.com/psf/requests/issues/6434">#6434</a>)</li>
    <li><a
    href="https://github.com/psf/requests/commit/87d63de8739263bbe17034fba2285c79780da7e8"><code>87d63de</code></a>
    v2.29.0</li>
    <li><a
    href="https://github.com/psf/requests/commit/51716c4ef390136b0d4b800ec7665dd5503e64fc"><code>51716c4</code></a>
    enable the warnings plugin (<a
    href="https://github.com/psf/requests/issues/6416">#6416</a>)</li>
    <li><a
    href="https://github.com/psf/requests/commit/a7da1ab3498b10ec3a3582244c94b2845f8a8e71"><code>a7da1ab</code></a>
    try on ubuntu 22.04 (<a
    href="https://github.com/psf/requests/issues/6418">#6418</a>)</li>
    <li>Additional commits viewable in <a
    href="https://github.com/psf/requests/compare/v2.28.1...v2.31.0">compare
    view</a></li>
    </ul>
    </details>
    <br />


    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 9a82d283ae2bbc092e796571ec9defbc7eb9c4a6
Author: Michael Toguchi <[email protected]>
Date:   Mon Jun 5 18:34:02 2023 -0700

    [Driver][SYCL] Fix optimization option processing for device options (#9703)

    When using -O0, we imply -cl-opt-disable for device. This was
    incorrectly being implied when we were overriding with an optimization
    enabling option (-O0 -O2). Fix the logic.
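
The intended rule is that only the last optimization option on the command line counts. The Python sketch below shows that logic under the assumption that any argument starting with `-O` is an optimization level; `implies_cl_opt_disable` is a hypothetical name and this is not the Clang driver's code.

```python
def implies_cl_opt_disable(args):
    """True only when the last -O<level> option on the command line is -O0."""
    last_opt = None
    for arg in args:
        if arg.startswith("-O"):
            # A later optimization option overrides earlier ones.
            last_opt = arg
    return last_opt == "-O0"


if __name__ == "__main__":
    for args in (["-O0"], ["-O0", "-O2"], ["-O2", "-O0"], []):
        print(args, implies_cl_opt_disable(args))
```

The buggy behavior corresponds to checking whether `-O0` appears anywhere, instead of checking which option comes last.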

commit 0ae900a9a6784f45833784b9f4262d622733a789
Author: Artur Gainullin <[email protected]>
Date:   Mon Jun 5 17:16:36 2023 -0700

    [SYCL] Fix kernel-bundle-merge-options-env.cpp test

    The test is supposed to check that options provided through the driver via
    -Xsycl-target-linker and -Xsycl-target-frontend get overridden by options
    provided through env variables SYCL_PROGRAM_COMPILE_OPTIONS and
    SYCL_PROGRAM_LINK_OPTIONS. The test author used dummy options called "-bar"
    and "-bar_compile" to check that they are overridden. But those are
    actually treated not as dummy options but as the real option "-b",
    which was not the original intent.

    After the commit in llorg which removes "-b" option from the driver:
    commit 89d71c1efa85656b54bcd79b4278bc67690480e1
    Author: Fangrui Song <[email protected]>
    Date:   Fri May 26 15:30:23 2023 -0700

        [Driver] Reject AIX-specific link options on non-AIX targets

    test started to fail. So, replace those options with "-DBAR" and
    "-DBAR_COMPILE" respectively.

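How a made-up option such as "-bar" can be taken for a real one is easiest to see with a toy parser for joined options, where the value follows the flag with no separator. The Python sketch below is a simplified model with a hypothetical `match_joined` helper; it does not reproduce the driver's option tables or matching rules.

```python
def match_joined(arg, joined_flags):
    """Return (flag, value) for the longest joined flag that prefixes arg, else None."""
    best = None
    for flag in joined_flags:
        # A joined flag matches when it is a proper prefix of the argument.
        if arg.startswith(flag) and len(arg) > len(flag):
            if best is None or len(flag) > len(best):
                best = flag
    if best is None:
        return None
    return best, arg[len(best):]


if __name__ == "__main__":
    print(match_joined("-bar", {"-b", "-D"}))   # "-b" known: parsed as -b with value "ar"
    print(match_joined("-bar", {"-D"}))         # "-b" removed: no match
    print(match_joined("-DBAR", {"-D"}))        # replacement option parses as a define
```

In this model, removing "-b" from the known flags turns "-bar" into an unrecognized argument, while "-DBAR" keeps a stable meaning.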
commit 1a3e99307e4f754d300a14f4dec8111322644d85
Author: Steffen Larsen <[email protected]>
Date:   Mon Jun 5 16:51:57 2023 +0100

    [SYCL] Add missing SYCL 2020 image is_property_of specializations (#9652)

    This commit adds specializations of is_property_of for
    property::image::use_host_ptr, property::image::use_mutex, and
    property::image::context_bound with unsampled_image and sampled_image.
    Likewise, this commit adds specializations of is_property_of for
    property::no_init with unsampled_image_accessor, sampled_image_accessor,
    host_unsampled_image_accessor and host_sampled_image_accessor.

    Signed-off-by: Larsen, Steffen <[email protected]>

commit c953fed97ef68a15efa82b5e211809be8695a0da
Author: jinge90 <[email protected]>
Date:   Mon Jun 5 22:45:25 2023 +0800

    [SYCL][libdevice] Add libdevice lit test to check 'double' usage for fp32 spirv file on-double spirv file(#9711)

commit ef1f8462cf270a925b34b7c35da7e2f29c654355
Author: Steffen Larsen <[email protected]>
Date:   Mon Jun 5 13:29:14 2023 +0100

    [SYCL][NFC] Remove unused parameter in preScreenAccessor (#9737)

    Addresses post-commit failure after
    https://github.com/intel/llvm/pull/9634

    Signed-off-by: Larsen, Steffen <[email protected]>

commit 0c0809590f378925415e6fd317867b8123aaf0e6
Author: mmoadeli <[email protected]>
Date:   Mon Jun 5 08:35:05 2023 +0100

    [SYCL] Allow accessor constructed with zero-size buffers (#9634)

    * Allow accessor constructed with zero-size buffers. [Clarify behaviour
    for range of zero](https://github.com/KhronosGroup/SYCL-Docs/pull/192)
    * Remove existing error disallowing it.
    * Add test

    ---------

    Co-authored-by: Steffen Larsen <[email protected]>

commit 2648b7c5e1a4af8e0ffded1c431b79813fb24777
Author: Artur Gainullin <[email protected]>
Date:   Fri Jun 2 16:10:24 2023 -0700

    [SYCL] Rename win_proxy_loader to pi_win_proxy_loader (#9724)

    Co-authored-by: Dale <[email protected]>

commit 4c5521c9edae675bff012c367cf53b457068f039
Author: aelovikov-intel <[email protected]>
Date:   Fri Jun 2 14:58:00 2023 -0700

    [CI] Fix pre-commit job dependencies on Windows (#9727)

    Bug-fix for https://github.com/intel/llvm/pull/9709.

commit 35171b3c360092299bd43b3ad10ba885254ed805
Author: Erich Keane <[email protected]>
Date:   Fri Jun 2 14:00:14 2023 -0700

    Finish fixing 2nd SemaSYCL test due to diag change.

    My previous commit for SemaSYCL seemingly missed one spot; this
    patch fixes that one too.

commit f874ec8410fd6bf94b996df0e32ca2087addcec5
Author: Erich Keane <[email protected]>
Date:   Fri Jun 2 13:45:22 2023 -0700

    Fix 2 sycl tests: SemaSYCL/loop_fusion.cpp, SemaSYCL/fpga_pipes.cpp

    Two tests failed because the diagnostic message format changed,
    but emission of it was not updated.  This patch corrects that.

commit 12dd0ad040ea61f1201fa9d82efd5079ce7dc6ca
Author: Byoungro So <[email protected]>
Date:   Fri Jun 2 11:39:53 2023 -0700

    [SYCL] Free allocated memory to avoid memory leak (#9722)

    We just need to call free() to avoid a memory leak.

    Signed-off-by: Byoungro So <[email protected]>

commit c6500e41fdc02545ae1867e9c3a868734ecc62c2
Author: aelovikov-intel <[email protected]>
Date:   Fri Jun 2 08:03:34 2023 -0700

    Revert "[SYCL][CI] Cancel in-progress pre_commit job when PR is updated (#9706)" (#9721)

    This reverts commit 1db96de9f9b394fbed0b8953849108f255dd31d7.

    CI seems to be stuck after this PR has been merged.

commit 11ac7300305669b6e23bbac03c8c1fe0214cac8e
Author: Justin Cai <[email protected]>
Date:   Fri Jun 2 00:28:50 2023 -0700

    [SYCL] Add support for scalar logical operators with group algorithms (#9298)

commit e33c2f666e3b5fc873c23e08963ca71c5fc39509
Author: tovinkere <[email protected]>
Date:   Thu Jun 1 23:49:09 2023 -0700

    [XPTI] CMakeFiles fix to support independent build of XPTI (#9262)

    Tool implementors have asked to be able to build the XPTI proxy
    library independently. The existing CMakeFiles.txt has issues that
    prevent this, and this change addresses them.

    ---------

    Signed-off-by: Vasanth Tovinkere <[email protected]>

commit e45834c363d0c26d9c461455ea9654fb1ff947eb
Author: rdeodhar <[email protected]>
Date:   Thu Jun 1 23:47:14 2023 -0700

    [SYCL] [L0] Test adjustment for Windows (#9658)

    Explicitly enable a default context so that all queues use that context
    and immediate command list recycling happens as expected.

commit 260182a1ad758994a652b4241bbe22f6f13cc003
Author: Jaime Arteaga <[email protected]>
Date:   Thu Jun 1 20:36:03 2023 -0700

    [SYCL][UR][L0] Clean up events on queue wait (#9643)

    After the last command in an in-order queue has completed, clean up the
    rest of the events so they are available for later reuse.

    Signed-off-by: Jaime Arteaga <[email protected]>

commit a4283b33744d095743015f44a90afd003c2564ae
Author: aelovikov-intel <[email protected]>
Date:   Thu Jun 1 20:32:20 2023 -0700

    [SYCL][CI][WIN] Skip some checks depending on what files have changed (#9709)

    Follow-up for #9589 implementing the same as it on Windows.

commit 1db96de9f9b394fbed0b8953849108f255dd31d7
Author: aelovikov-intel <[email protected]>
Date:   Thu Jun 1 20:30:57 2023 -0700

    [SYCL][CI] Cancel in-progress pre_commit job when PR is updated (#9706)

commit 66b2e89172001c8e9bc60f402b811e7b41e43e0a
Author: aelovikov-intel <[email protected]>
Date:   Thu Jun 1 20:20:52 2023 -0700

    [SYCL][CI] Improve compression performance (#9675)

    This was originally implemented in
    https://github.com/intel/llvm/pull/5678.

    Start with Linux only for now. Benchmarking several compression
    utilities for time/size:

    |         | Pack time | Upload time | Size   |
    | ------- | --------- | ----------- | ------ |
    | xz      | 5m 20s    | 1m 30s      | 350 MB |
    | lz4     | 3s        | 3m 10s      | 660 MB |
    | zstd -9 | 25s       | 2m 4s       | 467 MB |

    The difference in size between xz/lz4 would result in 1m30s -> 3m
    increase in artifacts upload time so the pack time gain would be
    partially offset by that.

    I don't see a way to get data about unpack from the CI, but locally on a
    different machine (and likely with a different build) I had this:

    |      | Pack time | Unpack time |
    | ---- | --------- | ----------- |
    | xz   | 28m 30s   | 1m 13s      |
    | lz4  | 11s       | 6s          |
    | zstd | 1m 22s    | 8s          |

    Based on the data above we're switching to use `zstd -9` as our
    compression algorithm.
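
The trade-off in the first table above can be checked with a little arithmetic. The sketch below sums pack time and upload time per tool, using only the CI figures quoted in the table. Treating that sum as the figure of merit is an assumption made for illustration; the commit message itself also weighs size and unpack time. The names `seconds`, `CI_TIMES` and `total_times` are hypothetical.

```python
def seconds(text):
    """Convert a duration such as '5m 20s' or '3s' into seconds."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += 60 * int(part[:-1])
        elif part.endswith("s"):
            total += int(part[:-1])
    return total


# (pack time, upload time) copied from the CI table in the commit message.
CI_TIMES = {
    "xz": ("5m 20s", "1m 30s"),
    "lz4": ("3s", "3m 10s"),
    "zstd -9": ("25s", "2m 4s"),
}


def total_times(table):
    """Return pack + upload time in seconds for each tool."""
    return {tool: seconds(pack) + seconds(upload)
            for tool, (pack, upload) in table.items()}


TOTALS = total_times(CI_TIMES)
# Tool with the smallest combined pack + upload time (assumed metric).
FASTEST = min(TOTALS, key=TOTALS.get)
```

Under this assumed metric, `zstd -9` has the smallest combined time, which is consistent with the choice stated in the commit message.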

commit 856ad1d77927ddef77a2a8e6ec5ed43eeb4b75eb
Author: fineg74 <[email protected]>
Date:   Thu Jun 1 15:46:40 2023 -0700

    [ESIMD][E2E] Temporarily disable -ffast-math option for 7 LIT tests (#9660)

    This PR is a workaround for tests that fail when compiled with icpx but
    succeed when compiled with clang++. The root cause is the fast-math
    option, which is enabled by default with icpx and disabled by default
    with clang++. As a workaround, the affected tests are compiled with
    the no-fast-math option.

commit 43d20039920ee187b379781188148fd4cccf6786
Author: Nick Sarnie <[email protected]>
Date:   Thu Jun 1 18:20:09 2023 -0400

    [SYCL][ESIMD][E2E] Fix ext_math_ieee_sqrt_div on emulator (#9680)

    Similar to the other ext_math tests, this needs -fno-fast-math as well.

    Signed-off-by: Sarnie, Nick <[email protected]>

commit 27755824d050679127580ea7a7baf28cea38d91b
Author: Nick Sarnie <[email protected]>
Date:   Thu Jun 1 17:45:43 2023 -0400

    [SYCL][ESIMD] Fix gather/scatter with accessors when passing scalar (#9674)

    This regressed in
    https://github.com/intel/llvm/commit/d04ebb03c1c891077974622c99027a72bad34b71
    when we added a template arg. Since we have a template arg, we won't
    also call the constructor.

    Signed-off-by: Sarnie, Nick <[email protected]>

commit ac1c91e533ebffd8f0629c9c072ea91a807fcf0d
Author: Kseniya Tikhomirova <[email protected]>
Date:   Thu Jun 1 21:44:31 2023 +0200

    [SYCL] Fix post commit fail related to std::unique_lock CTAD in unit tests (#9698)

    Signed-off-by: Tikhomirova, Kseniya <[email protected]>

commit 57187f6f14c1a9e9ed669bcfb2432f4ebfc90dbb
Author: aelovikov-intel <[email protected]>
Date:   Thu Jun 1 09:36:36 2023 -0700

    [SYCL][CI] Add zstd to our build image (#9681)

    I will remove the unneeded package (lz4 and/or zstd) once we settle on
    which one is best for our use.

commit 01d7fc097ec6b5e380db1a07b6caee475e1c695f
Author: Maksim Sabianin <[email protected]>
Date:   Thu Jun 1 18:26:15 2023 +0200

    [SYCL] Remove redundant sycldevice support (#9653)

commit 712138f6d84f45c14b7a6fb4dd1432a8b3aa1949
Author: aelovikov-intel <[email protected]>
Date:   Thu Jun 1 07:57:34 2023 -0700

    [SYCL][CI] Create nightly container based on the "build" image (#9685)

    I plan to use it in post commit to merge two builds

    [linux_default](https://github.com/intel/llvm/blob/ac8408c4761180835fb23ccd5183efd5c5c37d95/.github/workflows/sycl_post_commit.yml#L26-L38)
    and

    [self_build](https://github.com/intel/llvm/blob/ac8408c4761180835fb23ccd5183efd5c5c37d95/.github/workflows/sycl_post_commit.yml#L39-L51)
    into one.

    I can also imagine how we can use that in place of [HIP/CUDA image for
    E2E
    tests](https://github.com/intel/llvm/blob/24955697d9f08c0bc7e1f2b80182c7d967f53b70/.github/workflows/sycl_gen_test_matrix.yml#L10-L17)
    for PRs that only update E2E tests.

commit cebe7da1e21072d158c089e258a28ffe7e951a7a
Author: jinz2014 <[email protected]>
Date:   Thu Jun 1 10:43:06 2023 -0400

    [SYCL][HIP] Display the backend name in intel-ext-device.cpp (#9688)

commit 54fcf80f2351a75281f627f9d80b9a86e686c6fc
Author: Kseniya Tikhomirova <[email protected]>
Date:   Thu Jun 1 16:38:40 2023 +0200

    [SYCL] Fix and reenable unit test for xpti_trace (#9587)

    Signed-off-by: Tikhomirova, Kseniya <[email protected]>

commit 1dce70f413e686bc6fe3af30f99f478c954ee35f
Author: Justin Cai <[email protected]>
Date:   Thu Jun 1 06:15:30 2023 -0700

    [SYCL] Enable proper behavior of optional kernel features with SYCL_EXTERNAL (#9611)

    Currently, the code generated from a translation unit with a declaration
    of a `SYCL_EXTERNAL` function with a `[[sycl::device_has(...)]]`
    attribute, but with no definition of that function, is an LLVM module
    with a declaration of the function but with no `sycl_declared_aspects`
    metadata. Because of this, `SYCLPropagateAspectsPass` does not propagate
    any used aspect information to functions that (transitively) call a
    `SYCL_EXTERNAL` function. This causes `sycl-post-link` to fail to split
    kernels that call `SYCL_EXTERNAL` functions with different required
    aspects.

    With this PR, the `sycl_declared_aspects` metadata is now attached to a
    `SYCL_EXTERNAL` function even if there is no definition (in the same
    translation unit). Additionally, `SYCLPropagateAspectsPass` now collects
    aspects information for function declarations.

commit 1bae4b76f88bdee7c37d6f11b75cefe6f1a494eb
Author: Sven van Haastregt <[email protected]>
Date:   Wed May 31 12:47:07 2023 +0100

    Use clang to generate compile_commands (#2031)

    Ensure the code-formatting job uses clang to generate
    compile_commands.json, to avoid passing GCC-specific flags to
    clang-format or clang-tidy.

    Original commit:
    https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c2ff406

commit 353f349fa7f689963f4cc59faa710c290522650e
Author: Nick Sarnie <[email protected]>
Date:   Tue May 30 06:42:59 2023 -0400

    Skip spirv decoration metadata with --spirv-preserve-auxdata (#2013)

    It's already explicitly handled for forward and reverse translation,
    and it's a bit complicated to handle MDNode metadata. Just skip it so we don't assert.

    If I see this come up in more cases I will add support for MDNode metadata.

    Signed-off-by: Sarnie, Nick <[email protected]>

    Original commit:
    https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/89d658c

commit 23a3ea0775149b04daba041de55c150785d2f101
Author: Dmitry Sidorov <[email protected]>
Date:   Sun May 28 18:41:04 2023 +0200

    Relax consumer checks for checksum info (#2011)

    It's a follow up for
    https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/1996
    since I couldn't update the PR

    Signed-off-by: Sidorov, Dmitry <[email protected]>

    Original commit:
    https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/8cbf726

commit e4ad410f1eeb38659f959bca24d74547e8871274
Merge: fdd609a5c724 d9a9f60248dc
Author: sys_ce_bb <[email protected]>
Date:   Thu Jun 1 06:04:54 2023 -0700

    Merge remote-tracking branch 'origin/sycl-web' into llvmspirv_pulldown

commit fdd609a5c724a69e24ac1a80fdea6b34714660c0
Author: Kseniya Tikhomirova <[email protected]>
Date:   Thu Jun 1 12:05:43 2023 +0200

    [SYCL][ABI-break] Add code_location parameter to the rest of sycl::queue methods (#9603)

    code_location helps improve error reporting and makes it possible to
    identify the exact code lines for failed command submissions.

    ---------

    Signed-off-by: Tikhomirova, Kseniya <[email protected]>

commit 7618dffd78ae8456df9885c35d200604748233ec
Author: mmoadeli <[email protected]>
Date:   Thu Jun 1 09:03:48 2023 +0100

    [SYCL] Lost data during implicit conversion in local and host accessors. (#9669)

    * Fix data loss in local_accessor and host_accessor during implicit
    conversion.
    * Add relevant test.

commit 4eaaaa963ca2f58358ea0897d30374cf9928b80b
Author: Kseniya Tikhomirova <[email protected]>
Date:   Thu Jun 1 10:03:33 2023 +0200

    [SYCL] Enable xpti::node_create signal emit for parallel_for that bypasses graph (#9565)

    The xpti::node_create signal is emitted when we create a new node in
    the graph. The code related to it is in
    Command::emitInstrumentationData and Command successors. However, there
    is a path where no memory dependencies are tracked for a kernel (e.g.
    queue::parallel_for): to speed up kernel enqueue and eliminate extra
    overhead, the node is not added to the graph (and the related Command
    is not created either). This commit makes the node_create signal be
    emitted in this case as well.

    ---------

    Signed-off-by: Tikhomirova, Kseniya <[email protected]>

commit 9e5889918277e921ef8c4724fe22ab6d638fdfb4
Author: Vyacheslav Klochkov <[email protected]>
Date:   Wed May 31 22:41:15 2023 -0500

    [ESIMD][DOC] Update description of accessor-based memory APIs (#9582)

    ESIMD now supports `local accessor`, the `get_pointer()` and
    `operator[]` methods of the accessor class, and a new `slm_allocator`
    class to reserve extra SLM for local needs.

    This patch also describes some existing restrictions for the `slm_init`
    function.

    ---------

    Signed-off-by: Vyacheslav N Klochkov <[email protected]>

commit d3aaccc7561b3664fb2a039f6a32629c65fc9d05
Author: aelovikov-intel <[email protected]>
Date:   Wed May 31 16:05:55 2023 -0700

    [SYCL][CI] Skip some checks depending on what files have changed (#9589)

    I'm using https://github.com/dorny/paths-filter to implement it.

    I decided to call it from `sycl_precommit.yml` so that we can
    potentially re-use its results between Linux/Windows tasks but that
    might have its own drawbacks. I don't see a possibility to just pass the
    result of the job between workflows (`sycl_precommit` ->
    `sycl_linux_build_and_test`) which means that for every value I have to
    thread it carefully via latter's inputs. That might complicate things in
    future if we'd want to run just the modified end-to-end tests instead of
    all of them.

    Another approach would be to run the job inside
    `sycl_linux_build_and_test` so that I'd have immediate access to its
    output from anywhere in the workflow.

commit f110fd73f8e7e51d3b0eb0595162f129ea74cb21
Author: Byoungro So <[email protected]>
Date:   Wed May 31 15:46:46 2023 -0700

    [SYCL] Avoid unnecessary kernel retain (#9557)

    We should retain the kernel only for OpenCL backend.

    Signed-off-by: Byoungro So <[email protected]>

commit ac8408c4761180835fb23ccd5183efd5c5c37d95
Author: Joshua Cranmer <[email protected]>
Date:   Wed May 31 17:47:32 2023 -0400

    [SYCL][OpaquePtrs] Convert some sycl tests to opaque pointers. (#9536)

    This does not fix all of the lit tests that fail with opaque pointers
    enabled, but it does fix those where the test is looking for IR whose
    form has changed with opaque pointers enabled.

commit 24955697d9f08c0bc7e1f2b80182c7d967f53b70
Author: Dmitry Vodopyanov <[email protected]>
Date:   Wed May 31 21:03:56 2023 +0200

    [SYCL] Revert regression for atomic64 after #9561 (#9625)

    Fixes regression introduced in https://github.com/intel/llvm/pull/9561
    by reverting the affected code

commit d9a9f60248dc73b975e19c634cf6790db0473bf0
Merge: 182ec5bb2718 a88f496f8f3b
Author: Gainullin, Artur <[email protected]>
Date:   Wed May 31 14:30:11 2023 -0400

    Merge from 'main' to 'sycl-web' (54 commits)

      CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp

commit 182ec5bb2718e2676a616fc5a0ceaf2a339b50ff
Merge: 6532d2ee8b34 f9b489c7a88b
Author: iclsrc <[email protected]>
Date:   Wed May 31 10:53:04 2023 -0700

    Merge from 'sycl' to 'sycl-web' (6 commits)

commit f9b489c7a88b3b130f22678de79d5cf4f00d6b2c
Author: aelovikov-intel <[email protected]>
Date:   Wed May 31 10:10:06 2023 -0700

    [SYCL][CI] Add lz4 to our build image (#9677)

commit 6532d2ee8b347a4f1e3c4db29229822e2f2865be
Merge: 916980317aa1 33ee5c466346
Author: Gainullin, Artur <[email protected]>
Date:   Wed May 31 12:57:09 2023 -0400

    Merge from 'main' to 'sycl-web' (82 commits)

      CONFLICT (content): Merge conflict in clang/lib/Sema/SemaDeclAttr.cpp
      CONFLICT (content): Merge conflict in clang/lib/Sema/SemaType.cpp

commit b793a58559a21d89b2c6ef9a3ad2953597be3e17
Author: Jaime Arteaga <[email protected]>
Date:   Wed May 31 09:31:06 2023 -0700

    [SYCL][UR][L0] Fix unused parameter (#9670)

    Signed-off-by: Jaime Arteaga <[email protected]>

commit 06ed924eb112a001c7397c5fcee0b8a8f4ed08dd
Author: JackAKirk <[email protected]>
Date:   Wed May 31 17:05:29 2023 +0100

    [SYCL][CUDA] Check make_device doesn't create duplicate sycl::device (#9373)

    Check make_device doesn't create duplicate sycl::device.

    Migration of https://github.com/intel/llvm-test-suite/pull/1419

    Tests https://github.com/intel/llvm/pull/7550. Checks that make_device
    doesn't return a duplicate sycl::device if one already exists.

    Signed-off-by: JackAKirk <[email protected]>

commit a88f496f8f3baa6c3b15532e37e3bdbb1c4ea0d0
Author: Kazu Hirata <[email protected]>
Date:   Wed May 31 08:59:35 2023 -0700

    [Sema] Remove unused function getFloat128Identifier

    The last use was removed by:

      commit bb1ea2d6139a72340b426e114510c46d938645a6
      Author: Nemanja Ivanovic <[email protected]>
      Date:   Mon May 9 08:52:33 2016 +0000

    Differential Revision: https://reviews.llvm.org/D151608

commit 8e728adcfedd97fbc3759b5533d0cbada6b68aa6
Author: Marco Elver <[email protected]>
Date:   Wed May 31 17:57:07 2023 +0200

    Revert "[compiler-rt] Avoid memintrinsic calls inserted by the compiler"

    This reverts commit 4369de7af46605522bf7dbe3bc31d00b0eb4bee6.

    Fails on Mac OS with "sanitizer_libc.cpp:109:5: error: aliases are not
    supported on darwin".

commit fc8acb563ae019735e646f9964b254cab1efd529
Author: Caroline Concatto <[email protected]>
Date:   Wed May 31 14:12:08 2023 +0000

    [Clang][SVE2.1] Add clang support for builtins  using svcount_t

    In this patch it is used for the prototype:
      * svptrue_c8 (and _c16/_c32/_c64)

     As described in: https://github.com/ARM-software/acle/pull/257

    Patch by: Sander de Smalen <[email protected]>

    Reviewed By: sdesmalen, david-arm

    Differential Revision: https://reviews.llvm.org/D150953

commit 71d5a94985c9569467c1ef8a62b8b326ee2036a6
Author: Peter Klausler <[email protected]>
Date:   Thu May 25 16:01:52 2023 -0700

    [flang] Don't fold SIZE()/SHAPE() into expression referencing optional dummy arguments

    When computing the shape of an expression at compilation time as part of
    folding an intrinsic function like SIZE(), don't create an expression that
    increases a dependence on the presence of an optional dummy argument.

    Differential Revision: https://reviews.llvm.org/D151737

commit 660e4530124356442ff63d61b1f6dcb9c1def7e6
Author: Nikita Popov <[email protected]>
Date:   Wed May 31 10:10:47 2023 +0200

    [KnownBits] Also test 1-bit values in exhaustive tests (NFC)

    Similar to what we do with ConstantRanges, also test 1-bit values
    in exhaustive tests, as these often expose special conditions.
    This would have exposed the assertion failure fixed in D151788
    earlier.

commit 6eef8d9b2bbfdb3920b6eeafc939a2d62ad5295b
Author: Kazu Hirata <[email protected]>
Date:   Wed May 31 08:45:29 2023 -0700

    [RISCV] Fix an unused variable warning

     llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:3793:7:
     error: unused variable 'XLenVT' [-Werror,-Wunused-variable]

commit d6a36619cec44d02a2a3526eceb2ac128d90e030
Author: Simon Pilgrim <[email protected]>
Date:   Wed May 31 15:33:44 2023 +0100

    [X86] X86FixupVectorConstantsPass - use VBROADCASTSS/VBROADCASTSD for integer vector loads on AVX1-only targets

    Matches behaviour in lowerBuildVectorAsBroadcast

commit f29f1c7e23d555c95a199f8e77fefe87e91664cf
Author: Mark de Wever <[email protected]>
Date:   Sun May 28 14:23:12 2023 +0200

    [libc++][CI] Bumps clang-tidy version used.

    The CI can no longer run with clang-tidy 16, so bump it to version 17.
    Whether permanently moving to the latest development version is being
    discussed on Discourse.

    Depends on D149455

    Reviewed By: #libc, ldionne

    Differential Revision: https://reviews.llvm.org/D151628

commit cf64668b8c414c60aec12cdd7374ea053fc99411
Author: Mark de Wever <[email protected]>
Date:   Fri Apr 28 17:38:47 2023 +0200

    [libc++][test] Prefers the newer clang-tidy version.

    Modules require Clang 17, since Clang 16 requires the magic # __FILE__
    line. Therefore, if available, use clang-tidy 17 too. This change should
    be reverted after LLVM 17 is released.

    Reviewed By: #libc, ldionne

    Differential Revision: https://reviews.llvm.org/D149455

commit 5d4281d5493c7a2fc09d9ac9fc5b374676a4d8af
Author: Mark de Wever <[email protected]>
Date:   Thu May 25 21:59:25 2023 +0200

    [libc++] Gives ignore external linkage.

    A slightly different fix is in D144994.

    Reviewed By: #libc, ldionne

    Differential Revision: https://reviews.llvm.org/D151490

commit ac7d60f73a4a369fb4dcce734d54cb38fde80981
Author: Mark de Wever <[email protected]>
Date:   Tue May 23 17:14:20 2023 +0200

    [libc++] Fixes use-after-move diagnostic.

    The diagnostic is issued by clang-tidy 17.

    This just suppresses the diagnostic. The move operations are non-standard extensions and the class itself is deprecated.

    Reviewed By: #libc, ldionne

    Differential Revision: https://reviews.llvm.org/D151223

commit 7578672c96e18feb5982192e595459b2a65867cf
Author: Dave Lee <[email protected]>
Date:   Sat May 20 10:05:44 2023 -0700

    [lldb] Override GetVariable in ValueObjectSynthetic (NFC)

    Make `GetVariable` a passthrough function to the underlying value object in `ValueObjectSynthetic`.

    Differential Revision: https://reviews.llvm.org/D151384

commit 42e98c6ae875e952ee852f78234c0f8ed311472b
Author: Nikita Popov <[email protected]>
Date:   Wed May 31 10:16:16 2023 +0200

    [APInt] Support zero-width extract in extractBitsAsZExtValue()

    D111241 added support for extractBits() with zero width. Extend this
    to extractBitsAsZExtValue() as well for consistency (in which case
    it will always return zero).

    Differential Revision: https://reviews.llvm.org/D151788

commit 3825910c7316cf62549bd31c503c48e7526adcc2
Author: Nico Weber <[email protected]>
Date:   Wed May 31 11:12:32 2023 -0400

    [gn] port 4369de7af466

commit cb463c34dd4c3ad2ac6c13f98edcf684a3fcbe38
Author: Dave Lee <[email protected]>
Date:   Fri May 26 21:19:10 2023 -0700

    [lldb] Take StringRef name in GetChildMemberWithName (NFC)

    `GetChildMemberWithName` does not need a `ConstString`. This change makes the function
    take a `StringRef` instead, which alleviates the need for callers to construct a
    `ConstString`. I don't expect this change to improve performance, only ergonomics.

    This is in support of Alex's effort to replace `ConstString` where appropriate.

    There are related `ValueObject` functions that can also be changed, if this is accepted.

    Differential Revision: https://reviews.llvm.org/D151615

commit e0df106818ccb90dc46c5296ed5ef2eda75564ff
Author: Paul Scoropan <[email protected]>
Date:   Tue May 30 15:07:44 2023 +0000

    [Flang] Move several definitions to IntrinsicCall header for code cleanliness and reusability

    In the future we intend to add support for many PowerPC-specific intrinsics that ideally will exist in a separate new PPCIntrinsicCall file. But first we need to move definitions to the IntrinsicCall header file to increase code cleanliness and readability and to make code reusable for when we add PPCIntrinsicCall.

    Reviewed By: vzakhari

    Differential Revision: https://reviews.llvm.org/D151715

commit 572cfa3fde5433c889b339e9cfa6dfaa23e5f2ee
Author: Florian Hahn <[email protected]>
Date:   Wed May 31 16:00:57 2023 +0100

    [LV] Use SCEV for uniformity analysis across VF

    This patch uses SCEV to check if a value is uniform across a given VF.

    The basic idea is to construct SCEVs where the AddRecs of the loop are
    adjusted to reflect the version in the vectorized loop (Step multiplied
    by VF). We construct a SCEV for the value of vector lane 0
    (offset 0) and compare it to the expressions for lanes 1 to the last
    vector lane (VF - 1). If they are equal, consider the expression uniform.

    While re-writing expressions, we also need to catch expressions for
    which we cannot determine uniformity (e.g. SCEVUnknown).

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D148841

commit 4369de7af46605522bf7dbe3bc31d00b0eb4bee6
Author: Marco Elver <[email protected]>
Date:   Tue May 30 11:59:22 2023 +0200

    [compiler-rt] Avoid memintrinsic calls inserted by the compiler

    D135716 introduced -ftrivial-auto-var-init=pattern where supported.
    Unfortunately this introduces unwanted memset() for large stack arrays,
    as shown by the new tests added for asan and msan (tsan already had this
    test).

    In general, the problem of compiler-inserted memintrinsic calls
    (memset/memcpy/memmove) is not new to compiler-rt, and has been a
    problem before.

    To avoid introducing unwanted memintrinsic calls, we redefine
    memintrinsics as __sanitizer_internal_mem* at the assembly level for
    most source files automatically (where sanitizer_common_internal_defs.h
    is included).

    In a few cases, redefining a symbol in this way causes issues for
    interceptors, namely the memintrinsic interceptors themselves. For such
    source files we have to selectively disable the redefinition.

    Other alternatives have been considered, but simply do not work well in
    the context of compiler-rt:

    	1. Linker --wrap:  this does not work because --wrap only
    	   applies to the final link, and would not apply when building
    	   sanitizer static libraries.

    	2. Changing references to memset() via objcopy:  this may work,
    	   but due to the complexities of the build system, introducing
    	   such a post-processing step for the right object files (in
    	   particular object files defining memset cannot be touched)
    	   seems infeasible.

    The chosen solution works well (as shown by the tests). Other libraries
    have chosen the same solution where nothing else works (see e.g. glibc's
    "symbol-hacks.h").

    v2:
    - Fix ubsan_minimal build where compiler decides to insert
      memset/memcpy: ubsan_minimal has to work without RTSanitizerCommonLibc,
      therefore do not redefine the builtins.
    - Fix definition of internal_mem* functions with compilers that want the
      aliased function to already be defined before.
    - Fix definition of __sanitizer_internal_mem* functions with compilers
      more pedantic about attribute placement around extern "C".

    Reviewed By: vitalybuka, dvyukov

    Differential Revision: https://reviews.llvm.org/D151152

commit 26d7b7bb8ff982b6cdcd9bf7538405356135b724
Author: Michael Liao <[email protected]>
Date:   Fri May 26 12:58:12 2023 -0400

    [TableGen] Add !getdagarg and !getdagname

    - This patch proposes to add `!getdagarg` and `!getdagname` bang
      operators as the inverse operation of `!dag`. They allow us to examine
      arguments of a given dag.

    Reviewed By: simon_tatham

    Differential Revision: https://reviews.llvm.org/D151602

commit e69318138e6cc88becbb8d095b1d2dcf76ac45e1
Author: Philip Reames <[email protected]>
Date:   Wed May 31 07:48:17 2023 -0700

    [RISCV] Use v(f)slide1down for shuffle+insert idiom

    This is a follow up to D151468 which added the vslide1down case as a sub-case of vslide1down matching. This generalizes that code into generic mask matching - specifically to point out the sub-vector insert restriction in the original patch. Since the matching logic is basically the same, go ahead and support vslide1up at the same time.

    Differential Revision: https://reviews.llvm.org/D151742

commit 5442264744f4e6f925bcb06ae60687ec3c2e9d7f
Author: Nikita Popov <[email protected]>
Date:   Wed May 31 16:39:41 2023 +0200

    [InstCombine] Name instructions in test (NFC)

commit 66b9e114326462eb4a7b67dccf36cca875b8791b
Author: myl <[email protected]>
Date:   Wed May 31 22:33:07 2023 +0800

    Temporarily add explicit '-O2' for Basic/image/image_read*.cpp to avoid GPU hang issue with O0 optimization. (#9664)

commit 6ef3efc9c46591e94165533f461ac5a17adc527d
Author: aelovikov-intel <[email protected]>
Date:   Wed May 31 07:32:48 2023 -0700

    [SYCL][CI] Fuse self-build and no-asserts build (#9655)

    Co-authored-by: Alexey Bader <[email protected]>

commit f9b523ebc367f1535bf61797383471e567b24b75
Author: Kazu Hirata <[email protected]>
Date:   Wed May 31 07:30:14 2023 -0700

    [Analysis] Remove unused class LegacyAARGetter

    The last use was removed by:

      commit fa6ea7a419f37befbed04368bcb8af4c718facbb
      Author: Arthur Eubanks <[email protected]>
      Date:   Mon Mar 20 11:18:35 2023 -0700

    Once we remove it, createLegacyPMAAResults and createLegacyPMBasicAAResult
    become unused, so this patch removes them as well.

    Differential Revision: https://reviews.llvm.org/D151787

commit 8634b43a03945971c2939833ac686728bee5a760
Author: Fangrui Song <[email protected]>
Date:   Wed May 31 07:19:44 2023 -0700

    [ELF][RISCV] --wrap=foo: Correctly update st_value(foo)

    With --wrap=foo, we may have `d->file != file` for a defined symbol `foo`.
    For the object file defining `foo`, its symbol table may not contain
    `foo` after `redirectSymbols` changed the `foo` entry to `__wrap_foo` (see D50569).

    Therefore, skipping `foo` with the condition `if (!d || d->file != file)` may
    cause `__wrap_foo` not to be updated. See `ab.o w.o --wrap=foo` in the new test
    (originally reported by D150220).

    We could adjust the condition to `if (!d)`, but that would leave many `anchors`
    entries if a symbol is referenced by many files. Switch to iterating over
    `symtab` instead.

    Note: D149735 (actually not NFC) allowed duplicate `anchors` entries and fixed
    `a.o bw.o --wrap=foo`.

    Reviewed By: jobnoorman

    Differential Revision: https://reviews.llvm.org/D151768

commit e9c9d54cf5959fa020cf76e47ced4575793f6d60
Author: Vyacheslav Klochkov <[email protected]>
Date:   Wed May 31 09:16:30 2023 -0500

    [ESIMD][LIT] Fix usage of -fno-fast-math and -fno-slp-vectorize with cl (#9661)

    clang-cl driver does not understand -fno-fast-math and
    -fno-slp-vectorize. Usage of those options requires adding "/clang:"
    before the option.

    Signed-off-by: Vyacheslav N Klochkov <[email protected]>

commit 408f4196ba4ac66328ebfcf41cb372572257c4f6
Author: Tom Eccles <[email protected]>
Date:   Wed May 17 16:07:41 2023 +0000

    [flang] use greedy mlir driver for stack arrays pass

    In upstream mlir, the dialect conversion infrastructure is used for
    lowering from one dialect to another: the passes are of the form
    XToYPass. Whereas, transformations within the same dialect tend to use
    applyPatternsAndFoldGreedily.

    In this case, the full complexity of applyPatternsAndFoldGreedily isn't
    needed so we can get away with the simpler applyOpPatternsAndFold.

    This change was suggested by @jeanPerier

    The old differential revision for this patch was
    https://reviews.llvm.org/D150853

    Re-applying here fixing the issue which led to the patch being reverted. The
    issue was from erasing uses of the allocation operation while still iterating
    over those uses (leading to a use-after-free). I have added a regression
    test which catches this bug for -fsanitize=address builds, but it is
    hard to reliably cause a crash from the use-after-free in normal builds.

    Differential Revision: https://reviews.llvm.org/D151728

commit 543705641adb1d3533be141947264ca1b7b04479
Author: Paul Robinson <[email protected]>
Date:   Wed May 31 06:43:27 2023 -0700

    [Headers][doc] Fix typo in avx2intrin.h doc

commit f6a631d4060c5b539fd51b7221205ee05ec50ee8
Author: Jan Sjodin <[email protected]>
Date:   Tue May 30 14:28:12 2023 -0500

    [MLIR] Remove dependency on omp dialect in LLVM dialect.

    This fixes a buildbot failure where the dependency on the omp dialect
    in the LLVM dialect caused an error. Instead of accessing the interface
    defined in the omp dialect, we access the attributes directly. To make
    this work, the IsDeviceAttr is removed and replaced with a BoolAttr.

    Reviewed By: kiranchandramohan

    Differential Revision: https://reviews.llvm.org/D151745

commit e5399f1d7cabfca90030ca03f52818e892aa389f
Author: Paul Robinson <[email protected]>
Date:   Tue May 30 13:30:12 2023 -0700

    [Headers][doc] Add shuffle-like intrinsic descriptions to avx2intrin.h

    Differential Revision: https://reviews.llvm.org/D151749

commit 0a3dc73e700b4a37bc435bf7c02213161b27f54a
Author: Dmitry Makogon <[email protected]>
Date:   Wed May 31 20:23:19 2023 +0700

    [Test] Move LoopStrengthReduce/pr62563.ll to X86 specific test folder (NFC)

    The test case is X86 specific. Should unblock buildbots after 253e3e2.

commit 6bcbb3af059b05056c7343cafd99004d4cd4cd35
Author: Florian Hahn <[email protected]>
Date:   Wed May 31 14:22:44 2023 +0100

    [ConstraintElim] Move logic to remove stack entry to helper (NFC).

    Preparation for follow-up patch that uses the logic in a separate place.

commit 97f0e7b06e6b76fd85fb81b8c12eba2255ff1742
Author: Nikita Popov <[email protected]>
Date:   Wed May 31 14:53:44 2023 +0200

    [AA] Fix comparison of AliasResults (PR63019)

    Comparison between two AliasResults implicitly decayed to comparison
    of AliasResult::Kind. As a result, MergeAliasResults() ended up
    considering two PartialAlias results with different offsets as
    equivalent.

    Fix this by adding an operator== implementation. To stay
    compatible with extensive use of comparisons between AliasResult
    and AliasResult::Kind, add an overload for that as well, which
    will ignore the offset. In the future, it would probably be a
    good idea to remove these implicit decays to AliasResult::Kind
    and add dedicated methods to check for specific AliasResult kinds.

    Fixes https://github.com/llvm/llvm-project/issues/63019.

commit 4d64ffa94170eadd79954e2a5f13d1f1d16e9e2c
Author: Nikita Popov <[email protected]>
Date:   Wed May 31 14:55:11 2023 +0200

    [GVN] Add test for PR63019 (NFC)

commit ce97312d109b21acb97d3ea243e214f20bd87cfc
Author: Arnaud Bienner <[email protected]>
Date:   Wed May 31 10:54:27 2023 +0200

    Implement BufferOverlap check for sprint/snprintf

    Differential Revision: https://reviews.llvm.org/D150430

commit 916980317aa18cd55727feae689026d4bd5a23e2
Merge: 606c74d747f2 0000fa6a925e
Author: iclsrc <[email protected]>
Date:   Wed May 31 05:37:05 2023 -0700

    Merge from 'sycl' to 'sycl-web'

commit 0b42ee46b06fb9fb396eca8b335166d8e92b70cd
Author: LLVM GN Syncbot <[email protected]>
Date:   Wed May 31 12:30:10 2023 +0000

    [gn build] Port 26bda9e95a9d

commit dd2fea9c23e6dabd83d3f4ee7d000ceb16cace55
Author: Thorsten Schütt <[email protected]>
Date:   Thu May 25 17:47:00 2023 +0200

    [GlobalIsel][X86] Legalize G_CTLZ and G_CTPOP for 32-bit

    Note that 32-bit support is very limited

    Reviewed By: RKSimon

    Differential Revision: https://reviews.llvm.org/D151459

commit 344e91a6f00840e67fc03bcfeca6c34fa6d34b17
Author: Nico Weber <[email protected]>
Date:   Wed May 31 08:17:44 2023 -0400

    [gn] port 301eb6b68f3 (AttrTokenKinds.inc)

commit 64bd5bbb9bbb72de5f59755c74dae4b4881d93d5
Author: rikhuijzer <[email protected]>
Date:   Wed May 31 14:13:08 2023 +0200

    [mlir] Avoid tensor canonicalizer crash on negative dimensions

    Fixes #59703.

    Reviewed By: ftynse

    Differential Revision: https://reviews.llvm.org/D151611

commit c76a3e795ef6bd5262b5860ebcc902fab3fab607
Author: Guillaume Chatelet <[email protected]>
Date:   Wed May 31 12:06:45 2023 +0000

    [libc][NFC] Fixing various typos

commit 0000fa6a925ef8d0fcd97c1765a7f24b85110610
Author: JackAKirk <[email protected]>
Date:   Wed May 31 13:02:04 2023 +0100

    [SYCL][CUDA] opportunistic_group, fixed_size_group, and ballot_group impls. (#9280)

    This basic cuda support does not include any algorithm support.
    Algorithm support will follow in a later PR.
    Since all intel ba…
Contributor

@rsandifo-arm rsandifo-arm left a comment

Some comments below, but LGTM otherwise. Once the SME2 stuff is in, I think we should consolidate the intrinsics that are common between SME2 and SVE2p1, rather than duplicating them. I agree the current form makes sense until then though.

main/acle.md Outdated
// _u64base_u8, _u64base_u16, _u64base_s16, _u64base_u32, _u64base_s32,
// _u64base_u64, _u64base_s64
// _u64base_bf16, _u64base_f16, _u64base_f32, _u64base_f64
svint8_t svld1q_gather[_u64base_s8](svbool_t pg, svint64_t zn, const void *rm);
Contributor

I think we should provide the same addressing modes as for LDNT1 gather:

  • svld1q_gather[_u64base]_xx(svbool_t pg, svuint64_t zn) (note svuint64_t rather than svint64_t)
  • svld1q_gather[_u64base]_offset_xx(svbool_t pg, svuint64_t zn, int64_t offset)
  • svld1q_gather[_u64base]_index_xx(svbool_t pg, svuint64_t zn, int64_t index)
  • svld1q_gather_[u64]offset[_xx](svbool_t pg, const xx_t *base, svuint64_t offset)
  • svld1q_gather_[u64]index[_xx](svbool_t pg, const xx_t *base, svuint64_t index) for 16-bit, 32-bit and 64-bit xx

Contributor Author

I imagine we should do the same for the ST1Q scatter quadword, correct?

Contributor

Yeah, same thing there.

main/acle.md Outdated
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
// _bf16, _f16, _f32, _f64
void svst2q[_u8](svbool_t pg, uint8_t *rn, svuint8x2_t zt);

@CarolineConcatto Is there a reason why the pointers for the structured quad-word stores use uint8_t *, instead of the int8_t * for the svld2q, etc?

Contributor

The type is meant to vary with the suffix, so it's uint8_t * for the [_u8] function shown, and would be int8_t * for the [_s8] version.

Doh! Of course, silly me. :)

main/acle.md Outdated

#### LD1Q

Gather Load Quadword.
Contributor

There is only an unscaled variant of this instruction, so maybe don't have both offset and index?

Contributor

For the other SVE load and store intrinsics, we tried to provide a consistent interface and set of addressing modes. So the deciding factor wasn't so much whether the call mapped to a single instruction, but whether the underlying instruction could easily emulate the mode. “Single instruction” is a bit of a nebulous concept anyway for loads and stores, since a single C address expression might need several operations to compute.

Since scaling is just a shift left, I think it's worth providing both index and offset variants.
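
To illustrate the point about scaling, here is a minimal scalar sketch of how one lane's address could be formed in the offset and index variants. The function names and the `elem_size_log2` parameter are hypothetical and only model the arithmetic; they are not ACLE intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of a 64-bit-base gather/scatter.
 * "offset" form: the displacement is a byte count added to the base as-is. */
static inline uint64_t lane_address_from_offset(uint64_t base, int64_t byte_offset)
{
    /* Unsigned arithmetic wraps modulo 2^64, so negative offsets work too. */
    return base + (uint64_t)byte_offset;
}

/* "index" form: the displacement counts elements, so it is scaled by the
 * element size. For power-of-two element sizes that scaling is a left shift,
 * after which the address is formed exactly as in the offset form. */
static inline uint64_t lane_address_from_index(uint64_t base, int64_t index,
                                               unsigned elem_size_log2)
{
    assert(elem_size_log2 <= 3); /* 1-, 2-, 4- or 8-byte elements */
    uint64_t byte_offset = (uint64_t)index << elem_size_log2;
    return base + byte_offset;
}
```

Under this model an index variant on top of an unscaled instruction costs one extra shift per call, which is why both forms look cheap to provide.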

main/acle.md Outdated

#### ST1Q

Scatter store quadwords.
Contributor

There is only an unscaled version of this instruction? So maybe don't have both index and offset?

main/acle.md Outdated

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
svuint8_t svpmov_lane_u8_z(svbool_t pn);
Contributor Author

s/ svuint8_t svpmov_lane_u8_z(svbool_t pn);/ svuint8_t svpmov_u8_z(svbool_t pn);/

@ThomasBamelis ThomasBamelis left a comment

With the increased use of x4 vectors in 2.1, would it be the right time to introduce svreinterpret variants for x4 types as well?
For data rearranging, loading/storing, and element-wise bit manipulation, changing the element size can come in quite handy.

CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 17, 2023
As described in: ARM-software/acle#257

Patch by : David Sherwood <[email protected]>

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D150961
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 17, 2023
As described in: ARM-software/acle#257

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D151081
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 18, 2023
 As described in: ARM-software/acle#257

Patch by : Sander de Smalen<[email protected]>

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D151197
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 19, 2023
As described in: ARM-software/acle#257

Patch by : Sander de Smalen<[email protected]>

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D151199
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 19, 2023
As described in: ARM-software/acle#257

Patch by : David Sherwood <[email protected]>

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D151307
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 19, 2023
 Patch by : David Sherwood <[email protected]>

As described in: ARM-software/acle#257

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D151433
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 19, 2023
As described in: ARM-software/acle#257

Patch by: David Sherwood <[email protected]>

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D151439
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 19, 2023
As described in: ARM-software/acle#257

Patch by: Kerry McLaughlin <[email protected]>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D151461
CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Oct 20, 2023
As described in: ARM-software/acle#257

Patch by: Rosie Sumpter <[email protected]>

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D151709
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Oct 23, 2023
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed vectors
as output, therefore we changed SVEEmitter to emit fixed vector types in case
the neon header(arm_neon.h) is not present.

[1]ARM-software/acle#257

Co-author: Dinar Temirbulatov <[email protected]>
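
For readers unfamiliar with the quadword reductions listed above, the following is a rough scalar sketch of what an ADDQV-style reduction on 8-bit lanes does, based on my reading of the instruction (an assumption, not text from this PR): lane i of the fixed 128-bit result is the sum of lane i of every 128-bit segment of the scalable input, with inactive lanes contributing zero. All names here (`model_addqv_u8`, `demo_addqv_lane`, `DEMO_VL_BYTES`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { QUAD_BYTES = 16, DEMO_VL_BYTES = 32 };

/* Scalar model (assumed semantics): sum matching lanes across all 128-bit
 * segments of zn into a fixed 16-byte result; inactive lanes add zero. */
static void model_addqv_u8(const bool *pg, const uint8_t *zn, size_t vl_bytes,
                           uint8_t out[QUAD_BYTES])
{
    assert(vl_bytes % QUAD_BYTES == 0);
    for (size_t lane = 0; lane < QUAD_BYTES; ++lane)
        out[lane] = 0;
    for (size_t i = 0; i < vl_bytes; ++i) {
        if (pg[i]) {
            size_t lane = i % QUAD_BYTES;
            out[lane] = (uint8_t)(out[lane] + zn[i]); /* wraps modulo 256 */
        }
    }
}

/* Demo on a 256-bit vector holding 0..31. Every lane is active except
 * inactive_index; pass DEMO_VL_BYTES to keep all lanes active. */
static uint8_t demo_addqv_lane(size_t lane, size_t inactive_index)
{
    bool pg[DEMO_VL_BYTES];
    uint8_t zn[DEMO_VL_BYTES];
    uint8_t out[QUAD_BYTES];

    assert(lane < QUAD_BYTES);
    for (size_t i = 0; i < DEMO_VL_BYTES; ++i) {
        pg[i] = (i != inactive_index);
        zn[i] = (uint8_t)i;
    }
    model_addqv_u8(pg, zn, DEMO_VL_BYTES, out);
    return out[lane];
}
```

The result size is fixed at 16 bytes whatever `vl_bytes` is, which is the property that makes a fixed-length return type natural for these intrinsics.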
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jan 3, 2024
This patch changes the following intrinsics:

```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
Drops 'u' from the quadword stores because it is simply truncating the
quadwords to 32 bits

```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics

```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]ARM-software/acle#257
main/acle.md Outdated
@@ -11829,7 +11829,7 @@ Extract vector segment from each pair of quadword segments.
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
// _bf16, _f16, _f32, _f64
svuint8_t svextq_lane[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
svuint8_t svextq[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
Contributor

Why are we dropping the _lane part here?

Contributor Author

Richard pointed out that the other EXT intrinsics do not have _lane in their names.

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
// _bf16, _f16, _f32, _f64
svuint8_t svextq_lane[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
Member
@rsandifo-arm rsandifo-arm 3 weeks ago
I'm not sure these should be lane intrinsics. The instructions are really a form of permutation. (FWIW, the corresponding non-Q intrinsics don't have the _lane suffix.)
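
To make the "form of permutation" remark concrete, here is a rough scalar sketch of EXTQ on 8-bit lanes as I understand it (an assumption, not text from the spec under review): within each 128-bit segment, the result takes bytes starting at position `imm` of the first operand's segment and continues into the second operand's segment. The names `model_extq_u8` and `demo_extq_byte` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { QUAD_BYTES = 16, DEMO_VL_BYTES = 32 };

/* Scalar model (assumed semantics): for every 128-bit segment, treat the
 * zdn segment followed by the zm segment as a 32-byte window and copy the
 * 16 bytes that start at byte position imm. */
static void model_extq_u8(const uint8_t *zdn, const uint8_t *zm,
                          size_t vl_bytes, unsigned imm, uint8_t *out)
{
    assert(vl_bytes % QUAD_BYTES == 0 && imm < QUAD_BYTES);
    for (size_t seg = 0; seg < vl_bytes; seg += QUAD_BYTES) {
        for (size_t i = 0; i < QUAD_BYTES; ++i) {
            size_t src = i + imm;
            out[seg + i] = (src < QUAD_BYTES) ? zdn[seg + src]
                                              : zm[seg + src - QUAD_BYTES];
        }
    }
}

/* Demo on 256-bit vectors: zdn holds 0..31 and zm holds 100..131. */
static uint8_t demo_extq_byte(unsigned imm, size_t pos)
{
    uint8_t zdn[DEMO_VL_BYTES], zm[DEMO_VL_BYTES], out[DEMO_VL_BYTES];

    assert(pos < DEMO_VL_BYTES);
    for (size_t i = 0; i < DEMO_VL_BYTES; ++i) {
        zdn[i] = (uint8_t)i;
        zm[i] = (uint8_t)(100 + i);
    }
    model_extq_u8(zdn, zm, DEMO_VL_BYTES, imm, out);
    return out[pos];
}
```

In this model the immediate selects a starting position for a per-segment shuffle of two inputs rather than picking out a single lane.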

Contributor

OK.

CarolineConcatto added a commit to llvm/llvm-project that referenced this pull request Jan 10, 2024
…1] (#76844)

This patch changes the following intrinsics:

```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
Drops 'u' from the quadword stores because it is simply truncating the
quadwords to 32 bits

```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics

```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]ARM-software/acle#257
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…1] (llvm#76844)

This patch changes the following intrinsics:

```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
Drops 'u' from the quadword stores because it is simply truncating the
quadwords to 32 bits

```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics

```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]ARM-software/acle#257
@Lukacma
Contributor

Lukacma commented Feb 19, 2024

Hello @CarolineConcatto,

You have forgotten the DUPQ instruction for sve2p1. The prototype will look like this:

   // Variants are also available for:
   // _s8, _u16, _s16, _u32, _s32, _u64, _s64
   // _bf16, _f16, _f32, _f64
   svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx);

This is different from the svdupq_lane intrinsic; the two have different behaviour.
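
As a rough illustration of that difference, here is a scalar sketch on 8-bit lanes. It reflects my understanding of the two operations and is an assumption rather than spec text: DUPQ broadcasts one indexed element within each 128-bit segment, while svdupq_lane broadcasts one whole 128-bit segment to every segment. The `model_*` and `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { QUAD_BYTES = 16, DEMO_VL_BYTES = 32 };

/* Assumed DUPQ / svdup_laneq behaviour: each 128-bit segment is filled with
 * its own element number imm_idx. */
static void model_dup_laneq_u8(const uint8_t *zn, size_t vl_bytes,
                               unsigned imm_idx, uint8_t *out)
{
    assert(vl_bytes % QUAD_BYTES == 0 && imm_idx < QUAD_BYTES);
    for (size_t i = 0; i < vl_bytes; ++i) {
        size_t seg = i - (i % QUAD_BYTES); /* first byte of this segment */
        out[i] = zn[seg + imm_idx];
    }
}

/* Assumed svdupq_lane behaviour: every 128-bit segment becomes a copy of
 * segment number quad_index (in-range indices only in this sketch). */
static void model_dupq_lane_u8(const uint8_t *zn, size_t vl_bytes,
                               size_t quad_index, uint8_t *out)
{
    assert(vl_bytes % QUAD_BYTES == 0);
    assert((quad_index + 1) * QUAD_BYTES <= vl_bytes);
    for (size_t i = 0; i < vl_bytes; ++i)
        out[i] = zn[quad_index * QUAD_BYTES + (i % QUAD_BYTES)];
}

/* Demos on a 256-bit vector holding 0..31. */
static uint8_t demo_dup_laneq(unsigned imm_idx, size_t pos)
{
    uint8_t zn[DEMO_VL_BYTES], out[DEMO_VL_BYTES];
    assert(pos < DEMO_VL_BYTES);
    for (size_t i = 0; i < DEMO_VL_BYTES; ++i)
        zn[i] = (uint8_t)i;
    model_dup_laneq_u8(zn, DEMO_VL_BYTES, imm_idx, out);
    return out[pos];
}

static uint8_t demo_dupq_lane(size_t quad_index, size_t pos)
{
    uint8_t zn[DEMO_VL_BYTES], out[DEMO_VL_BYTES];
    assert(pos < DEMO_VL_BYTES);
    for (size_t i = 0; i < DEMO_VL_BYTES; ++i)
        zn[i] = (uint8_t)i;
    model_dupq_lane_u8(zn, DEMO_VL_BYTES, quad_index, out);
    return out[pos];
}
```

With input 0..31, the first model with index 3 yields sixteen 3s followed by sixteen 19s, whereas the second with index 1 yields 16..31 repeated in both segments.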

@CarolineConcatto
Contributor Author

I merged the SVE2.1 and SME2 intrinsics into one section, but I am not sure that is the best approach.

This patch adds new intrinsics and types for supporting SVE2.1.
Contributor

@rsandifo-arm rsandifo-arm left a comment

This version seems to add the shared SVE2.1/SME intrinsics back into the SME section (with __arm_streaming attributes). Is that deliberate?

I think we should only document each intrinsic once, as in the previous version. It's just that the relationship between streaming/non-streaming/streaming-compatible and SME/SME2/SVE2/SVE2.1 can't be expressed directly using attributes (and so needs to be specified in words instead).

@CarolineConcatto
Contributor Author

"This version seems to add the shared SVE2.1/SME intrinsics back into the SME section (with __arm_streaming attributes). Is that deliberate?"

No, they should not be in the SME section with the streaming attribute.
I had to split the patch in two because all the tests in GitHub were failing.
So I created one patch to add all the SVE2.1 intrinsics and another to remove the ones that already exist in SME2.

  • ab72e2b (HEAD -> sve2.1) Address review comments
  • e9e3450 (origin/sve2.1) Remove from SME2 intriniscs that are common with SVE2.1
  • 16f9477 Add alpha support for SVE2.1

Contributor

@rsandifo-arm rsandifo-arm left a comment

LGTM apart from the typo below.

main/acle.md Outdated
@@ -8776,8 +8776,8 @@ The functions in this section are defined by the header file
Some instructions overlap with the SME and SME2 architecture extensions and
are additionally available in Streaming SVE mode when __ARM_FEATURE_SME is
non-zero or __ARM_FEATURE_SME2 are non-zero.
For convenience, these the intrinsics for these instructions are listed in
the following section.
For convenience, the intrinsics fo these instructions are listed in the
Contributor

Suggested change
For convenience, the intrinsics fo these instructions are listed in the
For convenience, the intrinsics for these instructions are listed in the

@rsandifo-arm rsandifo-arm merged commit f947de6 into ARM-software:main Apr 12, 2024
4 checks passed
qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024
… scatter stores

This patch adds the quadword gather load intrinsics of the form

  (1) sv<type>_t svld1q_gather_u64index_<typ>(svbool_t, const <type>_t *, svuint64_t);
  (2) sv<type>_t svld1q_gather_u64base_index_<typ>(svbool_t, svuint64_t, int64_t);

and the quadword scatter store intrinsics of the form

  (3) void svst1q_scatter_u64index_<typ>(svbool_t, <type>_t *, svuint64_t, sv<type>_t);
  (4) void svst1q_scatter_u64base_index_<typ>(svbool_t, svuint64_t, int64_t, sv<type>_t);

(intrinsics (1) and (3) are currently missing the variants for non 64-bit sized
base types, e.g. `int8_t` or `bfloat16_t`, etc).

ACLE spec: ARM-software/acle#257
qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this pull request Apr 23, 2024
``` c
   // All the intrinsics below are [SVE2.1 or SME2]
   // Variants are also available for _u16[_s32]_x2 and _u16[_u32]_x2
   svint16_t svqcvtn_s16[_s32_x2](svint32x2_t zn);
   ```

According to PR#257[1]

[1]ARM-software/acle#257
qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this pull request Apr 23, 2024
…air (#75107)

Add intrinsics of the form:

    svboolx2_t svwhile<cond>_b{8,16,32,64}_[{s,u}64]_x2([u]int64_t, [u]int64_t);

and their overloaded variants as specified in
ARM-software/acle#257
Contributor

@sallyarmneale sallyarmneale left a comment

One very minor comment.

veselypeta pushed a commit to veselypeta/cherillvm that referenced this pull request Aug 27, 2024
In this patch it is used for the prototype:
  * svptrue_c8 (and _c16/_c32/_c64)

 As described in: ARM-software/acle#257

Patch by: Sander de Smalen <[email protected]>

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D150953