Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X86 rejects inlining of target-feature-wise compatible functions #67054

Closed
kazutakahirata opened this issue Sep 21, 2023 · 3 comments · Fixed by #83820
Closed

X86 rejects inlining of target-feature-wise compatible functions #67054

kazutakahirata opened this issue Sep 21, 2023 · 3 comments · Fixed by #83820
Assignees

Comments

@kazutakahirata
Copy link
Contributor

This is an offshoot from #65205. Consider:

#include <immintrin.h>

__attribute__((target("avx512bw")))
static inline __m512i MM512_MASK_ADD_EPI8(__m512i src,
                                          __mmask64 k,
                                          __m512i a,
                                          __m512i b) {
    __asm__("vpaddb\t{%3, %2, %0 %{%1%}" : "+v"(src) : "Yk"(k), "v"(a), "v"(b));
    return src;
}

__attribute__((target("avx512bw,avx512dq")))
__m512i G(__m512i src, __mmask64 k, __m512i a, __m512i b) {
    return MM512_MASK_ADD_EPI8(src, k, a, b);
}

The credit for the testcase goes to @kalcutter.

clang refuses to inline:

$ clang -O2 -S target_feature.cc -o /dev/stdout | grep call
        callq   _ZL19MM512_MASK_ADD_EPI8Dv8_xyS_S_

We should be able to inline the callee into the caller because the caller is allowed to use a superset of the instruction set that the callee is allowed to use.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 21, 2023

@llvm/issue-subscribers-backend-x86

This is an offshoot from https://github.com//issues/65205. Consider:
#include &lt;immintrin.h&gt;

__attribute__((target("avx512bw")))
static inline __m512i MM512_MASK_ADD_EPI8(__m512i src,
                                          __mmask64 k,
                                          __m512i a,
                                          __m512i b) {
    __asm__("vpaddb\t{%3, %2, %0 %{%1%}" : "+v"(src) : "Yk"(k), "v"(a), "v"(b));
    return src;
}

__attribute__((target("avx512bw,avx512dq")))
__m512i G(__m512i src, __mmask64 k, __m512i a, __m512i b) {
    return MM512_MASK_ADD_EPI8(src, k, a, b);
}

The credit for the testcase goes to @kalcutter.

clang refuses to inline:

$ clang -O2 -S target_feature.cc -o /dev/stdout | grep call
        callq   _ZL19MM512_MASK_ADD_EPI8Dv8_xyS_S_

We should be able to inline the callee into the caller because the caller is allowed to use a superset of the instruction set that the callee is allowed to use.

@kazutakahirata
Copy link
Contributor Author

Here is the "patch" I posted to #65205:

It looks like X86TTIImpl::areInlineCompatible is rejecting the inlining opportunity because calls in the callee may become ABI-incompatible as a result of inlining. Eventually, we get to:

        // We don't know the target features of the callee,
        // assume it is incompatible.
        return false;

Now, the only call in the callee in this case is the inline asm, which shouldn't pose a problem in terms of the ABI compatibility. Disregarding inline asm like so around X86TargetTransformInfo.cpp:6063 fixes the problem:

  for (const Instruction &I : instructions(Callee)) {
    if (const auto *CB = dyn_cast<CallBase>(&I)) {
      if (CB->isInlineAsm())
        continue;

@RalfJung
Copy link
Contributor

RalfJung commented Oct 11, 2023

It looks like X86TTIImpl::areInlineCompatible is rejecting the inlining opportunity because calls in the callee may become ABI-incompatible as a result of inlining.

I think this is unnecessary. LLVM should be able to inline arbitrary functions if the caller has more instructions available than the callee. The current situation is caused by a flaw in the call instruction: it computes the ABI based on the target features of the function it is in. That is causing tons of problems, for instance

  • inlining becomes very problematic, as moving call from one function to another can change what that call means in terms of ABI
  • a frontend might have precise information about the callee's ABI and target features, e.g. because in the frontend language function pointers are tracking which ABI-relevant target features are enabled -- but such a frontend currently has no way to tell LLVM "generate this call using the following target features". This completely shuts the door on one of the more reasonable approaches to solving target-feature-related ABI compatibility issues.

I wonder if it would be possible for the call instruction to make its ABI decisions not based on the features of the surrounding function, but to instead explicitly say for each call "please use the following set of features". Of course using e.g. an AVX register for a call requires the AVX target feature in the surrounding code, but what if we want to generate a non-AVX call in a function that has AVX available? Currently that cannot be represented, and that's (one of the reasons why) inlining is unsound. But such calls are perfectly sensible! If LLVM had target features annotated at a call then this could be represented, which both solves a significant part of the inlining trouble and gives frontends new options for controlling which ABI to use for a given call.

@nikic nikic self-assigned this Mar 4, 2024
nikic added a commit that referenced this issue Mar 4, 2024
nikic added a commit that referenced this issue Mar 5, 2024
When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes #67054.
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Mar 5, 2024
(cherry picked from commit cad6ad2)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Mar 5, 2024
…83820)

When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes llvm#67054.

(cherry picked from commit e84182a)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Mar 13, 2024
(cherry picked from commit cad6ad2)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Mar 13, 2024
…83820)

When inlining across functions with different target features, we
perform roughly two checks:
 1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.

The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.

Fixes llvm#67054.

(cherry picked from commit e84182a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants