Skip to content

Use _tpause instead of __builtin_ia32_tpause#27607

Merged
tianleiwu merged 1 commit intomicrosoft:mainfrom
mocknen:mocknen/replace-builtin-tpause
Mar 12, 2026
Merged

Use _tpause instead of __builtin_ia32_tpause#27607
tianleiwu merged 1 commit intomicrosoft:mainfrom
mocknen:mocknen/replace-builtin-tpause

Conversation

@mocknen
Copy link
Copy Markdown
Contributor

@mocknen mocknen commented Mar 10, 2026

Description

Use _tpause function defined in waitpkgintrin.h instead of calling the compiler built-in function (__builtin_ia32_tpause) directly.

Motivation and Context

The _tpause is independent of the compiler, whereas its implementation via the built-in function __builtin_ia32_tpause varies by compiler. Therefore, it is advisable not to use it directly. For example, GCC and LLVM have different arguments, leading to portability issues.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves portability of the x86_64 spin-wait implementation by using the standardized _tpause intrinsic (from waitpkgintrin.h via existing headers) instead of directly invoking the compiler-specific built-in __builtin_ia32_tpause, which differs across compilers.

Changes:

  • Switch Linux tpause usage from __builtin_ia32_tpause to _tpause.
  • Consolidate the Windows/Linux tpause path under a single preprocessor branch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@tianleiwu
Copy link
Copy Markdown
Contributor

Previously, I investigated an issue for clang build. Here is some information I got:
https://gemini.google.com/share/c1025d3447ee
The recommended change might need change in both cmake configuration and use #if defined(__WAITPKG__).

@mocknen mocknen force-pushed the mocknen/replace-builtin-tpause branch from ed7fa32 to fe170b5 Compare March 11, 2026 03:36
@mocknen
Copy link
Copy Markdown
Contributor Author

mocknen commented Mar 11, 2026

@tianleiwu

Thanks for the advice!

I have changed the section to use __WAITPKG__ instead, and confirmed that it can be compiled both in GCC and LLVM.

However, I'm not sure if the defined(_WIN32) part is correct, but I've left it as is. A quick search suggests that MSVC doesn't support _tpause, so we'd need to use inline assembly. Unfortunately, I don't have a Windows environment available right now to verify this. If we need to change that, perhaps we should discuss it in a separate issue or pull request?

Additional info

generated code from GCC v14.2.0
bbf7d6:   74 28                   je     bbf800 <_ZN11onnxruntime11concurrency9SpinPauseEv+0x40>
bbf7d8:   0f 31                   rdtsc
bbf7da:   31 c9                   xor    ecx,ecx
bbf7dc:   48 c1 e2 20             shl    rdx,0x20
bbf7e0:   48 09 d0                or     rax,rdx
bbf7e3:   48 05 e8 03 00 00       add    rax,0x3e8
bbf7e9:   48 89 c2                mov    rdx,rax
bbf7ec:   48 c1 ea 20             shr    rdx,0x20
bbf7f0:   66 0f ae f1             tpause ecx
generated code from LLVM v21.1.8
1070643:   75 1e                   jne    1070663 <_ZN11onnxruntime11concurrency9SpinPauseEv+0x33>
1070645:   0f 31                   rdtsc
1070647:   48 c1 e2 20             shl    rdx,0x20
107064b:   48 09 d0                or     rax,rdx
107064e:   48 05 e8 03 00 00       add    rax,0x3e8
1070654:   48 89 c2                mov    rdx,rax
1070657:   48 c1 ea 20             shr    rdx,0x20
107065b:   31 c9                   xor    ecx,ecx
107065d:   66 0f ae f1             tpause ecx

@mocknen
Copy link
Copy Markdown
Contributor Author

mocknen commented Mar 11, 2026

Regarding CMake C flags, since -march=native automatically enables -mwaitpkg as well, I don't think any special changes are necessary.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@mocknen mocknen force-pushed the mocknen/replace-builtin-tpause branch from fe170b5 to 7cabbc7 Compare March 11, 2026 06:02
@mocknen mocknen force-pushed the mocknen/replace-builtin-tpause branch from 7cabbc7 to 2d87f10 Compare March 12, 2026 02:18
@mocknen
Copy link
Copy Markdown
Contributor Author

mocknen commented Mar 12, 2026

Rebased to include #27618.

Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tianleiwu tianleiwu enabled auto-merge (squash) March 12, 2026 03:45
@tianleiwu
Copy link
Copy Markdown
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline,
Windows GPU Doc Gen CI Pipeline

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit 60ce0e6 into microsoft:main Mar 12, 2026
100 of 103 checks passed
tianleiwu pushed a commit that referenced this pull request Mar 16, 2026
### Description

Use `_tpause` function defined in `waitpkgintrin.h` instead of calling
the compiler built-in function (`__builtin_ia32_tpause`) directly.

### Motivation and Context

The [`_tpause`][intel-intrinsics-guide] is independent of the compiler,
whereas its implementation via the built-in function
`__builtin_ia32_tpause` varies by compiler. Therefore, it is advisable
not to use it directly. For example, [GCC][waitpkgintrin-gcc] and
[LLVM][waitpkgintrin-llvm] have different arguments, leading to
portability issues.

[intel-intrinsics-guide]:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=tpause&techs=Other&ig_expand=6888
[waitpkgintrin-gcc]:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/waitpkgintrin.h;h=42c6b0cd02866eccdfe3308f4792f17fe8c6ae38;hb=HEAD#l51
[waitpkgintrin-llvm]:
https://github.com/llvm/llvm-project/blob/a682073ae7a49de4b95498ba01b9ea32e6b5f607/clang/lib/Headers/waitpkgintrin.h#L33-L38
tianleiwu added a commit that referenced this pull request Mar 16, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| eb23be8 | #27354 | Update python_requires |
| d626b56 | #27479 | [QNN EP] Enable offline x64 compilation with
memhandle IO type |
| 60ce0e6 | #27607 | Use `_tpause` instead of `__builtin_ia32_tpause`
|
| 69feb84 | #27591 | Add PCI bus fallback for Linux GPU device
discovery in containerized environments |
| de92668 | #27650 | Revert "[QNN EP] Fix error messages being logged
as VERBOSE instead o… |
| 0f66526 | #27644 | [Plugin EP] Check for nullptr before
dereferencing |
| 929f73e | #27666 | Plugin EP: Fix bug that incorrectly assigned
duplicate MetDef IDs to fused nodes in different GraphViews |

---------

Co-authored-by: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shogo Yamazaki <f9ifphmiz7i8akhowc8l5t1x9qp0lfu4@mocknen.net>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: baijumeswani <12852605+baijumeswani@users.noreply.github.com>
Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants