-
Notifications
You must be signed in to change notification settings - Fork 557
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
TracyProfiler: Use rr-safe nopl; rdtsc sequence
Several users have reported masssive slowdowns to rr record/replay in packages that have the tracy client loaded. The issue turns out to be an unlucky false positive in an rr heuristic that determines the patchability of `rdtsc`. In rr master, it is possible to guarantee patchability by using a specific `nopl; rdtsc` sequence. Add a patch to tracy to do just that. See also rr-debugger/rr#3580 JuliaLang/julia#50975
- Loading branch information
Showing
2 changed files
with
51 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
50 changes: 50 additions & 0 deletions
50
T/TracyProfiler/bundled/patches/TracyProfiler-rr-nopl-seq.patch
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
commit 1ae2a07024618b042dc24fd5858ac2537f3d0d9d | ||
Author: Keno Fischer <[email protected]> | ||
Date: Sat Aug 19 01:40:18 2023 +0000 | ||
|
||
Use patchable rdtsc sequence to avoid slowdowns under rr | ||
|
||
We (Julia) ship both support for using tracy to trace julia applications, | ||
as well as using `rr` (https://github.com/rr-debugger/rr) for record-replay debugging. | ||
After our most recent rebuild of tracy, users have been reporting signfificant performance | ||
slowdowns when `rr` recording a session that happens to also load the tracy library | ||
(even if tracing is not enabled). Upon further examination, the recompile happened | ||
to trigger a protective heuristic that disabled rr's patching of tracy's use of | ||
`rdtsc` because an earlier part of the same function happened to look like a | ||
conditional branch into the patch region. See https://github.com/rr-debugger/rr/pull/3580 | ||
for details. To avoid this issue occurring again in future rebuilds of tracy, | ||
adjust tracy's `rdtsc` sequence to be `nopl; rdtsc`, which (as of of the | ||
linked PR) is a sequence that is guaranteed to bypass this heuristic | ||
and not incur the additional overhead when run under rr. | ||
|
||
diff --git a/public/client/TracyProfiler.hpp b/public/client/TracyProfiler.hpp | ||
index 1b825ea3..eea8c32d 100644 | ||
--- a/public/client/TracyProfiler.hpp | ||
+++ b/public/client/TracyProfiler.hpp | ||
@@ -206,10 +206,24 @@ public: | ||
return ( uint64_t( edx ) << 32 ) + uint64_t( eax ); | ||
} | ||
# elif defined __x86_64__ || defined _M_X64 | ||
- if( HardwareSupportsInvariantTSC() ) | ||
+ | ||
+#define NOP5_OVERRIDE_NOP | ||
+ uint64_t low, high; | ||
+ if( HardwareSupportsInvariantTSC() ) | ||
{ | ||
uint64_t rax, rdx; | ||
- asm volatile ( "rdtsc" : "=a" (rax), "=d" (rdx) ); | ||
+ // Some external tooling (such as rr) wants to patch our rdtsc and replace it by a | ||
+ // branch to control the external input seen by a program. This kind of patching is | ||
+ // not generally possible depending on the surrounding code and can lead to significant | ||
+ // slowdowns if the compiler generated unlucky code and rr and tracy are used together. | ||
+ // To avoid this, use the rr-safe `nopl 0(%rax, %rax, 1); rdtsc` instruction sequence, | ||
+ // which rr promises will be patchable independent of the surrounding code. | ||
+ asm volatile ( | ||
+ // This is nopl 0(%rax, %rax, 1), but assembler are incosistent about whether | ||
+ // they emit that as a 4 or 5 byte sequence and we need to be guaranteed to use | ||
+ // the 5 byte one. | ||
+ ".byte 0x01, 0x1f, 0x44, 0x00, 0x00\n\t" | ||
+ "rdtsc" : "=a" (rax), "=d" (rdx) ); | ||
return (int64_t)(( rdx << 32 ) + rax); | ||
} | ||
# else |