python: get the Thread State from a Thread-local #1109
This change originates from python/cpython#103323. Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
// extractTLSOffsetFromCodeAMD64 extracts the TLS offset by analyzing x86_64 assembly code.
// It looks for MOV instructions with FS segment prefix (e.g., MOV rax, FS:[offset]).
func extractTLSOffsetFromCodeAMD64(code []byte, baseAddr uint64) (int64, error) {
This function looks like a partial copy of
I think it would make sense to add the common portion of finding FS:xxx and resolving the xxx to asm/amd as a helper function. The other function also supports RIP relative stuff, so using that as reference would likely be better.
I have added asm/amd/ExtractFSOffsetFromCode() with d08582d.
While I knew about the duplication with golabels, I wasn't sure how my approach would be received, or about the difference in resolving the memory reference (Pattern 3). Let me know if this deduplication works for you.
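For readers following the thread, here is a minimal sketch of the idea behind finding an FS-relative load and returning its displacement. It is not the code from asm/amd: the name `findFSOffset` is hypothetical, and it matches raw bytes for the single encoding `MOV r64, FS:[disp32]` instead of decoding instructions, so it can produce false matches inside other instructions and ignores RIP-relative forms.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// findFSOffset scans code for `MOV r64, FS:[disp32]` and returns the
// sign-extended displacement. Hypothetical helper, for illustration only.
func findFSOffset(code []byte) (int64, error) {
	// Expected bytes: 64 (FS prefix), 48 or 4C (REX.W, optionally REX.R),
	// 8B (MOV r64, r/m64), ModRM with mod=00 and rm=100 (SIB follows),
	// SIB 25 (no base, no index), then a 32-bit displacement.
	for i := 0; i+9 <= len(code); i++ {
		if code[i] != 0x64 {
			continue
		}
		rex, opcode, modrm, sib := code[i+1], code[i+2], code[i+3], code[i+4]
		if (rex != 0x48 && rex != 0x4C) || opcode != 0x8B {
			continue
		}
		if modrm&0xC7 != 0x04 || sib != 0x25 {
			continue
		}
		disp := binary.LittleEndian.Uint32(code[i+5 : i+9])
		// The displacement is signed: static TLS offsets are typically
		// negative on x86_64.
		return int64(int32(disp)), nil
	}
	return 0, errors.New("no FS-relative MOV found")
}

func main() {
	// mov rax, fs:[-8]; ret
	code := []byte{0x64, 0x48, 0x8B, 0x04, 0x25, 0xF8, 0xFF, 0xFF, 0xFF, 0xC3}
	off, err := findFSOffset(code)
	fmt.Println(off, err)
}
```

A real helper would walk the function with a disassembler so that matches always start on an instruction boundary.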
// extractTLSOffsetFromCodeARM64 extracts the TLS offset by analyzing ARM64 assembly code.
// It looks for the pattern: MRS Xn, TPIDR_EL0 followed by ADD Xn, Xn, #offset or LDR [Xn, #offset].
func extractTLSOffsetFromCodeARM64(code []byte, baseAddr uint64, visited map[uint64]bool, depth int, ef *pfelf.File) (int64, error) {
At first glance, this looks like generic code that could also live in asm/arm as a helper?
Also if we stick with a recursive function instead of iteration, we can hide it as either an inner function or a separate recursive function that's called by the wrapper (which shouldn't take a map argument). Otherwise, it's sort of ugly to leak internal implementation details (allocate and pass a map to carry state across recursive calls) to the caller.
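The wrapper-plus-inner-function shape suggested here can be sketched as follows. Everything in it is hypothetical: the `block` type stands in for a decoded function body, and `FindOffset`, `maxDepth`, and the error messages are illustrative names, not the refactored code. The point is only that the visited set and depth counter stay out of the public signature.

```go
package main

import (
	"errors"
	"fmt"
)

// block is a hypothetical stand-in for a decoded function body: it either
// yields a TLS offset directly or branches to another address.
type block struct {
	offset    int64
	hasOffset bool
	target    uint64 // branch target to follow when hasOffset is false
}

const maxDepth = 8 // assumed recursion limit

// FindOffset is the public entry point. The visited set and the depth
// counter are internal state and never appear in the signature.
func FindOffset(blocks map[uint64]block, start uint64) (int64, error) {
	visited := make(map[uint64]bool)

	var walk func(addr uint64, depth int) (int64, error)
	walk = func(addr uint64, depth int) (int64, error) {
		if depth > maxDepth {
			return 0, errors.New("recursion limit reached")
		}
		if visited[addr] {
			return 0, errors.New("branch cycle detected")
		}
		visited[addr] = true

		b, ok := blocks[addr]
		if !ok {
			return 0, fmt.Errorf("no code at %#x", addr)
		}
		if b.hasOffset {
			return b.offset, nil
		}
		return walk(b.target, depth+1)
	}
	return walk(start, 0)
}

func main() {
	blocks := map[uint64]block{
		0x1000: {target: 0x2000},
		0x2000: {offset: 0x20, hasOffset: true},
	}
	fmt.Println(FindOffset(blocks, 0x1000))
}
```

Callers only pass what they know (the code and a start address); allocating the map inside the wrapper also guarantees a fresh visited set per lookup.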
Moved and refactored the code with 72de6e3.
Hope this works for you.
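As a rough illustration of the ARM64 pattern named in the doc comment (MRS Xn, TPIDR_EL0 followed by ADD or LDR), the sketch below decodes fixed-width A64 words by hand. It is an assumption-laden simplification, not the asm/arm helper: `findTPOffset` is a hypothetical name, it handles only ADD (immediate, 64-bit) and LDR (unsigned offset, 64-bit), it does not follow branches, and it does not track register moves.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// findTPOffset returns the offset applied to the thread pointer after an
// `MRS Xt, TPIDR_EL0`. Hypothetical helper, for illustration only.
func findTPOffset(code []byte) (int64, error) {
	tpReg := -1 // register holding the thread pointer, -1 if none yet
	var offset int64
	found := false

	for i := 0; i+4 <= len(code); i += 4 {
		insn := binary.LittleEndian.Uint32(code[i:])
		rd := int(insn & 0x1F)
		rn := int((insn >> 5) & 0x1F)
		imm12 := int64((insn >> 10) & 0xFFF)

		switch {
		case insn&0xFFFFFFE0 == 0xD53BD040: // MRS Xt, TPIDR_EL0
			tpReg = rd
		case tpReg >= 0 && insn&0xFF800000 == 0x91000000 && rn == tpReg:
			// ADD Xd, Xn, #imm12{, LSL #12}
			if insn&(1<<22) != 0 {
				imm12 <<= 12
			}
			offset += imm12
			found = true
			if rd != tpReg {
				return offset, nil // result left the tracked register
			}
		case tpReg >= 0 && insn&0xFFC00000 == 0xF9400000 && rn == tpReg:
			// LDR Xt, [Xn, #imm12*8]
			return offset + imm12*8, nil
		default:
			if found {
				return offset, nil // the ADD sequence has ended
			}
		}
	}
	if found {
		return offset, nil
	}
	return 0, errors.New("no TPIDR_EL0-relative access found")
}

func main() {
	code := []byte{
		0x40, 0xD0, 0x3B, 0xD5, // mrs x0, tpidr_el0
		0x00, 0x40, 0x00, 0x91, // add x0, x0, #0x10
		0xC0, 0x03, 0x5F, 0xD6, // ret
	}
	fmt.Println(findTPOffset(code))
}
```

Accumulating consecutive ADDs covers the common two-instruction form where the high and low 12 bits of the offset are added separately.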
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
fabled left a comment
looks pretty good to me. some minor clean up comments added
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
#1109 introduced a new asm/arm package. Move functionality of armhelpers into this new package for consistency. Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
I just saw this on x86_64:
It seems related to this PR.
PR open-telemetry#1109 already handles Python 3.13+ TLS access via staticTLSOffset, making these checks unnecessary. The eBPF code checks tls_offset directly and uses it for Python 3.13+ without requiring TSDInfo.
When LibcInfo is collected from multiple DSOs (e.g., libc.so and ld-linux.so), UpdateLibcInfo may be called multiple times. For Python versions that don't have a static TLS offset (< 3.13 or when extraction fails), we need TSDInfo to access thread state. Wait until TSDInfo is available before inserting proc data, and prevent duplicate inserts. This is a simplified version that checks staticTLSOffset directly rather than version numbers, since PR open-telemetry#1109 already extracts the TLS offset for Python 3.13+ when available.
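The gating described in this commit message could look roughly like the sketch below. All names are hypothetical stand-ins (`procState`, `tsdInfo`, `updateLibcInfo`), and it assumes that a zero `staticTLSOffset` means "no static offset extracted"; the actual interpreter code may represent these states differently.

```go
package main

import "fmt"

// tsdInfo is an opaque, hypothetical stand-in for thread-specific-data
// introspection results coming from libc.
type tsdInfo struct{}

// procState is a hypothetical per-process record.
type procState struct {
	staticTLSOffset int64    // assumption: 0 means "not extracted"
	tsd             *tsdInfo // nil until some DSO provides it
	inserted        bool     // set once proc data has been inserted
	inserts         int      // counts inserts, for demonstration only
}

// updateLibcInfo may be called once per DSO that contributes libc info.
// It reports whether this call inserted the proc data.
func (p *procState) updateLibcInfo(tsd *tsdInfo) bool {
	if tsd != nil && p.tsd == nil {
		p.tsd = tsd
	}
	if p.inserted {
		return false // prevent duplicate inserts
	}
	if p.staticTLSOffset == 0 && p.tsd == nil {
		return false // no way to reach the thread state yet: wait
	}
	p.inserted = true
	p.inserts++
	return true
}

func main() {
	old := &procState{} // e.g. no static TLS offset available
	fmt.Println(old.updateLibcInfo(nil))        // first DSO, no TSD info yet
	fmt.Println(old.updateLibcInfo(&tsdInfo{})) // second DSO provides TSD info
	fmt.Println(old.updateLibcInfo(&tsdInfo{})) // repeated call
}
```

Checking the offset itself, instead of comparing version numbers, also covers the case where extraction fails on a version that would normally have a static offset.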
For statically-linked Ruby binaries (bin/ruby rather than libruby.so), TLS descriptors are not available. Instead, rb_current_ec_noinline accesses the execution context directly via a TP-relative offset (FS:offset on x86_64, MRS tpidr_el0 + ADD on aarch64). This reuses the asm/amd.ExtractTLSOffset and asm/arm.ExtractTLSOffset infrastructure from the Python TLS PR (open-telemetry#1109) to disassemble rb_current_ec_noinline and extract the offset. Also changes current_ec_tpbase_tls_offset from u64 to s64 since static TLS offsets (local exec model) are negative on x86_64.
For statically-linked Ruby binaries (bin/ruby rather than libruby.so), TLS descriptors are not available. Instead, rb_current_ec_noinline accesses the execution context directly via a TP-relative offset (FS:offset on x86_64, MRS tpidr_el0 + ADD on aarch64). This reuses the asm/amd.ExtractTLSOffset and asm/arm.ExtractTLSOffset infrastructure from the Python TLS PR (open-telemetry#1109) to disassemble rb_current_ec_noinline and extract the offset. Also changes current_ec_tpbase_tls_offset from u64 to s64 since static TLS offsets (local exec model) are negative on x86_64. Also adds RUBY_DISABLE_GC env var support to the loop.rb test script to allow capturing coredumps without GC interference.
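To illustrate why a signed type matters for TP-relative offsets, the sketch below computes a thread-local address from a thread pointer and a 32-bit displacement. The function name `tlsAddr` and the sample addresses are made up, and this shows only one way an unsigned interpretation can go wrong; it is not a claim about the exact failure the u64 to s64 change addressed.

```go
package main

import "fmt"

// tlsAddr returns the address of a thread-local variable given the thread
// pointer and a signed TP-relative offset. Hypothetical helper.
func tlsAddr(tp uint64, off int64) uint64 {
	// Converting a negative int64 to uint64 wraps, so the addition below
	// behaves like a subtraction for negative offsets.
	return tp + uint64(off)
}

func main() {
	tp := uint64(0x7f0000001000) // made-up thread pointer
	disp := uint32(0xFFFFFFF8)   // disp32 of FS:[-8]

	// Sign-extended: the variable sits 8 bytes below the thread pointer.
	fmt.Printf("%#x\n", tlsAddr(tp, int64(int32(disp))))
	// Zero-extended: lands almost 4 GiB above the thread pointer instead.
	fmt.Printf("%#x\n", tlsAddr(tp, int64(disp)))
}
```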
While looking into #1054 I noticed that Python unwinding fails on ARM64 starting with Python 3.13.
Starting with Python 3.13, the internals changed with python/cpython#103323.