Fix off-by-one error in stack delta lookup#1027
Fix off-by-one error in stack delta lookup#1027umanwizard wants to merge 1 commit intoopen-telemetry:mainfrom
Conversation
When the current address is a return address, we need to subtract one to get the real call instruction. Anecdotally, fixing this bug seems to substantially improve unwinding for Rust aarch64 binaries compiled in debug mode. The reason for this is that our heuristic for getting the frame pointer doesn't work in function epilogues, and in debug builds, you often have a call instruction immediately followed by the function epilogue. So due to this off-by-one error, we'll think we're in the epilogue and not correctly unwind the fp. It seems much less common in release builds, probably because it is much more unusual to have a call immediately followed by the function epilogue (because the optimizer will replace that sequence by a tail call).
|
@christos68k could you help me upload the test case to the s3 bucket for the coredump test? Tommy told me you helped him do it in the past. |
|
Hmmm... It seems to break some unrelated coredump tests. Looking into it. |
|
Several things for you to note:
You may want to read Ideally, though the conditional Depending on things this could be rust |
Hi @umanwizard, I'm currently traveling and will be on PTO until January 3rd. When you're ready with the test case, ping us here again and if I'm still out either @fabled or @florianl can help with the upload. |
|
@fabled On further thought, you are right. I misunderstood the meaning of the address here. It does superficially appear to fix the issue, but that must be for a different reason. I suppose I need to investigate further ... |
|
Actually, sorry for the confusion. I think I was originally correct that the current code is wrong and a I have this Rust code: fn g(n: usize) -> usize {
unsafe {
std::arch::asm!("brk #0");
}
n
}
fn f(n: usize) -> usize {
g(n)
}
fn main() {
let x = f(40);
println!("{x}");
}which, when compiled in debug mode, results in this disassembly (irrelevant stuff snipped): So, when we hit the breakpoint instruction in
The eh_frame instructions corresponding to Note that a new section begins at 0x12f84. -- this section assumes that the instruction at 0x12f84, which clobbers So the semantics of a section beginning at address "foo" in eh_frame seems to be that the instruction at foo has already executed, not that it's about to execute (which is what I assumed in the past). Just to confirm that this is right, I checked what gcc does. Gcc also subtracts one from the pc before attempting to look up DWARF/eh_frame data and unwind. See here for the relevant line of the GCC source code. The reason this current PR is wrong is that we should not do the adjustment for signal return frames; we should only do it for normal frames (again see the code in GDB above). But we don't detect that we are currently in such a frame (and appropriately set |
I think this is not true. Can you show non-Rust code that does this, or the DWARF specification saying so? The DWARF version 4 figure 63 & 64 reads otherwise to me (at this late hour). Also if it was "has executed" it would make it impossible to unwind the first instruction of a function from signal/async context. Also, the entry eh_frame you show is:
So this prologue portion seems conflicting to the epilogue portion. If what you say is true, the I think Rust has .eh_frame generation bug. Or what do you think? I am not sure for which the GDB function you refer are used, but there are scenarios when the
We have also different interruption points than GDB. The kernel perf entry can happen signal like. So what we do have additional state. The But just alone on the assembly snippet and the corresponding .eh_frame; it looks conflicting. And I think there is code/.eh_frame generation issue somewhere. Thoughts? |
|
The eh_frame dump interleaved with assembly:
Suggests that the the address corresponds to the state before the instruction execution. Except for 0000000000012f84 where its off by one. File issue to Rust? If we need to workaround this in our code, we should have heuristic to determine that its Rust. And then have code to detect if the bug is manifest (always or per instruction basis), and fixup the address on stack delta generation stage. |
|
The issue isn't Rust-specific; I just used Rust arbitrarily. This C program, when compiled with clang, gives very similar results: #include <stdio.h>
size_t g(size_t n) {
__asm__("brk #0");
return n;
}
size_t f(size_t n) {
return g(n);
}
int main() {
size_t x = f(40);
printf("%zu\n", x);
return 0;
}So I think it's an issue for all LLVM-based compilers, not just rustc. Now, there are a couple things to note here:
// Try to resolve frame pointer
// simple heuristic for FP based frames
// the GCC compiler usually generates stack frame records in such a way,
// so that FP/RA pair is at the bottom of a stack frame (stack frame
// record at lower addresses is followed by stack vars at higher ones)
// this implies that if no other changes are applied to the stack such
// as alloca(), following the prolog SP/FP points to the frame record
// itself, in such a case FP offset will be equal to 8
if (info->fpParam == 8) {
// we can assume the presence of frame pointers
if (info->fpOpcode != UNWIND_OPCODE_BASE_LR) {
// FP precedes the RA on the stack (Aarch64 ABI requirement)
bpf_probe_read_user(&state->fp, sizeof(state->fp), (void *)(ra - 8));
}
}This
In gdb 16.3, Dwarf unwinding code is executed with this stack trace: This makes it clear that CORE_ADDR block_addr = get_frame_address_in_block (this_frame);
struct dwarf2_fde *fde = dwarf2_frame_find_fde (&block_addr, NULL);and So, I'm really not sure what's going on -- is GCC wrong, but it doesn't matter because this logic is only executed for leaf frames? Perhaps the correct fix is to improve our "heuristic" for finding the frame pointer on aarch64. |
I think the above two are correct. And completely different from what you've trying to proof so far (we lookup wrong unwind opcode). I think the failure is indeed arm64 specific issue that the .eh_frame entry: is not unwound correctly. I think the heuristic is just wrong. We should have new opcode to unwind both FP+RA and have the stack delta generator recognize if the FP+RA are defined and consecutive in memory. This would get rid of the heuristic and allow this to work. Can you close this PR, and create a new issue or PR with the above approach? |
Note that this also depends on to what kind of lookup function it is passed. Did you compare that the compare function is same as in our binary search? |
Yes. Just to be clear, I'm not arguing for that anymore. I'm now convinced that my initial analysis was wrong. |
When the current address is a return address, we need to subtract one to get the real call instruction.
Anecdotally, fixing this bug seems to substantially improve unwinding for Rust aarch64 binaries compiled in debug mode. The reason for this is that our heuristic for getting the frame pointer doesn't work in function epilogues, and in debug builds, you often have a call instruction immediately followed by the function epilogue. So due to this off-by-one error, we'll think we're in the epilogue and not correctly unwind the fp.
It seems much less common in release builds, probably because it is much more unusual to have a call immediately followed by the function epilogue (because the optimizer will replace that sequence by a tail call).