
panic on underflowing subtraction during recording #26

Open · pobrn opened this issue Feb 15, 2023 · 7 comments

pobrn commented Feb 15, 2023

  • Version: 3b5f5e1 (built simply with cargo build)
  • Platform: Linux 6.1.11

Command that triggers the issue:

samply record gdb --version

gdb is from the Arch Linux repositories (version: 12.1-2).


thread '<unnamed>' panicked at 'attempt to subtract with overflow', samply/src/linux_shared/context_switch.rs:... (multiple places)

One particular panic:

Thread 2 "samply" hit Breakpoint 3, samply::linux_shared::context_switch::ContextSwitchHandler::handle_sample (self=0x7ffff7c8c2f8, timestamp=62712187706648, thread=0x7ffff0d96338)
    at samply/src/linux_shared/context_switch.rs:141
141	        let off_cpu_sample = match thread.state {
$605 = 62712187706648
$606 = samply::linux_shared::context_switch::ThreadContextSwitchData {
  state: samply::linux_shared::context_switch::ThreadState::On{last_observed_on_timestamp: 62712191834216},
  on_cpu_duration_since_last_sample: 0,
  off_cpu_duration_since_last_off_cpu_sample: 191576
}
thread '<unnamed>' panicked at 'attempt to subtract with overflow', samply/src/linux_shared/context_switch.rs:147:35

Note that the sample's timestamp ($605 = 62712187706648) is earlier than the thread's last_observed_on_timestamp (62712191834216), so subtracting the latter from the former underflows.

Instrumenting all ContextSwitchHandler::handle_{sample,context_in,context_out} calls gives the following:
call-log.txt

ishitatsuyuki (Contributor) commented

Note that samply is usually built and used in release mode, to avoid the debug-build slowdown that leads to missed samples. In release mode, integer overflow checks are replaced with silent wrapping arithmetic, and most of the overflows in samply end up being harmless.
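For illustration, here is a minimal sketch of that distinction (the helper duration_since is hypothetical, not samply's actual code at context_switch.rs:147): checked_sub makes the out-of-order case explicit in both build modes.

// Hypothetical sketch, not samply's actual code: how a subtraction like
// the one at context_switch.rs:147 could tolerate out-of-order timestamps.
fn duration_since(sample_timestamp: u64, last_observed_on_timestamp: u64) -> u64 {
    // checked_sub returns None instead of panicking (debug builds) or
    // silently wrapping (release builds) when the sample is older.
    sample_timestamp
        .checked_sub(last_observed_on_timestamp)
        .unwrap_or(0) // treat an out-of-order sample as zero elapsed time
}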

mstange (Owner) commented Feb 15, 2023

The unordered timestamps are probably due to the code around here, which gets records from the perf events of multiple CPUs and doesn't sort them:

self.event_buffer.extend(perf.iter());

It would be great if somebody could read up on what the guarantees are about the ordering of records from perf events from multiple CPUs. Is sorting just "the current batch" enough, or do new records from the current batch sometimes need to be ordered before records from the previous batch? I think the latter is the case, and I also needed to deal with it when reading records from a perf.data file. The code for the batch reordering is here: https://github.com/mstange/linux-perf-data/blob/f0c5b59781303eaf1c7e6aa523e711a4018f5c09/src/sorter.rs
However, if reordering across batches is necessary, it would be good to come up with a solution which doesn't require copying the pending unsorted records to the heap.
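For illustration, a minimal sketch of what per-batch sorting could look like (the types are hypothetical, not samply's actual record types); note that this cannot fix cross-batch misordering:

// Hypothetical sketch (not samply's actual types): sort the freshly
// extended batch by timestamp. This fixes ordering within one batch only;
// a later batch can still contain records that predate records from an
// earlier batch, which is the open cross-batch question above.
fn sort_batch(event_buffer: &mut Vec<(u64, Vec<u8>)>) {
    // Stable sort preserves the original order of records with equal timestamps.
    event_buffer.sort_by_key(|&(timestamp, _)| timestamp);
}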

ishitatsuyuki (Contributor) commented

It looks like the out-of-order delivery is also causing other trouble; for example, fork events can arrive before the corresponding execve event:

[samply/src/linux/profiler.rs:404] e = ForkOrExitRecord {
    pid: 1708112,
    ppid: 1708112,
    tid: 1708115,
    ptid: 1708112,
    timestamp: 38419594592932,
}
[samply/src/linux/profiler.rs:407] e = {
    "pid": 1708112,
    "tid": 1708112,
    "name": "fossilize-repla",
    "is_execve": true,
}

This messes up memory map tracking and causes symbolization to fail.

I'm not sure what to do about the reordering, though; it does seem to be a legitimately hard and annoying problem.

ishitatsuyuki (Contributor) commented Feb 22, 2023

> It would be great if somebody could read up on what the guarantees are about the ordering of records from perf events from multiple CPUs. Is sorting just "the current batch" enough, or do new records from the current batch sometimes need to be ordered before records from the previous batch? I think the latter is the case, and I also needed to deal with it when reading records from a perf.data file. The code for the batch reordering is here: https://github.com/mstange/linux-perf-data/blob/f0c5b59781303eaf1c7e6aa523e711a4018f5c09/src/sorter.rs

I think it comes down to something similar to the "rounds" logic in linux-perf-data. There's no hard guarantee from perf, but for causally related events (e.g. fork → exec) it's possible to reason under the assumption that events are written synchronously with respect to the syscall: when event B happens-after event A and we have just read event B from one of the buffers, acquire-release ordering guarantees that event A is already visible (or has already been seen) in one of the buffers as well.

Hence, if we read in the order of (CPU 0, Round x), (CPU 1, Round x), ..., (CPU N, Round x), (CPU 0, Round x+1), then we can commit (CPU 0, Round x) as soon as we have examined records up to (CPU N, Round x) for re-sorting. This is a slightly tighter bound compared to the "rounds" formulation.

This doesn't, however, guarantee anything about events that are not causally related. In that case, event timestamping can sometimes happen in a different order than the sample writes finish, and timestamps have other reasons to drift (TSC synchronization issues can cause a small amount of drift). But those cases should never lead to logic errors and can be safely ignored.
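For illustration, here is a minimal sketch of the commit bound above, realized as a per-stream watermark (all names are hypothetical; it also assumes every CPU keeps producing events, so an idle CPU would stall the watermark and would need extra handling in real code):

use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch: buffer events in a min-heap and release them once
// every per-CPU stream has been examined past their timestamp.
struct Reorderer {
    // Min-heap of (timestamp, cpu index, raw event bytes) awaiting release.
    pending: BinaryHeap<Reverse<(u64, usize, Vec<u8>)>>,
    // Timestamp of the last record examined on each CPU's ring buffer.
    last_seen: Vec<u64>,
}

impl Reorderer {
    fn push(&mut self, cpu: usize, timestamp: u64, event: Vec<u8>) {
        self.last_seen[cpu] = timestamp;
        self.pending.push(Reverse((timestamp, cpu, event)));
    }

    // Anything at or below the slowest stream's last examined timestamp is
    // safe to commit: no causally earlier event can still be unread.
    fn drain_committed(&mut self) -> Vec<(u64, usize, Vec<u8>)> {
        let safe_until = self.last_seen.iter().copied().min().unwrap_or(0);
        let mut committed = Vec::new();
        while let Some(Reverse((timestamp, _, _))) = self.pending.peek() {
            if *timestamp > safe_until {
                break;
            }
            let Reverse(item) = self.pending.pop().unwrap();
            committed.push(item);
        }
        committed
    }
}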

Right now, a blocker for live event reordering is that we only read a limited number of events per EventIter call:

fn new(perf: &'a mut Perf) -> Self {
    let mut locations = Vec::with_capacity(32);
    {
        let state = Arc::get_mut(&mut perf.event_ref_state)
            .expect("Perf::iter called while the previous iterator hasn't finished processing");
        let state = state.get_mut();
        // At most 31 events are read per call, even if more are available.
        for _ in 0..31 {
            state.positions[locations.len()] = perf.position;
            let raw_event_location =
                match next_raw_event(perf.buffer, perf.size, &mut perf.position) {
                    Some(location) => location,
                    None => break,
                };
            locations.push(raw_event_location);
        }
        state.positions[locations.len()] = perf.position;
        state.done = !0;
    }

To exploit the causal relationship, all available events need to be pushed into the buffer at once, which will likely require some refactoring. The bitmask tracking is the tricky part, so if we at least make the reordering part consume events in order, it will likely simplify the tracking logic required for this.
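For illustration, the draining part of that refactor could be as small as this sketch (a hypothetical helper; next_event stands in for next_raw_event(perf.buffer, perf.size, &mut perf.position), and the positions/bitmask bookkeeping from the excerpt above is deliberately elided):

// Hypothetical sketch: drain every available event instead of stopping
// after 31, so the reorderer sees the whole batch at once.
fn drain_all<T>(mut next_event: impl FnMut() -> Option<T>) -> Vec<T> {
    let mut locations = Vec::new();
    while let Some(location) = next_event() {
        locations.push(location);
    }
    locations
}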

mstange (Owner) commented Feb 22, 2023

Thank you so much for investigating this. Please feel free to do whatever refactoring is needed; I am not attached to this code at all. I imported it from https://github.com/koute/not-perf/ just to get a basic working version of samply running, and it has already survived for longer than I expected.

ishitatsuyuki (Contributor) commented

I've made some preliminary changes in #29 and #30. I have a binary-heap-based sorter at ishitatsuyuki@664e1ae, but I then realized that the sorter from linux-perf-data is also more or less usable (except that it releases buffers at round granularity instead of stream granularity).

So first, there's the question of which sorter to use. The pro of the binary-heap sorter is that it's more granular; the con is that a binary heap falls short in absolute sorting performance (quicksort/pdqsort is extremely well optimized).
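For reference, a minimal sketch of the round-granularity alternative (hypothetical types): Rust's slice::sort_unstable family is pdqsort-based, which is the well-optimized path mentioned above.

// Hypothetical sketch: buffer one round's worth of events from all CPUs,
// then sort the whole round in one shot with pdqsort.
fn commit_round(mut round: Vec<(u64, Vec<u8>)>) -> Vec<(u64, Vec<u8>)> {
    round.sort_unstable_by_key(|&(timestamp, _)| timestamp);
    round
}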

Then there's the question of where the sorter should reside. Right now, the sorter in linux-perf-data is private, so if we want to use it in samply we'll need to make it public. Alternatively, we could put the sorter in samply, making it independent of linux-perf-data.

mstange (Owner) commented Feb 27, 2023

Minimizing the number of lost perf events is more important than minimizing the CPU overhead of the sampler. So if the more granular releasing of the binary-heap approach lets us advance the read position in data_tail earlier, allowing the kernel to write more records into the freed-up space, then I think it's preferable to the per-round sorting.
