
Handle processes whose main thread has exited #376

Merged
christos68k merged 16 commits into main from ck/process-exit
Apr 25, 2025

@christos68k
Member

@christos68k christos68k commented Feb 27, 2025

Summary

This PR implements both steps described in #365 (comment).

Thanks to @korniltsev for suggesting disassociate_ctty. I ended up using a tracepoint, sched_process_free, instead, as it makes fewer assumptions and is more stable (see this comment for more context). It also allows us to simplify the cleanup logic (no need for the extra periodic cleanups I had in the first prototype solution), as userspace will get a final PID notification when the process is freed by the kernel.

Essentially, when the main thread exits, we no longer unload process information, which allows profiling of the remaining threads to continue. Processmanager can also track mapping changes triggered by any of the remaining threads.

I added some debug warning statements to ease review; I will remove the commit that introduced them before merging. I also added a C program that you can compile and run as a test workload while the profiling agent is running; it should exercise all the corner cases that this PR addresses. Looking at the warning logs I added and at the generated flamegraph in devfiler should make the timeline of processmanager operations very clear.

It's probably easier to review this commit-by-commit.

TODO:

  • DONE Add test program
  • More testing

Comment thread processmanager/processinfo.go
Comment thread processmanager/processinfo.go Outdated
@christos68k christos68k marked this pull request as draft February 27, 2025 22:07
Comment thread processmanager/processinfo.go Outdated
@korniltsev
Contributor

Thanks for looking into this.

This looks OK overall and should solve the issue from the user perspective.

One of the downsides I see is that while we do not unload the old mappings, we're also not loading new mappings, which may degrade profiling of such processes (I am still not sure whether there are legit applications with a dead main thread, or whether this is a highly infrequent corner case).

I would personally prefer that processmanager "re-elect" a main thread by looking into the process's threads, although I realize this may require more work and we may do it later.
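
The "re-election" idea could look roughly like the Go sketch below. It assumes, hypothetically, that the agent lists the entry names under /proc/<pid>/task and picks any live thread other than the exited main thread; since all threads share one address space, any surviving TID should do. The names electThread and electThreadFromProc are illustrative only and are not part of the agent.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// electThread (hypothetical) picks a replacement thread for a process whose
// main thread has exited. taskEntries are the entry names found under
// /proc/<pid>/task. The exited main thread (TID == pid) is skipped; among
// the remaining threads the lowest TID is chosen so the result is stable.
func electThread(pid int, taskEntries []string) (int, bool) {
	best := -1
	for _, name := range taskEntries {
		tid, err := strconv.Atoi(name)
		if err != nil || tid == pid {
			continue // not a thread directory, or the exited main thread
		}
		if best == -1 || tid < best {
			best = tid
		}
	}
	return best, best != -1
}

// electThreadFromProc reads /proc/<pid>/task and delegates to electThread.
func electThreadFromProc(pid int) (int, bool, error) {
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/task", pid))
	if err != nil {
		return 0, false, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	tid, ok := electThread(pid, names)
	return tid, ok, nil
}

func main() {
	tid, ok := electThread(1000, []string{"1000", "1003", "1001"})
	fmt.Println(tid, ok) // 1001 true
}
```

Keeping the selection in a pure function over directory names makes the policy easy to test without a live process.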

Another thing to consider is hooking a kprobe on disassociate_ctty, which is called when the process group is dead (https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936); this may help avoid a separate timer for this case.

It would be nice to have a unit test for this case regardless of which solution we choose.

@christos68k
Member Author

christos68k commented Mar 4, 2025

> One of the downsides I see is that while we do not unload the old mappings, we're also not loading new mappings, which may degrade profiling of such processes (I am still not sure whether there are legit applications with a dead main thread, or whether this is a highly infrequent corner case).
> I would personally prefer that processmanager "re-elect" a main thread by looking into the process's threads, although I realize this may require more work and we may do it later.

I'm currently working on this and will push new commits (implementing part 2 of the proposed solution in #365) today.

> Another thing to consider is hooking a kprobe on disassociate_ctty, which is called when the process group is dead (https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936); this may help avoid a separate timer for this case.

I think we can switch to the sched_process_free tracepoint (instead of sched_process_exit), which should be more performant than a kprobe. I'll verify.

EDIT: sched_process_free fires for every kernel task, so it's not suitable if we want to avoid notifying userspace of every thread exit. On the other hand, disassociate_ctty seemingly does what we want (I haven't investigated in depth), and it also seemingly executes after the task has been removed from /proc, which eliminates a possible (probably unlikely) race in userspace.

EDIT2: Went back to sched_process_free, which we can make work by checking whether the PID is one we track.
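
The filtering described in EDIT2 can be modeled in userspace Go as below. This is only a sketch: the real check runs in the agent's eBPF code, and pidTracker, track and onTaskFree are hypothetical names. It relies on sched_process_free reporting the kernel pid of the freed task, which is a thread ID, so only the main thread's task (whose ID equals the process PID) matches a tracked entry; frees of ordinary worker threads are dropped before reaching userspace.

```go
package main

import "fmt"

// pidTracker is a hypothetical stand-in for the set of PIDs the agent
// currently tracks (in the agent itself this state lives on the eBPF side).
type pidTracker struct {
	tracked map[uint32]struct{}
}

func newPIDTracker() *pidTracker {
	return &pidTracker{tracked: make(map[uint32]struct{})}
}

// track records a process PID that userspace is interested in.
func (t *pidTracker) track(pid uint32) { t.tracked[pid] = struct{}{} }

// onTaskFree models the sched_process_free handler. freedID is the kernel
// pid of the freed task, i.e. a thread ID. Only the main thread's ID equals
// a tracked process PID, so freed worker threads are filtered out. Returns
// true if userspace should receive the final "process gone" notification;
// in this sketch the entry is then forgotten so it is reported only once.
func (t *pidTracker) onTaskFree(freedID uint32) bool {
	if _, ok := t.tracked[freedID]; !ok {
		return false
	}
	delete(t.tracked, freedID)
	return true
}

func main() {
	tr := newPIDTracker()
	tr.track(1000)                   // process 1000; its main thread has TID 1000
	fmt.Println(tr.onTaskFree(1001)) // worker thread freed: false
	fmt.Println(tr.onTaskFree(1000)) // main thread's task freed: true
	fmt.Println(tr.onTaskFree(1000)) // already reported: false
}
```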

Comment thread support/ebpf/process_monitor.ebpf.c Outdated
@christos68k christos68k force-pushed the ck/process-exit branch 3 times, most recently from e11a0dc to 87e351e March 20, 2025 18:46
@christos68k christos68k marked this pull request as ready for review March 20, 2025 18:53
Comment thread process/process.go
} else if path != "" {
// Ignore [vsyscall] and similar executable kernel
// pages we don't care about
} else {
Member Author

No semantic change; I just inlined the logic from GetMappings here, as this is the more appropriate place.

Comment thread processmanager/processinfo.go Outdated
func (pm *ProcessManager) processPIDExit(pid libpf.PID) {
exitKTime := times.GetKTime()
log.Debugf("- PID: %v", pid)
log.Warnf("- PID: %v", pid)
Member Author

I'll remove these newly added warnings before merging. They should help with reviewing the PR, as you don't need to run the agent with debug logs enabled and sort through a lot of irrelevant noise.

}
return
}
if len(mappings) == 0 {
Member Author

These comments are no longer relevant.

@christos68k
Member Author

christos68k commented Mar 20, 2025

I added some more information and notes on how to review/test to the description.

@korniltsev please take another look and review/test.

@korniltsev
Contributor

Great job. Thank you for looking into this.
I like the trick with sched_process_free, and that we need no extra timers in userspace and the PM logic did not get more complicated.
I've run both my repro and your repro with libcrypto, and the profiler works as expected: it keeps profiling the remaining threads, including newly loaded libraries (libcrypto).
I wish we could turn the repro you've added into an integration test so that it runs on every test run, instead of hoping I don't forget to run it. But I understand that writing such a test may be hard/time-consuming, so we may do this later.
LGTM

Comment thread support/tests/main_thread_exit.c
Comment thread support/ebpf/sched_monitor.ebpf.c
Comment thread tracer/tracer.go Outdated
Comment thread process/process.go

// New returns an object with Process interface accessing it
func New(pid libpf.PID) Process {
func New(pid, tid libpf.PID) Process {
Member Author

I didn't switch Process to accept libpf.PIDTID, as the latter is only used with PID events and I'd rather not couple it here too.

Member

@florianl florianl left a comment

Please remove the log.Warn(..) messages as mentioned in #376 (comment) before merging.

@christos68k
Member Author

I rebased this PR on top of current main.

Contributor

@fabled fabled left a comment

Thanks! Looks pretty good already. I added a few questions and comments, but I'll pre-approve this already so we can move forward.

Comment thread process/process.go Outdated
Comment on lines +243 to +262
// Test for main thread exit by checking for Zombie state
pidStat, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", sp.pid))
if err != nil {
// Should never happen while process is alive
return nil, 0, err
}

var p int
var c string
var state rune
n, err := fmt.Sscanf(string(pidStat), "%d %s %c", &p, &c, &state)
if err != nil || n < 3 {
// Should never happen
return nil, 0, err
}
sp.fileToMapping = fileToMapping
if state != 'Z' {
return mappings, numParseErrors, ErrNoMappings
}

log.Warnf("PID: %v main thread exit", sp.pid)
Contributor

Is this really needed? I think we can just remove the zombie check.

If the maps file exists, it means process is running.

If the maps file is empty, it means the main thread has exited. There is no other condition under which the main maps file is empty, because the main thread cannot be executing code if there are no mappings available. This is purely a side effect of the kernel having released the main thread's resources.

Based on these two observations, we can determine whether the process exited (since eBPF sent the event) or only the main thread has exited.

Or are you aware of some condition where this makes a difference? I think it was the case where all mapping entries resulted in parse errors? But I believe this should be handled as an error earlier. The reason is that reading the TID-specific maps should give the same result as reading the PID-specific maps, since memory mappings are shared between all threads.

Perhaps the only check needed here is: if pid == tid, return early with ErrNoMappings.

Member Author

@christos68k christos68k Apr 25, 2025

Removed the zombie check (we'd need it if we go back to walking /proc but we don't need it now).

Comment thread process/process.go
continue
}
fileToMapping[m.Path] = m
if err != nil {
Contributor

I think here we should also return early if err is nil and numParseErrors is non-zero. Or, perhaps even better, parseMappings could return an err if it failed to find usable mappings (but managed to read data). The idea is basically to distinguish here whether the mappings file is empty or all of its lines were unparseable.

Member Author

@christos68k christos68k Apr 25, 2025

Doesn't the len(mappings) == 0 check that follows cover this case?

If err == nil and len(mappings) != 0, we simply continue and process the mappings. If err == nil and len(mappings) == 0, we continue and try the mappings from another thread. Essentially, all branching logic depends on err and len(mappings), not on numParseErrors, which is purely advisory.

Comment thread process/process.go Outdated
Comment on lines +278 to +280
numParseErrorsAlt := uint32(0)
mappings, numParseErrorsAlt, err = parseMappings(mapsFileAlt)
numParseErrors += numParseErrorsAlt
Contributor

I'd just overwrite numParseErrors instead of adding to it. It is only ever used for counters, and since the per-TID and per-PID maps should be identical, you are basically double-counting parse errors in this case.

sched_process_free fires when a task is freed by the kernel, which allows for simpler cleanup of processes whose main thread has exited.

Making the TID available to processmanager allows the agent to keep profiling a process whose main thread calls pthread_exit while other threads continue to run.

This allows the agent to continue profiling a process whose main thread has exited while other threads continue to run. Mapping changes triggered by any of the remaining threads are also tracked.

The latter is OS-agnostic, but the agent only runs on Linux.

Development

Successfully merging this pull request may close these issues.

Profiler incorrectly handles process exit when non-main threads are still running

4 participants