@twz123
Member

@twz123 twz123 commented Aug 20, 2025

Description

It's better to leave the decision about what to do with a lingering process to the users. Try to terminate the process gracefully once and poll for process termination. If the process doesn't terminate in time, bail out with an error.

Re-implement the Supervisor proc handle without the stdlib. The os/exec API is not a good fit for implementing the PID file cleanup. Instead, use a combination of dirfds to /proc/<pid> directories, along with pidfd_send_signal (if available) to implement a race-free process termination.

Don't misuse shouldKillProcess to check if a process has terminated or not. Rename it to isK0sManaged and add appropriate docs. Instead, introduce a purpose-built hasTerminated function.

Split up Linux and macOS process handling. There is no proc filesystem on macOS. Split things up and leave a minimal implementation in place to reflect this fact. This allows most of the unit tests to run on macOS while making it clear that proc stuff doesn't exist on macOS.

Also enable the bogus PID file test unconditionally. This is just about the PID file parsing, which doesn't require platform-specific APIs.
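
As a rough sketch of why PID file parsing needs no platform-specific APIs, a parser can be written against the file contents alone. The name `parsePIDFile` and the exact validation rules below are assumptions for illustration, not the implementation in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePIDFile extracts a process ID from the raw contents of a PID file.
// Surrounding whitespace is ignored; anything that isn't a positive decimal
// integer is rejected as bogus.
func parsePIDFile(content []byte) (int, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(string(content)))
	if err != nil {
		return 0, fmt.Errorf("malformed PID file: %w", err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("malformed PID file: invalid PID %d", pid)
	}
	return pid, nil
}

func main() {
	for _, content := range []string{"1234\n", "bogus", "", "-1"} {
		pid, err := parsePIDFile([]byte(content))
		fmt.Printf("%q: pid=%d err=%v\n", content, pid, err)
	}
}
```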

See:

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@twz123 twz123 added the enhancement (New feature or request) label Aug 20, 2025
@twz123 twz123 marked this pull request as ready for review August 20, 2025 17:45
@twz123 twz123 requested review from a team as code owners August 20, 2025 17:45
@twz123 twz123 requested review from kke and makhov August 20, 2025 17:45
@twz123 twz123 force-pushed the no-kill-on-pid-file-cleanup branch from c983be6 to 8d2dd2b Compare August 21, 2025 07:03
Contributor

@juanluisvaladas juanluisvaladas left a comment

I'm fine with the first 5 commits in their current state but it's not clear to me why it's better to bail out instead of just terminating the process.

I can't think of a scenario where killing the process is a bad idea, I guess I'm missing something?

}

func isZombie(pid int) (bool, error) {
// https://man7.org/linux/man-pages/man5/proc_pid_stat.5.html
Contributor

Nit: Maybe it's worth adding another comment with the relevant kernel source (I had to search it to understand the fields and all)

https://github.com/torvalds/linux/blob/v4.19/fs/proc/array.c#L152

I picked 4.19 so that it's not a very recent version and people reading it don't think it's a new feature but it doesn't really matter.

Contributor

It's also worth noting that we already depend indirectly on github.com/prometheus/procfs, which handles this, but actually I prefer this.

Member Author

Nit: Maybe it's worth adding another comment with the relevant kernel source (I had to search it to understand the fields and all)

TBH, I found the man pages to be very clear and approachable for this. They mention field names, their order, their sprintf format directive, and a comprehensive description. I tried to pick that up in the comments below: "pid" and "comm" are the first two fields, "state" is the third one. Maybe we can make those comments more clear?
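
To make the field layout concrete, here is a small sketch of extracting the third field ("state") from a `/proc/<pid>/stat` line. It assumes the format described in the proc_pid_stat(5) man page, where "comm" is wrapped in parentheses and may itself contain spaces or parentheses. The helper name `parseStatState` is hypothetical and is not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseStatState returns the single-character process state from the
// contents of a /proc/<pid>/stat file.
//
// The first two fields are "pid" and "comm". Since comm is enclosed in
// parentheses and may contain arbitrary characters, the parser looks for
// the last closing parenthesis instead of splitting on whitespace.
func parseStatState(stat string) (byte, error) {
	idx := strings.LastIndexByte(stat, ')')
	if idx < 0 {
		return 0, errors.New("malformed stat line: no closing parenthesis")
	}

	// The third field, "state", directly follows comm.
	fields := strings.Fields(stat[idx+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, errors.New("malformed stat line: missing state field")
	}
	return fields[0][0], nil
}

func main() {
	state, err := parseStatState("42 (my (weird) proc) Z 1 42 42 0")
	fmt.Printf("state=%c zombie=%t err=%v\n", state, state == 'Z', err)
}
```

A state of `Z` marks a zombie. As with any such check, the result is only a snapshot: the state may have changed by the time the caller acts on it.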

Member Author

we already depend indirectly on github.com/prometheus/procfs

Yeah, I think there are quite a few indirect dependencies dealing with procfs. I think containerd might have something in that direction, too. But I think they're all coming at the problem from a different angle: Prometheus is about monitoring, i.e. gathering data in bulk, while containerd might be more concerned with security aspects when querying procfs data.

@twz123
Member Author

twz123 commented Aug 22, 2025

I'm fine with the first 5 commits in their current state but it's not clear to me why it's better to bail out instead of just terminating the process.

I can't think of a scenario where killing the process is a bad idea, I guess I'm missing something?

During review of #6203, @ncopa raised concerns about killing processes. He would prefer to leave that decision to the users. There's always a chance of breaking something when killing a process, e.g. etcd while it's writing to disk.

	if errors.Is(err, os.ErrProcessDone) {
		return nil, syscall.ESRCH
	}
}
Collaborator

This test should be removed. The process can disappear right after the test is performed and before the function returns, so we need to handle the case of a missing process later anyway. There's no need to handle it twice.

}

// Do a last TOCTOU check: We now know the state of the process with the
// referenced PID. We go back to os.Process, once again, to ensure that
Collaborator

We actually don't know the state of the process at this point. The state may have changed after we did the test(s).
For that reason I think the entire hasTerminated function should be removed.

@ncopa
Collaborator

ncopa commented Aug 27, 2025

Don't misuse shouldKillProcess to check if a process has terminated or not. Rename it to isK0sManaged and add appropriate docs. Instead, introduce a purpose-built hasTerminated function.

This does not work. We cannot rely on a hasTerminated function because the process state can change at any time. The process can disappear while we check whether it is k0s-managed. I also think that the isZombie check is meaningless, as the process state can change after the test is done.

@ncopa
Collaborator

ncopa commented Aug 27, 2025

The os package uses ErrProcessDone ("os: process already finished"), not syscall.ESRCH ("no such process").

Good catch! Seems to be since go 1.4.

@twz123 twz123 force-pushed the no-kill-on-pid-file-cleanup branch from 8d2dd2b to 24422a4 Compare September 19, 2025 16:12
@twz123 twz123 added this to the 1.34 milestone Sep 19, 2025
@twz123 twz123 force-pushed the no-kill-on-pid-file-cleanup branch 2 times, most recently from 3b370d3 to 6e5d7a0 Compare September 19, 2025 16:21
It models the /proc mount, particularly the /proc/<pid> directories and
their contents. It focuses on the requirements of the supervisor.

Signed-off-by: Tom Wieczorek <[email protected]>
The Dir newtype around os.File provides methods for opening, reading,
statting and listing files or directories relative to a dirfd file
descriptor. Provides robust handling of directory relative operations,
maintaining directory access consistency across rename/move operations,
potentially mitigating symbolic link attacks.

Signed-off-by: Tom Wieczorek <[email protected]>
The os package uses ErrProcessDone ("os: process already finished"), not
syscall.ESRCH ("no such process"). Reflect that usage in the procHandle
implementation. Make the openPID function actually check for the
existence of the process, and in that case, return an ESRCH.

This fixes a bug in which errors returned from procHandle's
requestGracefulShutdown and kill methods were checked against ESRCH,
where they should have been checked against ErrProcessDone.

Signed-off-by: Tom Wieczorek <[email protected]>
This is in preparation for disabling the code paths on macOS that rely
on Linux-only proc filesystem parts.

Signed-off-by: Tom Wieczorek <[email protected]>
There is no proc filesystem on macOS. Split things up and leave a
minimal implementation in place to reflect this fact. This allows most
of the unit tests to run on macOS while making it clear that proc stuff
doesn't exist on macOS.

Also enable the bogus PID file test unconditionally. This is just about
the PID file parsing, which doesn't require platform-specific APIs.

Signed-off-by: Tom Wieczorek <[email protected]>
Also adjust logging a bit.

Signed-off-by: Tom Wieczorek <[email protected]>
Don't misuse shouldKillProcess to check if a process has terminated
or not. Rename it to isK0sManaged and add appropriate docs. Instead,
introduce a purpose-built hasTerminated function.

This better reflects what this is about.

Signed-off-by: Tom Wieczorek <[email protected]>
This allows for using the pidfd_send_signal syscall.

Signed-off-by: Tom Wieczorek <[email protected]>
It's better to leave the decision about what to do with a lingering
process to the users. Try to terminate the process gracefully once and
poll for process termination. If the process doesn't terminate in time,
bail out with an error.

Re-implement the Supervisor proc handle without the stdlib. The os/exec
API is not a good fit for implementing the PID file cleanup. Instead,
use a combination of dirfds to /proc/<pid> directories, along with
pidfd_send_signal (if available) to implement a race-free process
termination.

Signed-off-by: Tom Wieczorek <[email protected]>
@twz123 twz123 force-pushed the no-kill-on-pid-file-cleanup branch from 6e5d7a0 to 4b1861c Compare September 19, 2025 16:44
Collaborator

@ncopa ncopa left a comment

LGTM

@twz123 twz123 merged commit 5e87c30 into k0sproject:main Sep 22, 2025
189 of 190 checks passed
@twz123 twz123 deleted the no-kill-on-pid-file-cleanup branch September 22, 2025 15:10