WIP: Do not merge: Test periodic reaper with the router#111
mrunalp wants to merge 1 commit into openshift:master
```go
// parseProcForZombies parses the current procfs mounted at /proc
// to find processes in the zombie state.
func parseProcForZombies() ([]int, error) {
```
Does the direct use of procfs couple this to Linux?
We don't support containers on non-Linux platforms right now; the router wouldn't work on Windows.
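A minimal sketch of what such a procfs scan can look like, assuming the Linux `/proc/<pid>/stat` layout (the state letter is the first field after the parenthesised command name, and the parent PID is the second). The helper names `statFields` and `zombieChildren` are hypothetical and this is not the PR's `parseProcForZombies`; it also filters on the parent PID, as an assumption, so that only processes the caller can actually wait on are returned.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statFields parses the contents of /proc/<pid>/stat and returns the PID,
// the one-letter state (e.g. 'R', 'S', 'Z') and the parent PID. The command
// name (field 2) is wrapped in parentheses and may contain spaces or ')',
// so the line is split at the LAST ')' rather than on whitespace.
func statFields(stat string) (pid int, state byte, ppid int, err error) {
	open := strings.IndexByte(stat, '(')
	end := strings.LastIndexByte(stat, ')')
	if open < 0 || end < open {
		return 0, 0, 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	rest := strings.Fields(stat[end+1:])
	if len(rest) < 2 {
		return 0, 0, 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	if pid, err = strconv.Atoi(strings.TrimSpace(stat[:open])); err != nil {
		return 0, 0, 0, err
	}
	if ppid, err = strconv.Atoi(rest[1]); err != nil {
		return 0, 0, 0, err
	}
	return pid, rest[0][0], ppid, nil
}

// zombieChildren scans /proc (Linux only) for processes in state 'Z' whose
// parent is the current process, i.e. the ones this process may wait on.
func zombieChildren() ([]int, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	self := os.Getpid()
	var zs []int
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a per-process directory
		}
		data, err := os.ReadFile("/proc/" + e.Name() + "/stat")
		if err != nil {
			continue // the process exited between ReadDir and ReadFile
		}
		pid, state, ppid, err := statFields(string(data))
		if err == nil && state == 'Z' && ppid == self {
			zs = append(zs, pid)
		}
	}
	return zs, nil
}

func main() {
	zs, err := zombieChildren()
	fmt.Println("zombie children:", zs, err)
}
```

Splitting at the last `)` matters because a command name such as `a) b` would otherwise shift every following field.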
Interesting... can you explain the theory of operation? My assumption, based on prior R&D, is that switching from signal handling to a polling approach shortens (but does not eliminate) the window of time in which the reaper can race with explicit
```go
// from a pid 1 process.
// The zombie processes are reaped at the beginning of next cycle, so os.Exec calls
// have an opportunity to reap their children within `period` seconds.
func StartPeriodicReaper(period int64) {
```
Rather than forcing the caller to use a particular time unit, how about accepting a `time.Duration`?
All of this seems like a band-aid to me. In cases where we lose/race when calling Wait(), we're likely to log an error. Customers then report that, and we say "not really an error", which is perhaps "OK" for the case we have now. It just seems like this will mask problems. Why don't we have some form of real init where we won't race? See also: #78
@frobware Agreed; not sure we need to re-litigate it here.
Mrunal's change is fundamentally different. The old reaper was completely wrong: you don't wait for all children, you wait for the unmanaged defunct ones. And you only need to reap defunct processes. So this logic is correct now: only orphaned defunct (zombie) children get reaped. There's no race, because a defunct process stays defunct.
This is a choice between bringing in a new init system to manage defunct processes and 20 lines of Go, used in every image, that perform the required function. There is no real init responsibility other than reaping defunct processes.
Ugh. I clearly didn't spend enough time looking at this: it is only waiting on zombie PIDs. I did some validation of the change here, based on our previous experiments: https://github.com/frobware/haproxy-hacks/blob/master/reaper/main.go#L21
In the following output we see the router's reload script being invoked every second:
And later we see the reaper picking up some zombies:
And, based on timing, we see the number of defunct processes drop:
/lgtm
```go
var zs []int
var err error
for {
	for _, z := range zs {
```
Curious: why not pull the call to `zs, err = parseProcForZombies()` up before this loop?
```go
// StartPeriodicReaper starts a goroutine to reap processes periodically if called
// from a pid 1 process.
// The zombie processes are reaped at the beginning of next cycle, so os.Exec calls
```
This comment isn't necessary, since os.Exec doesn't care about zombies (it never ends up with a zombie of its own: it never quits, so the child isn't reparented).
Also testing in openshift/ci-search#73
```go
// from a pid 1 process.
// The zombie processes are reaped at the beginning of next cycle, so os.Exec calls
// have an opportunity to reap their children within `period` seconds.
func StartPeriodicReaper(period int64) {
```
Yeah, take a `time.Duration`.
```go
// have an opportunity to reap their children within `period` seconds.
func StartPeriodicReaper(period int64) {
	if os.Getpid() == 1 {
		klog.V(4).Infof("Launching periodic reaper")
```
Once you've tested, strip out all logging.
```diff
@@ -3,13 +3,87 @@
 package proc
```
Rename this file to proc_linux.go, then add a proc.go with `// +build !linux` that exposes StartReaper.
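The suggested split relies on Go's filename convention: a file whose name ends in `_linux.go` is only compiled on Linux, so the procfs code can stay in proc_linux.go while a stub covers every other platform. A sketch of that stub, assuming `StartReaper` keeps taking no arguments as at its current call site; the `//go:build` line is the form preferred since Go 1.17 and sits alongside the `// +build` line from the review.

```go
//go:build !linux
// +build !linux

package proc

// StartReaper is a no-op outside Linux: there is no procfs to scan, and the
// router is only supported in Linux containers.
func StartReaper() {}
```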
pkg/cmd/infra/router/template.go
```go
}

proc.StartReaper()
proc.StartPeriodicReaper(6)
```
Prefer StartReaper, since the function hasn't really changed.
I will update openshift/library-go#767 with the comments here tomorrow. Thanks!
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Force-pushed from 4b02e46 to d0207d8.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: frobware, mrunalp.
@mrunalp: The following tests failed.
Closing in favor of #190
@sgreene570: Closed this PR.