Skip to content

reaper: Add a periodic reaper funcion#767

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
mrunalp:add_periodic_reaper
Jun 8, 2020
Merged

reaper: Add a periodic reaper funcion#767
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
mrunalp:add_periodic_reaper

Conversation

@mrunalp
Copy link
Member

@mrunalp mrunalp commented Apr 6, 2020

This commit adds a periodic reaper function that is intended
to be used when the go process uses the os/exec package to
launch child processes. The period argument could be adjusted
according to how long the os/exec calling code waits to
gather exit status from child processes.

Signed-off-by: Mrunal Patel mrunalp@gmail.com

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2020
@mrunalp
Copy link
Member Author

mrunalp commented Apr 6, 2020

@smarterclayton ptal. If we can confirm that this works well with router, I can optimize the code further for parsing /proc. Also, we can consolidate both types of reaping with ReaperOptions and a single function or keep it separate like the current approach.

@smarterclayton
Copy link
Contributor

/hold

Can you test vendor this into a router PR and look at the log output? If this fixes it, we should get none of the "can't wait for process anymore"

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2020
@mrunalp
Copy link
Member Author

mrunalp commented Apr 6, 2020

@smarterclayton sure, I can do that.

@mrunalp mrunalp force-pushed the add_periodic_reaper branch from de978fb to 8682157 Compare April 6, 2020 23:34
@mrunalp
Copy link
Member Author

mrunalp commented Apr 7, 2020

Opened openshift/router#111

@mrunalp
Copy link
Member Author

mrunalp commented Apr 23, 2020

Updated.

@mrunalp
Copy link
Member Author

mrunalp commented Apr 23, 2020

I have a couple of logs remaining that I will take out after one more round of testing on the router PR.

@mrunalp mrunalp force-pushed the add_periodic_reaper branch from e10c451 to dda569a Compare May 4, 2020 18:31
@mrunalp mrunalp changed the title WIP: reaper: Add a periodic reaper funcion reaper: Add a periodic reaper funcion May 4, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 4, 2020
@mrunalp
Copy link
Member Author

mrunalp commented May 4, 2020

Updated the PR. This is ready.

for {
zs, err = parseProcForZombies()
if err != nil {
klog.V(4).Infof(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give this an error prefix so we know it was from reaping


// StartReaper starts a goroutine to reap processes periodically if called
// from a pid 1 process.
func StartReaper(period time.Duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If period is less than or equal to zero set it to five seconds and add a comment above. Also describe why you might change the default period (if you create tens of thousands of defunct processes every five seconds).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@mrunalp mrunalp force-pushed the add_periodic_reaper branch 2 times, most recently from eb5a6cb to 9989b9e Compare May 4, 2020 23:19
@mrunalp
Copy link
Member Author

mrunalp commented Jun 3, 2020

@smarterclayton could you take another look?


// StartReaper starts a goroutine to reap processes periodically if called
// from a pid 1 process.
// If period is less than 5 seconds, then it is set to 5 seconds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have it be only when 0, not when less than five. If you have a very fast subprocess creating app, you might need to set it faster.

func StartReaper(period time.Duration) {
if os.Getpid() == 1 {
const minReaperPeriodSeconds = 5
if period.Seconds() < minReaperPeriodSeconds {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if period == 0 { period = 5 * time.Second }`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good! Default to 5.

This commit modifies the reaper behavior to periodically scan
the procfs for zombies and then reap them in the next cycle.
This should work better for container processes that spawn
child processes through os.Exec

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@mrunalp mrunalp force-pushed the add_periodic_reaper branch from 9989b9e to fe841e6 Compare June 3, 2020 22:10
@smarterclayton
Copy link
Contributor

/lgtm

@smarterclayton
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2020
@smarterclayton smarterclayton added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit c10453a into openshift:master Jun 8, 2020
bertinatto pushed a commit to bertinatto/library-go that referenced this pull request Jul 2, 2020
sgreene570 added a commit to sgreene570/router that referenced this pull request Sep 21, 2020
Switch the router process reaper to use
the new periodic reaper brought in by
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Sep 30, 2020
Switch the router process reaper to use
the new periodic reaper brought in by
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Bump openshift/library-go dep to include
the new periodic reaper function merged in
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Bump openshift/library-go dep to include
the new periodic reaper function merged in
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Switch the router process reaper to use
the new periodic reaper brought in by
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Bump openshift/library-go dep to include
the new periodic reaper function merged in
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Bump openshift/library-go dep to include
the new periodic reaper function merged in
openshift/library-go#767
sgreene570 added a commit to sgreene570/router that referenced this pull request Oct 6, 2020
Switch the router process reaper to use
the new periodic reaper brought in by
openshift/library-go#767
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants