Fix zombie processes by calling Wait after Command.Run #280

Closed
mckelvin wants to merge 1 commit from the fix-zombie-process branch

@mckelvin mckelvin commented Mar 8, 2017

semaphore leaves behind a lot of zombie processes (marked Zs in the STAT column below). I think we need to call Wait after every command finishes.

$ ps aux | grep ansible
ansible   8004  0.0  0.0      0     0 ?        Zs   13:53   0:00 [ssh] <defunct>
ansible   8005  0.0  0.0      0     0 ?        Zs   13:53   0:00 [ssh] <defunct>
ansible   9833  0.0  0.0      0     0 ?        Zs   13:57   0:00 [ssh] <defunct>
ansible   9834  0.0  0.0      0     0 ?        Zs   13:57   0:00 [ssh] <defunct>
ansible  12939  0.0  0.0      0     0 ?        Zs   14:13   0:00 [ssh] <defunct>
ansible  12940  0.0  0.0      0     0 ?        Zs   14:13   0:00 [ssh] <defunct>
ansible  14075  0.0  0.0      0     0 ?        Zs   14:16   0:00 [ssh] <defunct>
ansible  14076  0.0  0.0      0     0 ?        Zs   14:16   0:00 [ssh] <defunct>
ansible  14522  0.0  0.0      0     0 ?        Zs   14:17   0:00 [ssh] <defunct>
ansible  14523  0.0  0.0      0     0 ?        Zs   14:17   0:00 [ssh] <defunct>
ansible  15799  0.0  0.0      0     0 ?        Zs   14:20   0:00 [ssh] <defunct>
ansible  15800  0.0  0.0      0     0 ?        Zs   14:20   0:00 [ssh] <defunct>
ansible  15874  0.0  0.0      0     0 ?        Zs   11:41   0:00 [ssh] <defunct>
ansible  15875  0.0  0.0      0     0 ?        Zs   11:41   0:00 [ssh] <defunct>
ansible  17503  0.0  0.0      0     0 ?        Zs   11:45   0:00 [ssh] <defunct>
ansible  17504  0.0  0.0      0     0 ?        Zs   11:45   0:00 [ssh] <defunct>
ansible  24142  0.0  0.2  20968 11076 ?        Ssl  Mar07   0:03 semaphore -config /srv/semaphore/semaphore_config.json
ansible  24165  0.0  0.0      0     0 ?        Z    Mar07   0:00 [semaphore] <defunct>
ansible  24179  0.0  0.0  12836     0 ?        Sl   Mar07   0:00 /usr/bin/semaphore -config /srv/semaphore/semaphore_config.json
ansible  29176  0.0  0.0      0     0 ?        Zs   15:32   0:00 [ssh] <defunct>
ansible  29177  0.0  0.0      0     0 ?        Zs   15:32   0:00 [ssh] <defunct>
ansible  29836  0.0  0.0      0     0 ?        Zs   15:34   0:00 [ssh] <defunct>
ansible  29837  0.0  0.0      0     0 ?        Zs   15:34   0:00 [ssh] <defunct>

REF: http://stackoverflow.com/questions/36050503/golang-child-processes-become-zombies

@mckelvin mckelvin force-pushed the fix-zombie-process branch from 7ab8ce0 to 9cb2ed1 Compare March 8, 2017 09:14
@mckelvin
Author

mckelvin commented Mar 8, 2017

Sorry, but I just noticed that cmd.Wait() is already called inside cmd.Run(), which returns its result, so there's no need to call Wait again.

// Run starts the specified command and waits for it to complete.
//
// The returned error is nil if the command runs, has no problems
// copying stdin, stdout, and stderr, and exits with a zero exit
// status.
//
// If the command fails to run or doesn't complete successfully, the
// error is of type *ExitError. Other error types may be
// returned for I/O problems.
func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}
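
To double-check this, here is a minimal sketch (not semaphore code) that runs a command with Run and then calls Wait a second time. The helper name runThenWait and the use of the true command are assumptions made for illustration only. A non-nil ProcessState after Run indicates that Wait has already collected the child, and the second Wait is expected to return an error rather than do anything useful.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runThenWait is a hypothetical helper for this illustration.
// It runs the command with Run, records whether the child was already
// reaped (ProcessState is only populated once Wait has completed), and
// then calls Wait a second time to show that the extra call is redundant.
func runThenWait(name string, args ...string) (reaped bool, runErr, waitErr error) {
	cmd := exec.Command(name, args...)
	runErr = cmd.Run()                // Start + Wait
	reaped = cmd.ProcessState != nil  // set by the Wait inside Run
	waitErr = cmd.Wait()              // second Wait: expected to fail
	return reaped, runErr, waitErr
}

func main() {
	// Assumes a "true" binary is on PATH, as on most Unix systems.
	reaped, runErr, waitErr := runThenWait("true")
	fmt.Println("reaped by Run:", reaped)
	fmt.Println("Run error:", runErr)
	fmt.Println("second Wait error:", waitErr)
}
```

So a direct child started through Run should never be left as a zombie by the caller; the defunct ssh processes have to come from somewhere else.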

@mckelvin mckelvin closed this Mar 8, 2017
@mckelvin mckelvin deleted the fix-zombie-process branch March 8, 2017 11:19
@mckelvin
Author

mckelvin commented Mar 8, 2017

Finally solved the problem, and it has nothing to do with ansible-semaphore. Here is how it happened:

I'm running semaphore inside Docker. I build the Docker image myself, but the basic idea is the same as in the official one:

FROM alpine

RUN apk add --no-cache git ansible mysql-client curl openssh-client
RUN curl -L https://github.com/ansible-semaphore/semaphore/releases/download/v2.2.0/semaphore_linux_amd64 > /usr/bin/semaphore && chmod +x /usr/bin/semaphore && mkdir -p /etc/semaphore/playbooks

ADD semaphore-startup.sh /usr/bin/semaphore-startup.sh
RUN chmod +x /usr/bin/semaphore-startup.sh

EXPOSE 3000
ENTRYPOINT ["/usr/bin/semaphore-startup.sh"]

CMD ["/usr/bin/semaphore", "-config", "/etc/semaphore/semaphore_config.json"]

The problem is that semaphore starts as PID 1, and the process with PID 1 is responsible for reaping zombie processes that have lost their parents.

ansible@semaphore:/$ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ansible    313  0.0  0.0  21968  3688 ?        Ss   19:07   0:00 bash
ansible    319  0.0  0.0  19184  2388 ?        R+   19:07   0:00  \_ ps auxf
ansible      1  0.0  0.3  20732 13876 ?        Ssl  16:37   0:00 semaphore -config /srv/semaphore/semaphore_config.json
ansible      9  0.0  0.0      0     0 ?        Z    16:37   0:00 [semaphore] <defunct>
ansible     18  0.0  0.2  15064  8420 ?        Sl   16:37   0:00 /usr/bin/semaphore -config /srv/semaphore/semaphore_config.json
ansible     50  0.0  0.0      0     0 ?        Z    18:30   0:00 [ssh] <defunct>
ansible    113  0.0  0.0      0     0 ?        Zs   18:37   0:00 [ssh] <defunct>
ansible    114  0.0  0.0      0     0 ?        Zs   18:37   0:00 [ssh] <defunct>
ansible    152  0.0  0.0      0     0 ?        Zs   18:37   0:00 [ssh] <defunct>
ansible    153  0.0  0.0      0     0 ?        Zs   18:37   0:00 [ssh] <defunct>
ansible    214  0.0  0.0      0     0 ?        Zs   18:38   0:00 [ssh] <defunct>
ansible    215  0.0  0.0      0     0 ?        Zs   18:38   0:00 [ssh] <defunct>
ansible    252  0.0  0.0      0     0 ?        Zs   18:38   0:00 [ssh] <defunct>
ansible    253  0.0  0.0      0     0 ?        Zs   18:38   0:00 [ssh] <defunct>
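
For reference, the reaping that PID 1 is expected to do looks roughly like the sketch below. This is only an illustration, assuming Linux or another Unix; reapChildren and spawnAndReap are hypothetical names, and none of this is semaphore or tini code. It calls Wait4 with pid -1 and WNOHANG to collect any child that has already exited. The demo polls to stay short, whereas a real init would more likely block until SIGCHLD arrives.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// reapChildren collects every child that has already exited, without
// blocking, and returns the PIDs it reaped. Hypothetical helper.
func reapChildren() []int {
	var reaped []int
	for {
		var status syscall.WaitStatus
		// pid -1 means "any child"; WNOHANG returns immediately
		// when no child has exited yet.
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil || pid <= 0 {
			// ECHILD (no children left) or nothing to reap right now.
			return reaped
		}
		reaped = append(reaped, pid)
	}
}

// spawnAndReap starts a child, deliberately never calls cmd.Wait on it,
// and relies on reapChildren to collect it instead.
func spawnAndReap() (pid int, reaped bool, err error) {
	cmd := exec.Command("true") // assumes "true" is on PATH
	if err = cmd.Start(); err != nil {
		return 0, false, err
	}
	pid = cmd.Process.Pid
	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		for _, p := range reapChildren() {
			if p == pid {
				return pid, true, nil
			}
		}
		time.Sleep(10 * time.Millisecond)
	}
	return pid, false, nil
}

func main() {
	pid, reaped, err := spawnAndReap()
	fmt.Println("child pid:", pid, "reaped:", reaped, "err:", err)
}
```

Note that running a loop like this inside a program that also uses os/exec could collect children that a pending cmd.Wait expects to find, so the reaping is better left to a separate, dedicated PID 1.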

So I ended up using tini as PID 1 inside the semaphore container:

ansible@semaphore:/$ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ansible     28  0.0  0.0  21972  3588 ?        Ss   19:14   0:00 bash
ansible     34  0.0  0.0  19184  2336 ?        R+   19:14   0:00  \_ ps auxf
ansible      1  0.0  0.0   4224   620 ?        Ss   19:13   0:00 /tini -- /usr/bin/semaphore-entrypoint.sh semaphore -config /srv/semaphore/semaphore_config.j
ansible      6  0.2  0.3  19676 12916 ?        Sl   19:13   0:00 semaphore -config /srv/semaphore/semaphore_config.json
ansible     20  0.0  0.1  13656  7752 ?        Sl   19:13   0:00  \_ /usr/bin/semaphore -config /srv/semaphore/semaphore_config.json

@matejkramny
Contributor

Interesting, shouldn't this be done by docker?
