Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead pidfile on wheezy #808

Closed
jpic opened this issue Jun 13, 2016 · 13 comments
Closed

Dead pidfile on wheezy #808

jpic opened this issue Jun 13, 2016 · 13 comments
Labels

Comments

@jpic
Copy link
Contributor

jpic commented Jun 13, 2016

Currently, if the service is terminated with SIGKILL then the pidfile will remain on the file system, containing a dead PID. This prevents the service file to work correctly. To work around this, we added a logic like this in our deployment playbooks:

if [ -f $PIDFILE ]; then
    test -d /proc/$PIDFILE || rm $PIDFILE
fi

Should this be considered as a bug in the current service file ?

@muuki88 muuki88 added the debian label Jun 13, 2016
@muuki88
Copy link
Contributor

muuki88 commented Jun 13, 2016

Thanks for your report. This would mean that the start-stop-daemon helper doesn't work properly. The debian-start-script for SystemV only delegates to this helper.

@jpic
Copy link
Contributor Author

jpic commented Jun 13, 2016

Interresting, I'm a bit confused by the documentation:

   -p, --pidfile pid-file
          Check whether a process has created  the  file  pid-file.  Note:
          using   this   matching  option  alone  might  cause  unintended
          processes to be acted on, if the old process terminated  without
          being able to remove the pid-file.

That said, I'm now fairly convinced that start-stop-daemon implements SIGCHLD. However, I can't find it in the source code of dpkg for a debian wheezy container:

$ apt-get source dpkg    
$ grep -r CHLD dpkg-1.16.18/
dpkg-1.16.18/utils/start-stop-daemon.c: { "CHLD",   SIGCHLD },
dpkg-1.16.18/lib/compat/strsignal.c:    "SIGCHLD",  /* 17 */

I see it defined, but not implemented anywhere in this source tree. Perhaps asking upstream if this has been fixed if I'm not missing anything obvious would be advised here we find something tangible ?

@jpic
Copy link
Contributor Author

jpic commented Jun 13, 2016

This would mean that the start-stop-daemon helper doesn't work properly.

Well I would be interrested how testing this looks like in their CI, I would not blame them if that kind of issue is not tested automatically yet.

@muuki88
Copy link
Contributor

muuki88 commented Jun 14, 2016

Perhaps asking upstream if this has been fixed if I'm not missing anything obvious would be advised here we find something tangible

That would be great. Unfortunately the SystemV implementation is responsible for most of the bugs here :( Upstart/Systemd are a lot more kind.

I would not blame them if that kind of issue is not tested automatically yet.

Testing such things is hard. We aren't able to test all our stuff for native-packager as well. The test matrix of package format x build tool x build OS x target OS x archetype Plugins is not really manageable

@bersace
Copy link

bersace commented Jun 14, 2016

Jenkins uses an external, distro-independant tool to manage daemonizing: http://libslack.org/daemon/. That's a nice pattern to isolate all the pid/kill issue in a dedicated tool. This make it much easier to switch to systemd then. Just skip it :)

@bersace
Copy link

bersace commented Jun 14, 2016

You can then skip start-stop-daemon or tell it not to manage the daemon/pid file.

@jpic
Copy link
Contributor Author

jpic commented Jun 14, 2016

That would be great. Unfortunately the SystemV implementation is responsible for most of the bugs here :( Upstart/Systemd are a lot more kind.

Right, perhaps that's related to what @bersace explained to me on IRC, that they would be pretty "conservative" on this kind of issue.

Testing such things is hard. We aren't able to test all our stuff for native-packager as well. The test matrix of package format x build tool x build OS x target OS x archetype Plugins is not really manageable

I should probably not tell this in public, but this kind of test matrix just makes me 🚀 ➕ 💧 👖 The problem is that it's usually very hard to contribute to testing infrastructures when they're not using a modern service like travis or gitlab-ci which would easily allow such a matrix using prepared containers, it would be fast as well.

You can then skip start-stop-daemon or tell it not to manage the daemon/pid file.

But then if the java process is SIGKILLed then a dead pidfile will remain ?

Jenkins uses an external, distro-independant tool to manage daemonizing: http://libslack.org/daemon/.

Interresting initiative, listing our options here is definitely going to help, @muuki88 would you know if the play framework also offers a feature we're missing or not using here ?

@bersace
Copy link

bersace commented Jun 14, 2016

me 🚀 ➕ 💧 👖

Do you mean it make your pants wet ? 😮

@bersace
Copy link

bersace commented Jun 14, 2016

But then if the java process is SIGKILLed then a dead pidfile will remain ?

It's the job of daemon to manage the pid file, SIGCHLD, etc. daemon is a bit like a tiny single service init service.

@muuki88
Copy link
Contributor

muuki88 commented Jun 19, 2016

would you know if the play framework also offers a feature we're missing or not using here ?

I forgot to ask in the beginning. You are using play? Have you configured the play.pid parameter? Rule of thumb: don't let play handle its own pid.

@jpic
Copy link
Contributor Author

jpic commented Jun 28, 2016

If we don't let play do its own pid, we may end up with stale pidfiles preventing the service to start again (ie. power outage).

Perhaps checking if the pidfile is valid at the top of the service file is the only way ? We could check /proc/$PID and then /proc/$PID/cmdline ?

@jpic
Copy link
Contributor Author

jpic commented Jun 28, 2016

Confirmed the service script works as expected when we let start-stop-daemon handle the pidfile, because it overwrites stale pidfiles.

Thanks a heap !!!

@jpic jpic closed this as completed Jun 28, 2016
@muuki88
Copy link
Contributor

muuki88 commented Jun 28, 2016

no problem. You were neither the first nor will you be the last to fall in this trap. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants