Cancelling a job doesn't run post hooks or upload artifacts #119

keithpitt · 2015-03-27T09:26:25Z

When we cancel a job, we send a KILL signal to the bootstrap.sh process. This is a problem because the bootstrap never gets a chance to upload artifacts or run any post-script hooks. So the idea I have is to send the kill signals to the underlying script process, not the bootstrap one.

To do this, we first need to find out the PID. I did some bash and came up with this:

# bootstrap.sh

# Start the script in the background so we can capture its PID
./script.sh &
PID=$!
echo "(PID is: $PID) Waiting for it..."
wait "$PID"
echo "Run the hooks here"

# script.sh

# Print a message when the script exits
function say-goodbye {
  echo "Goodbye!"
}

trap say-goodbye EXIT
SLEEP=15

echo "This is my current PID: $$"
echo "And now I'm sleeping for $SLEEP seconds"
sleep "$SLEEP"

Now the problem is communicating the PID back to the agent, but I think this can be done with buildkite-agent meta-data or something. Once we have that PID, the canceller just kills the script process, not the bootstrap one. And all the after-script tasks and cleanup should "just work"... I think!
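A minimal sketch of that handoff, for illustration only. It swaps in a plain PID file as the channel rather than buildkite-agent meta-data, `sleep` stands in for script.sh, and `PID_FILE`, `run_job` and `cancel_job` are hypothetical names, not anything the agent provides.

```shell
#!/bin/sh
# Sketch only: a PID file stands in for whatever channel the agent would use.
# PID_FILE, run_job and cancel_job are hypothetical names.

PID_FILE="${TMPDIR:-/tmp}/job-sketch-$$.pid"

# Bootstrap side: start the job in the background, publish its PID,
# wait for it, then carry on to the post-job steps.
run_job() {
  "$@" &
  JOB_PID=$!
  echo "$JOB_PID" > "$PID_FILE"
  JOB_STATUS=0
  wait "$JOB_PID" || JOB_STATUS=$?
  echo "Job exited with status $JOB_STATUS; run the hooks here"
}

# Canceller side: signal the job itself, not the bootstrap.
cancel_job() {
  kill -TERM "$(cat "$PID_FILE")"
}

# Demo: cancel a 30 second job after one second.
(sleep 1; cancel_job) &
run_job sleep 30
rm -f "$PID_FILE"
```

Because only the job receives the signal, `run_job` returns normally and the line after `wait` still runs, which is where hooks and artifact upload would go.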


graemej · 2015-04-02T22:51:16Z

FYI: SIGTERM might be a better choice of signal, since it's catchable. In our use case we're launching Docker containers ourselves, as we have a few custom volume mounts, and a SIGKILL doesn't reliably disconnect the docker run process from the process owned by the Docker daemon when cancelling.

toolmantim · 2015-04-02T23:18:51Z

@graemej we do in fact send a SIGTERM. See agent/buildkite/job.go, line 79 in 1636d0e:

j.process.Kill()

and agent/buildkite/script.go, line 188 in 76417b9:

err := p.signal(syscall.SIGTERM)

We only send a SIGKILL if the process is still alive 10 seconds after sending a SIGTERM.
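The SIGTERM-then-SIGKILL behaviour can be sketched in shell as follows. This is not the agent's Go code: `terminate_with_grace` is a hypothetical helper, and the grace period is a parameter here only for illustration (it defaults to the 10 seconds mentioned above).

```shell
#!/bin/sh
# Sketch only: terminate_with_grace is a hypothetical helper, not the agent's code.

# Send SIGTERM, poll for up to $2 seconds (default 10), then fall back to SIGKILL.
# Records in KILLED_WITH the last signal that had to be sent.
terminate_with_grace() {
  pid=$1
  grace=${2:-10}
  KILLED_WITH=TERM
  kill -TERM "$pid" 2>/dev/null || return 0   # nothing to do, already gone
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      KILLED_WITH=KILL
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Demo 1: a process that exits on SIGTERM.
sleep 30 &
terminate_with_grace "$!" 2
POLITE=$KILLED_WITH

# Demo 2: a process that ignores SIGTERM and needs the SIGKILL fallback.
sh -c 'trap "" TERM; exec sleep 30' &
STUBBORN_PID=$!
sleep 1   # give it time to install the trap
terminate_with_grace "$STUBBORN_PID" 2
STUBBORN=$KILLED_WITH

echo "polite: $POLITE, stubborn: $STUBBORN"
```

The polling loop relies on the shell reaping the exited child while it waits for each `sleep 1`; a process that is left as an unreaped zombie would still answer `kill -0` and be escalated to SIGKILL.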

This issue is about having the agent kill the process that's invoked by the bootstrap.sh, rather than the bootstrap.sh process itself.

toolmantim · 2015-04-02T23:20:22Z

@graemej I've seen that with Docker though! Especially if you don't use the [] syntax for the CMD in the Dockerfile, because then the command is run through a shell (/bin/sh -c) rather than executed directly.

lox · 2017-05-31T05:36:34Z

This should be fixed now!

keithpitt · 2017-05-31T05:56:49Z

@lox oh right! Since the bootstrap swallows signals and just forwards them onto the child process, it means the bootstrap has a chance to do all the things?

lox · 2017-05-31T05:59:58Z

Urgh, y'know, I misread this. I was actually looking for this specific bug report to mention that pre-exit needs to be added to the list of things that don't run after cancel.

lox · 2017-05-31T06:01:49Z

As it stands, cancel will send a SIGTERM to the bootstrap, which will send it to any sub-process-groups it has created and then exit. We need to add some handling to run specific hooks in the bootstrap after it has killed its subprocesses but before it exits itself.

Which hooks should run? I'm kind of tempted to keep the current behaviour given it's been the status quo for so long. Perhaps we should add a pre-cancel and post-cancel hook?
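As a rough illustration of "forward the signal, then run hooks before exiting", here is a sketch in shell. It is not the bootstrap's actual implementation; `bootstrap`, `forward_term` and the `LOG` file are hypothetical, and `sleep` stands in for the job.

```shell
#!/bin/sh
# Sketch only: bootstrap, forward_term and the log file are hypothetical.

LOG="${TMPDIR:-/tmp}/bootstrap-sketch-$$.log"

# Forward SIGTERM to the job instead of dying with it.
forward_term() {
  echo "forwarding SIGTERM to $CHILD_PID" >> "$LOG"
  kill -TERM "$CHILD_PID" 2>/dev/null || true
}

bootstrap() {
  trap forward_term TERM
  "$@" &
  CHILD_PID=$!
  # A trapped signal interrupts `wait`, so keep waiting until the job is reaped.
  while kill -0 "$CHILD_PID" 2>/dev/null; do
    wait "$CHILD_PID" || true
  done
  trap - TERM
  echo "post hooks and artifact upload would run here" >> "$LOG"
}

# Demo: run the bootstrap as its own process and cancel it after one second.
: > "$LOG"
( bootstrap sleep 30 ) &
BOOTSTRAP_PID=$!
sleep 1
kill -TERM "$BOOTSTRAP_PID"
wait "$BOOTSTRAP_PID" || true
cat "$LOG"
```

The point where the second log line is written is where a pre-cancel/post-cancel or pre-exit hook could be invoked; which hooks belong there is exactly the open question above.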

lox · 2018-01-16T02:51:19Z

I think unfortunately this one is going to be bumped to 3.1.0.

ticky · 2018-03-21T22:56:49Z

Just going to put this here because it might be related: do we have any plans to make the force-kill timeout user-configurable? It seems that ten seconds is not enough time for some applications to clean up after themselves.

amitsaha · 2018-07-13T05:15:21Z

> As it stands, cancel will send a SIGTERM to the bootstrap, which will send it to any sub-process-groups it has created and then exit. We need to add some handling to run specific hooks in the bootstrap after it has killed its subprocesses but before it exits itself.

@lox it would be useful to consider Windows when discussing this, since on Windows:

- There is no concept of signals
- AFAIK, there is no notion of process groups

Hence, I think issue #794 may be worth considering.

> Which hooks should run? I'm kind of tempted to keep the current behaviour given it's been the status quo for so long. Perhaps we should add a pre-cancel and post-cancel hook?

These separate hooks would be very useful to have. This in addition to #794 would improve the experience on Windows greatly.

lox · 2018-07-15T00:53:50Z

It turns out that we do run pre-exit after cancellation! We added a test for it recently in #789. It looks like it's not behaving as hoped on Windows though.

lox · 2019-01-09T22:57:04Z

It's now behaving correctly on Windows as of #879.

vaughandroid mentioned this issue Jun 8, 2016

Cancelling an Android job leads to the agent getting 'stuck' #336

Closed

lox closed this as completed May 31, 2017

lox reopened this May 31, 2017

lox added the enhancement label Nov 4, 2017

lox mentioned this issue Nov 4, 2017

pre-exit hook not running for cancelled builds #562

Closed

lox added this to the v3.0.0 milestone Nov 4, 2017

lox modified the milestones: v3.0.0, v3.1.0 Jan 16, 2018

keithpitt modified the milestones: v3.1.0, v4.0.0 Mar 13, 2018

lox mentioned this issue May 3, 2018

Feature: pre-cancel hook buildkite/feedback#361

Closed

lox closed this as completed Jan 9, 2019
