Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user to kill a build #2471

Closed
benjaoming opened this issue Oct 26, 2016 · 6 comments
Closed

Allow user to kill a build #2471

benjaoming opened this issue Oct 26, 2016 · 6 comments
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required Needed: more information A reply from issue author is required

Comments

@benjaoming
Copy link
Contributor

benjaoming commented Oct 26, 2016

Details

There are many builds that I see get stuck. And for various reasons that should also be fixed. I also see a lot of reports in here.

Examples... I didn't browse all issues through, but you'll get the picture...
#2468
#2425
#2415
#2407
#2401
#2295
#2226

While I don't have any insights about how RTD works or why these builds are stuck, I would suggest an option for the user to kill/cancel a build and have directories cleaned up.

As another example, thake this list of builds:

screenshot from 2016-10-26 11-56-00

I can't tell if having the ability to purge a stuck job, removing it from the queues would have any positive effects on other reports of build problems. All I'm saying is that it's worth a try, and ultimately, it can free up resources because users with very large projects can voluntarily cancel jobs that they don't want running (I often do that on Travis and Circle).

Flow

  • Build is queued in an unfinished state
  • User clicks Cancel build
    • State of build changes to cancelling
  • Worker cancellation happens:
    • Task cancellation through celery begins
    • Task intercepts kill signal and gracefully shuts down docker build container and processes
    • Task goes through end of build process with build in error state
    • Task state is updated, update is in cancelled state
    • Task is finally allowed to expire
  • User receives feedback messages containing one or more of the following:
    • Build process killed
    • Build removed from queue
    • User directories purged
  • Build changes state to cancelled

Updated flow with @agjohnson's suggestions

@agjohnson
Copy link
Contributor

Currently, processes can get stuck in a invalid building state. These projects are not building anymore, their state was just never updated after a hard failure.

Offering a cancel build task option might be a useful feature though. It is confusing for builds to remain around. I'd probably work on ensuring build status is valid and always updated first however. A feature like this would require us to rework the project build task a bit. Flow is more like:

  • Build is queued, task id is stored somewhere and UI is surfaced to user to kill the task
  • User clicks Cancel build
  • Task cancellation through celery begins
  • Task intercepts kill signal and gracefully shuts down docker build container and processes
  • Task goes through end of build process with build in error state
  • Task state is updated, update is in canceled state
  • Task is finally allowed to expire

Not exactly sure of what happens here with the celery interaction. We would require a "soft" kill of the task, because we need to go through cleanup and reporting efforts. This probably goes hand in hand with ensuring a task that times out will clean up and report as well -- it does always, hence the stuck build states.

@agjohnson agjohnson added Improvement Minor improvement to code Needed: design decision A core team decision is required labels Oct 28, 2016
@benjaoming
Copy link
Contributor Author

Should there be a timeout regarding the graceful docker shutdown? I'm thinking that the intention of having a user-driven possibility to cancel builds is actually to have a guarantee that you can get stuck builds unstuck and removed.

@agjohnson
Copy link
Contributor

Yup, this timeout is already in place, same with the celery task. The most
common scenario where a build gets "stuck" is there is a seemingly
non-deterministic issue where even though the celery task has a longer timeout,
the celery task is killed before the docker container, resulting in a build
state not updating.

Aside from build process not finishing, there is no additional side effect of a
build being killed early. It doesn't consume resources, and there was no skipped
build cleanup steps. This is why I lean towards first addressing the issues with
build state update. I think fixing the bugs with build state would lead to less
users worried about backing up our build queue :)

I do think a kill build feature is great idea still. This feature is probably a
little further out on our roadmap, but good to consider now.

On Sun, Oct 30, 2016 at 07:07:05AM -0700, Benjamin Bach wrote:

Should there be a timeout regarding the graceful docker shutdown? I'm thinking that the intention of having a user-driven possibility to cancel builds is actually to have a guarantee that you can get stuck builds unstuck and removed.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#2471 (comment)

[email protected]
@agjhnsn

@benjaoming
Copy link
Contributor Author

Did anyone on purpose add spaces after the '#' in the issue references, making them headlines or was it like that from the beginning? :) Don't wanna edit back the description in case the references are troublesome... but I think they were all good examples...

@humitos
Copy link
Member

humitos commented Mar 21, 2018

I think this issue can be closed in favor of #3312

If you check all the issues linked in the description, most/all of the builds are in FAILED state with the message: "This build was terminated due to inactivity".

Is the implemention of this task missing something regarding this issue?

@humitos humitos added the Needed: more information A reply from issue author is required label Mar 28, 2018
@humitos
Copy link
Member

humitos commented Mar 28, 2018

I'm closing this issue here since. Feel free to reopen if you consider.

@humitos humitos closed this as completed Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required Needed: more information A reply from issue author is required
Projects
None yet
Development

No branches or pull requests

3 participants